Posted to user@hadoop.apache.org by Jian Fang <ji...@gmail.com> on 2013/10/22 23:44:25 UTC

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Hi,

I have the same problem. I compared Hadoop 2.2.0 with Hadoop 1.0.3, and it
turned out that terasort on MR2 is 2 times slower than on MR1. I cannot
really believe it.

The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0 cluster
configurations are as follows.

        mapreduce.map.java.opts = "-Xmx512m";
        mapreduce.reduce.java.opts = "-Xmx1536m";
        mapreduce.map.memory.mb = "768";
        mapreduce.reduce.memory.mb = "2048";

        yarn.scheduler.minimum-allocation-mb = "256";
        yarn.scheduler.maximum-allocation-mb = "8192";
        yarn.nodemanager.resource.memory-mb = "12288";
        yarn.nodemanager.resource.cpu-vcores = "16";

        mapreduce.reduce.shuffle.parallelcopies = "20";
        mapreduce.task.io.sort.factor = "48";
        mapreduce.task.io.sort.mb = "200";
        mapreduce.map.speculative = "true";
        mapreduce.reduce.speculative = "true";
        mapreduce.framework.name = "yarn";
        yarn.app.mapreduce.am.job.task.listener.thread-count = "60";
        mapreduce.map.cpu.vcores = "1";
        mapreduce.reduce.cpu.vcores = "2";

        mapreduce.job.jvm.numtasks = "20";
        mapreduce.map.output.compress = "true";
        mapreduce.map.output.compress.codec =
"org.apache.hadoop.io.compress.SnappyCodec";

        yarn.resourcemanager.client.thread-count = "64";
        yarn.resourcemanager.scheduler.client.thread-count = "64";
        yarn.resourcemanager.resource-tracker.client.thread-count = "64";
        yarn.resourcemanager.scheduler.class =
"org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
        yarn.nodemanager.aux-services = "mapreduce_shuffle";
        yarn.nodemanager.aux-services.mapreduce.shuffle.class =
"org.apache.hadoop.mapred.ShuffleHandler";
        yarn.nodemanager.vmem-pmem-ratio = "5";
        yarn.nodemanager.container-executor.class =
"org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
        yarn.nodemanager.container-manager.thread-count = "64";
        yarn.nodemanager.localizer.client.thread-count = "20";
        yarn.nodemanager.localizer.fetch.thread-count = "20";
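
For reference, a minimal sketch of how the map-side pair above would land in
mapred-site.xml (values taken straight from the list; the point is that the
-Xmx heap stays well below the container size in mapreduce.map.memory.mb):

  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>768</value>
  </property>
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx512m</value>
  </property>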

My Hadoop 1.0.3 cluster has the same memory/disks/cores and almost the same
other configurations. In MR1, the 1 TB terasort took about 45 minutes, but it
took around 90 minutes in MR2.

Does anyone know what is wrong here? Or do I need some special
configurations for terasort to work better in MR2?

Thanks in advance,

John


On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <mi...@hotmail.com> wrote:

> Sam,
> I think your cluster is too small for any meaningful conclusions to be
> made.
>
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com> wrote:
>
> Hi Harsh,
>
> Thanks for your detailed response! Now, the efficiency of my Yarn cluster
> improved a lot after increasing the reducer number (mapreduce.job.reduces)
> in mapred-site.xml. But I still have some questions about the way Yarn
> executes an MRv1 job:
>
> 1. In Hadoop 1.x, a job will be executed by map tasks and reduce tasks
> together, with a typical process (map > shuffle > reduce). In Yarn, as I
> know, an MRv1 job will be executed only by the ApplicationMaster.
> - Yarn can run multiple kinds of jobs (MR, MPI, ...), but an MRv1 job has a
> special execution process (map > shuffle > reduce) in Hadoop 1.x, so how does
> Yarn execute an MRv1 job? Does it still include the special MR steps from
> Hadoop 1.x, like map, sort, merge, combine and shuffle?
> - Do the MRv1 parameters still work for Yarn? Like
> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
> - What's the general process for ApplicationMaster of Yarn to execute a
> job?
>
> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
> 'mapred.tasktracker.map.tasks.maximum' and
> 'mapred.tasktracker.reduce.tasks.maximum'
> - For Yarn, the above two parameters do not work any more, as Yarn uses
> containers instead, right?
> - For Yarn, we can set the whole physical mem for a NodeManager using '
> yarn.nodemanager.resource.memory-mb'. But how to set the default size of
> physical mem of a container?
> - How to set the maximum size of physical mem of a container? By the
> parameter of 'mapred.child.java.opts'?
>
> Thanks as always!
>
> 2013/6/9 Harsh J <ha...@cloudera.com>
>
>> Hi Sam,
>>
>> > - How to know the container number? Why do you say it will be 22
>> containers due to a 22 GB memory?
>>
>> The MR2's default configuration requests 1 GB resource each for Map
>> and Reduce containers. It requests 1.5 GB for the AM container that
>> runs the job, additionally. This is tunable using the properties
>> Sandy's mentioned in his post.
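
A minimal sketch of those request sizes in mapred-site.xml, assuming the
defaults Harsh describes (the first two properties also appear in the configs
quoted in this thread; yarn.app.mapreduce.am.resource.mb is assumed here to be
the knob behind the 1.5 GB AM figure):

  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.resource.mb</name>
    <value>1536</value>
  </property>
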
>>
>> > - My machine has 32 GB memory, how much memory is proper to assign
>> to containers?
>>
>> This is a general question. You may use the same process you took to
>> decide optimal number of slots in MR1 to decide this here. Every
>> container is a new JVM, and you're limited by the CPUs you have there
>> (if not the memory). Either increase memory requests from jobs, to
>> lower # of concurrent containers at a given time (runtime change), or
>> lower NM's published memory resources to control the same (config
>> change).
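
A rough back-of-the-envelope check of the container count discussed in this
thread, assuming the 1 GB default request and the 22000 MB NM setting quoted
further down:

  22000 MB per NM / 1024 MB per container ≈ 21 concurrent task containers,
  plus about 1.5 GB for the AM on one node, i.e. roughly 22 containers.
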
>>
>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>> 'yarn', will other parameters for mapred-site.xml still work in yarn
>> framework? Like 'mapreduce.task.io.sort.mb' and
>> 'mapreduce.map.sort.spill.percent'
>>
>> Yes, all of these properties will still work. Old properties specific
>> to JobTracker or TaskTracker (usually found as a keyword in the config
>> name) will not apply anymore.
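
A minimal sketch of that distinction, using properties that appear elsewhere
in this thread: the first is a TaskTracker-era setting that is ignored under
YARN, while the second still takes effect for MR2 tasks:

  <!-- ignored under YARN: there are no TaskTrackers or slots -->
  <property>
    <name>mapreduce.tasktracker.map.tasks.maximum</name>
    <value>8</value>
  </property>
  <!-- still effective for MR2 tasks -->
  <property>
    <name>mapreduce.task.io.sort.mb</name>
    <value>200</value>
  </property>
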
>>
>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <sa...@gmail.com> wrote:
>> > Hi Harsh,
>> >
>> > According to the above suggestions, I removed the duplicated setting,
>> and
>> > reduced the values of 'yarn.nodemanager.resource.cpu-cores',
>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000. And then, the
>> > efficiency improved about 18%. I have some questions:
>> >
>> > - How to know the container number? Why do you say it will be 22
>> > containers due to a 22 GB memory?
>> > - My machine has 32 GB memory, how much memory is proper to assign
>> > to containers?
>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>> 'yarn', will
>> > other parameters for mapred-site.xml still work in yarn framework? Like
>> > 'mapreduce.task.io.sort.mb' and 'mapreduce.map.sort.spill.percent'
>> >
>> > Thanks!
>> >
>> >
>> >
>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>> >>
>> >> Hey Sam,
>> >>
>> >> Did you get a chance to retry with Sandy's suggestions? The config
>> >> appears to be asking NMs to use roughly 22 total containers (as
>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB memory
>> >> resource. This could impact much, given the CPU is still the same for
>> >> both test runs.
>> >>
>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <sa...@cloudera.com>
>> >> wrote:
>> >> > Hey Sam,
>> >> >
>> >> > Thanks for sharing your results.  I'm definitely curious about what's
>> >> > causing the difference.
>> >> >
>> >> > A couple observations:
>> >> > It looks like you've got yarn.nodemanager.resource.memory-mb in there
>> >> > twice
>> >> > with two different values.
>> >> >
>> >> > Your max JVM memory of 1000 MB is (dangerously?) close to the default
>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of your tasks
>> getting
>> >> > killed for running over resource limits?
>> >> >
>> >> > -Sandy
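
If tasks were indeed being killed for exceeding their limit, one way to add
headroom, sketched with an assumed container size rather than a recommendation
(the 1536 is illustrative; the -Xmx1000m is from the config quoted below):

  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1536</value>
  </property>
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx1000m</value>
  </property>
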
>> >> >
>> >> >
>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <sa...@gmail.com>
>> wrote:
>> >> >>
>> >> >> The terasort execution log shows that reduce spent about 5.5 mins
>> from
>> >> >> 33%
>> >> >> to 35% as below.
>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce 31%
>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce 32%
>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce 33%
>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce 35%
>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce 40%
>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce 43%
>> >> >>
>> >> >> Anyway, below are my configurations for your reference. Thanks!
>> >> >> (A) core-site.xml
>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>> >> >>
>> >> >> (B) hdfs-site.xml
>> >> >>   <property>
>> >> >>     <name>dfs.replication</name>
>> >> >>     <value>1</value>
>> >> >>   </property>
>> >> >>
>> >> >>   <property>
>> >> >>     <name>dfs.name.dir</name>
>> >> >>     <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>> >> >>   </property>
>> >> >>
>> >> >>   <property>
>> >> >>     <name>dfs.data.dir</name>
>> >> >>     <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>> >> >>   </property>
>> >> >>
>> >> >>   <property>
>> >> >>     <name>dfs.block.size</name>
>> >> >>     <value>134217728</value><!-- 128MB -->
>> >> >>   </property>
>> >> >>
>> >> >>   <property>
>> >> >>     <name>dfs.namenode.handler.count</name>
>> >> >>     <value>64</value>
>> >> >>   </property>
>> >> >>
>> >> >>   <property>
>> >> >>     <name>dfs.datanode.handler.count</name>
>> >> >>     <value>10</value>
>> >> >>   </property>
>> >> >>
>> >> >> (C) mapred-site.xml
>> >> >>   <property>
>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>> >> >>
>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>> >> >>     <description>No description</description>
>> >> >>     <final>true</final>
>> >> >>   </property>
>> >> >>
>> >> >>   <property>
>> >> >>     <name>mapreduce.cluster.local.dir</name>
>> >> >>
>> >> >>
>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>> >> >>     <description>No description</description>
>> >> >>     <final>true</final>
>> >> >>   </property>
>> >> >>
>> >> >> <property>
>> >> >>   <name>mapreduce.child.java.opts</name>
>> >> >>   <value>-Xmx1000m</value>
>> >> >> </property>
>> >> >>
>> >> >> <property>
>> >> >>     <name>mapreduce.framework.name</name>
>> >> >>     <value>yarn</value>
>> >> >>    </property>
>> >> >>
>> >> >>  <property>
>> >> >>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>> >> >>     <value>8</value>
>> >> >>   </property>
>> >> >>
>> >> >>   <property>
>> >> >>     <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>> >> >>     <value>4</value>
>> >> >>   </property>
>> >> >>
>> >> >>
>> >> >>   <property>
>> >> >>     <name>mapreduce.tasktracker.outofband.heartbeat</name>
>> >> >>     <value>true</value>
>> >> >>   </property>
>> >> >>
>> >> >> (D) yarn-site.xml
>> >> >>  <property>
>> >> >>     <name>yarn.resourcemanager.resource-tracker.address</name>
>> >> >>     <value>node1:18025</value>
>> >> >>     <description>host is the hostname of the resource manager and
>> >> >>     port is the port on which the NodeManagers contact the Resource
>> >> >> Manager.
>> >> >>     </description>
>> >> >>   </property>
>> >> >>
>> >> >>   <property>
>> >> >>     <description>The address of the RM web
>> application.</description>
>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>> >> >>     <value>node1:18088</value>
>> >> >>   </property>
>> >> >>
>> >> >>
>> >> >>   <property>
>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>> >> >>     <value>node1:18030</value>
>> >> >>     <description>host is the hostname of the resourcemanager and
>> port
>> >> >> is
>> >> >> the port
>> >> >>     on which the Applications in the cluster talk to the Resource
>> >> >> Manager.
>> >> >>     </description>
>> >> >>   </property>
>> >> >>
>> >> >>
>> >> >>   <property>
>> >> >>     <name>yarn.resourcemanager.address</name>
>> >> >>     <value>node1:18040</value>
>> >> >>     <description>the host is the hostname of the ResourceManager and
>> >> >> the
>> >> >> port is the port on
>> >> >>     which the clients can talk to the Resource Manager.
>> </description>
>> >> >>   </property>
>> >> >>
>> >> >>   <property>
>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>> >> >>
>> >> >> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>> >> >>     <description>the local directories used by the
>> >> >> nodemanager</description>
>> >> >>   </property>
>> >> >>
>> >> >>   <property>
>> >> >>     <name>yarn.nodemanager.address</name>
>> >> >>     <value>0.0.0.0:18050</value>
>> >> >>     <description>the nodemanagers bind to this port</description>
>> >> >>   </property>
>> >> >>
>> >> >>   <property>
>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>> >> >>     <value>10240</value>
>> >> >>     <description>the amount of memory on the NodeManager in
>> >> >> GB</description>
>> >> >>   </property>
>> >> >>
>> >> >>   <property>
>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>> >> >>
>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>> >> >>     <description>directory on hdfs where the application logs are
>> moved
>> >> >> to
>> >> >> </description>
>> >> >>   </property>
>> >> >>
>> >> >>    <property>
>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>> >> >>     <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>> >> >>     <description>the directories used by Nodemanagers as log
>> >> >> directories</description>
>> >> >>   </property>
>> >> >>
>> >> >>   <property>
>> >> >>     <name>yarn.nodemanager.aux-services</name>
>> >> >>     <value>mapreduce.shuffle</value>
>> >> >>     <description>shuffle service that needs to be set for Map
>> Reduce to
>> >> >> run </description>
>> >> >>   </property>
>> >> >>
>> >> >>   <property>
>> >> >>     <name>yarn.resourcemanager.client.thread-count</name>
>> >> >>     <value>64</value>
>> >> >>   </property>
>> >> >>
>> >> >>  <property>
>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>> >> >>     <value>24</value>
>> >> >>   </property>
>> >> >>
>> >> >> <property>
>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>> >> >>     <value>3</value>
>> >> >>   </property>
>> >> >>
>> >> >>  <property>
>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>> >> >>     <value>22000</value>
>> >> >>   </property>
>> >> >>
>> >> >>  <property>
>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>> >> >>     <value>2.1</value>
>> >> >>   </property>
>> >> >>
>> >> >>
>> >> >>
>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>> >> >>>
>> >> >>> Not tuning configurations at all is wrong. YARN uses memory resource
>> >> >>> based scheduling and hence MR2 would be requesting 1 GB minimum by
>> >> >>> default, causing it, on base configs, to max out at 8 total containers
>> >> >>> (due to the 8 GB NM memory resource config). Do share your configs, as
>> >> >>> at this point none of us can tell what it is.
>> >> >>>
>> >> >>> Obviously, it isn't our goal to make MR2 slower for users and to
>> not
>> >> >>> care about such things :)
>> >> >>>
>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <sa...@gmail.com>
>> >> >>> wrote:
>> >> >>> > At the beginning, I just wanted to do a fast comparison of MRv1 and
>> >> >>> > Yarn. But they have many differences, and to be fair for comparison
>> >> >>> > I did not tune their configurations at all. So I got the above test
>> >> >>> > results. After analyzing the test results, no doubt, I will configure
>> >> >>> > them and do the comparison again.
>> >> >>> >
>> >> >>> > Do you have any idea about the current test result? I think, compared
>> >> >>> > with MRv1, Yarn is better on the Map phase (teragen test), but worse
>> >> >>> > on the Reduce phase (terasort test). And are there any detailed
>> >> >>> > suggestions/comments/materials on Yarn performance tuning?
>> >> >>> >
>> >> >>> > Thanks!
>> >> >>> >
>> >> >>> >
>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <ma...@gmail.com>
>> >> >>> >>
>> >> >>> >> Why not tune the configurations?
>> >> >>> >> Both frameworks have many areas to tune:
>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>> >> >>> >>
>> >> >>> >>
>> >> >>> >>
>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>> >> >>> >>>
>> >> >>> >>> Hi Experts,
>> >> >>> >>>
>> >> >>> >>> We are thinking about whether to use Yarn or not in the near
>> >> >>> >>> future, and I ran teragen/terasort on Yarn and MRv1 for comparison.
>> >> >>> >>>
>> >> >>> >>> My env is a three-node cluster, and each node has similar hardware:
>> >> >>> >>> 2 CPUs (4 cores), 32 GB mem. Both the Yarn and MRv1 clusters are set
>> >> >>> >>> up on the same env. To be fair, I did not do any performance tuning
>> >> >>> >>> on their configurations, but used the default configuration values.
>> >> >>> >>>
>> >> >>> >>> Before testing, I thought Yarn would be much better than MRv1 if
>> >> >>> >>> they both used the default configuration, because Yarn is a better
>> >> >>> >>> framework than MRv1. However, the test results show some differences:
>> >> >>> >>>
>> >> >>> >>> MRv1: Hadoop-1.1.1
>> >> >>> >>> Yarn: Hadoop-2.0.4
>> >> >>> >>>
>> >> >>> >>> (A) Teragen: generate 10 GB data:
>> >> >>> >>> - MRv1: 193 sec
>> >> >>> >>> - Yarn: 69 sec
>> >> >>> >>> Yarn is 2.8 times better than MRv1
>> >> >>> >>>
>> >> >>> >>> (B) Terasort: sort 10 GB data:
>> >> >>> >>> - MRv1: 451 sec
>> >> >>> >>> - Yarn: 1136 sec
>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>> >> >>> >>>
>> >> >>> >>> After a fast analysis, I think the direct cause might be that Yarn
>> >> >>> >>> is much faster than MRv1 on the Map phase, but much worse on the
>> >> >>> >>> Reduce phase.
>> >> >>> >>>
>> >> >>> >>> Here I have two questions:
>> >> >>> >>> - Why do my tests show Yarn is worse than MRv1 for terasort?
>> >> >>> >>> - What's the strategy for tuning Yarn performance? Are there any
>> >> >>> >>> materials?
>> >> >>> >>>
>> >> >>> >>> Thanks!
>> >> >>> >>
>> >> >>> >>
>> >> >>> >>
>> >> >>> >>
>> >> >>> >> --
>> >> >>> >> Marcos Ortiz Valmaseda
>> >> >>> >> Product Manager at PDVSA
>> >> >>> >> http://about.me/marcosortiz
>> >> >>> >>
>> >> >>> >
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>> --
>> >> >>> Harsh J
>> >> >>
>> >> >>
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Harsh J
>> >
>> >
>>
>>
>>
>> --
>> Harsh J
>>
>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Jian Fang <ji...@gmail.com>.
Here is one run for MR1. Thanks.

2013-10-22 08:59:43,799 INFO org.apache.hadoop.mapred.JobClient (main):
Counters: 31
2013-10-22 08:59:43,800 INFO org.apache.hadoop.mapred.JobClient (main):
Job Counters
2013-10-22 08:59:43,800 INFO org.apache.hadoop.mapred.JobClient (main):
Launched reduce tasks=127
2013-10-22 08:59:43,800 INFO org.apache.hadoop.mapred.JobClient (main):
SLOTS_MILLIS_MAPS=173095295
2013-10-22 08:59:43,800 INFO org.apache.hadoop.mapred.JobClient (main):
Total time spent by all reduces waiting after reserving slots (ms)=0
2013-10-22 08:59:43,800 INFO org.apache.hadoop.mapred.JobClient (main):
Total time spent by all maps waiting after reserving slots (ms)=0
2013-10-22 08:59:43,801 INFO org.apache.hadoop.mapred.JobClient (main):
Rack-local map tasks=13
2013-10-22 08:59:43,801 INFO org.apache.hadoop.mapred.JobClient (main):
Launched map tasks=7630
2013-10-22 08:59:43,801 INFO org.apache.hadoop.mapred.JobClient (main):
Data-local map tasks=7617
2013-10-22 08:59:43,801 INFO org.apache.hadoop.mapred.JobClient (main):
SLOTS_MILLIS_REDUCES=125929437
2013-10-22 08:59:43,801 INFO org.apache.hadoop.mapred.JobClient (main):
File Input Format Counters
2013-10-22 08:59:43,801 INFO org.apache.hadoop.mapred.JobClient (main):
Bytes Read=1000478157952
2013-10-22 08:59:43,801 INFO org.apache.hadoop.mapred.JobClient (main):
File Output Format Counters
2013-10-22 08:59:43,802 INFO org.apache.hadoop.mapred.JobClient (main):
Bytes Written=1000000000000
2013-10-22 08:59:43,802 INFO org.apache.hadoop.mapred.JobClient (main):
FileSystemCounters
2013-10-22 08:59:43,802 INFO org.apache.hadoop.mapred.JobClient (main):
FILE_BYTES_READ=550120627113
2013-10-22 08:59:43,802 INFO org.apache.hadoop.mapred.JobClient (main):
HDFS_BYTES_READ=1000478910352
2013-10-22 08:59:43,802 INFO org.apache.hadoop.mapred.JobClient (main):
FILE_BYTES_WRITTEN=725563889045
2013-10-22 08:59:43,802 INFO org.apache.hadoop.mapred.JobClient (main):
HDFS_BYTES_WRITTEN=1000000000000
2013-10-22 08:59:43,802 INFO org.apache.hadoop.mapred.JobClient (main):
Map-Reduce Framework
2013-10-22 08:59:43,803 INFO org.apache.hadoop.mapred.JobClient (main):
Map output materialized bytes=240948272514
2013-10-22 08:59:43,803 INFO org.apache.hadoop.mapred.JobClient (main):
Map input records=10000000000
2013-10-22 08:59:43,803 INFO org.apache.hadoop.mapred.JobClient (main):
Reduce shuffle bytes=240948272514
2013-10-22 08:59:43,803 INFO org.apache.hadoop.mapred.JobClient (main):
Spilled Records=30000000000
2013-10-22 08:59:43,803 INFO org.apache.hadoop.mapred.JobClient (main):
Map output bytes=1000000000000
2013-10-22 08:59:43,804 INFO org.apache.hadoop.mapred.JobClient (main):
Total committed heap usage (bytes)=4841182068736
2013-10-22 08:59:43,804 INFO org.apache.hadoop.mapred.JobClient (main):
CPU time spent (ms)=118779050
2013-10-22 08:59:43,804 INFO org.apache.hadoop.mapred.JobClient (main):
Map input bytes=1000000000000
2013-10-22 08:59:43,804 INFO org.apache.hadoop.mapred.JobClient (main):
SPLIT_RAW_BYTES=752400
2013-10-22 08:59:43,804 INFO org.apache.hadoop.mapred.JobClient (main):
Combine input records=0
2013-10-22 08:59:43,804 INFO org.apache.hadoop.mapred.JobClient (main):
Reduce input records=10000000000
2013-10-22 08:59:43,804 INFO org.apache.hadoop.mapred.JobClient (main):
Reduce input groups=4294967296
2013-10-22 08:59:43,804 INFO org.apache.hadoop.mapred.JobClient (main):
Combine output records=0
2013-10-22 08:59:43,804 INFO org.apache.hadoop.mapred.JobClient (main):
Physical memory (bytes) snapshot=5092578041856
2013-10-22 08:59:43,804 INFO org.apache.hadoop.mapred.JobClient (main):
Reduce output records=10000000000
2013-10-22 08:59:43,804 INFO org.apache.hadoop.mapred.JobClient (main):
Virtual memory (bytes) snapshot=9391598915584
2013-10-22 08:59:43,804 INFO org.apache.hadoop.mapred.JobClient (main):
Map output records=10000000000



On Wed, Oct 23, 2013 at 8:17 AM, Sandy Ryza <sa...@cloudera.com> wrote:

> How many map and reduce slots are you using per tasktracker in MR1?  How
> do the average map times compare? (MR2 reports this directly on the web UI,
> but you can also get a sense in MR1 by scrolling through the map tasks
> page).  Can you share the counters for MR1?
>
> -Sandy
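
For context, the MR1 slot counts Sandy is asking about come from the
TaskTracker properties mentioned elsewhere in this thread; a sketch (the 8/4
values are the ones from the June config quoted below, not necessarily this
cluster's):

  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>8</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>4</value>
  </property>
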
>
>
> On Wed, Oct 23, 2013 at 12:23 AM, Jian Fang <jian.fang.subscribe@gmail.com
> > wrote:
>
>> Unfortunately, turning off JVM reuse still gave the same result, i.e.,
>> about 90 minutes for MR2. I don't think the killed reduces could account
>> for a 2x slowdown. There should be something very wrong either in the
>> configuration or the code. Any hints?
>>
>>
>>
>> On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <jian.fang.subscribe@gmail.com
>> > wrote:
>>
>>> Thanks Sandy. I will try to turn JVM reuse off and see what happens.
>>>
>>> Yes, I saw quite a few exceptions in the task attempts. For instance:
>>>
>>>
>>> 2013-10-20 03:13:58,751 ERROR [main]
>>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>>> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
>>> 2013-10-20 03:13:58,752 ERROR [Thread-6]
>>> org.apache.hadoop.hdfs.DFSClient: Failed to close file
>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>>> No lease on
>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
>>> File does not exist. Holder
>>> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
>>> any open files.
>>>         at
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>>>         at
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>>>         at
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>>>         at
>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>>>         at
>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>>>         at
>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>>>         at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>> --
>>>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>>>         at
>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>>>         at
>>> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>>>         at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>>>         at
>>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>>>         at
>>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>>>         at
>>> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>>>         at
>>> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>>> 2013-10-20 03:13:58,753 WARN [main] org.apache.hadoop.mapred.YarnChild:
>>> Exception running child : java.nio.channels.ClosedChannelException
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>>>         at
>>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>>>         at
>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>>>         at java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>         at
>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>>>         at
>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>>>         at
>>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>>>         at
>>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>>         at
>>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>>         at
>>> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>>>
>>>
>>>
>>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>
>>>> It looks like many of your reduce tasks were killed.  Do you know why?
>>>>  Also, MR2 doesn't have JVM reuse, so it might make sense to compare it to
>>>> MR1 with JVM reuse turned off.
>>>>
>>>> -Sandy
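
A minimal sketch of what turning JVM reuse off on the MR1 side could look
like, assuming the classic MR1 property name (the newer alias,
mapreduce.job.jvm.numtasks, is the one set to 20 in the config at the top of
this thread); a value of 1 means a fresh JVM per task:

  <property>
    <name>mapred.job.reuse.jvm.num.tasks</name>
    <value>1</value>
  </property>
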
>>>>
>>>>
>>>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <
>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>
>>>>> The Terasort output for MR2 is as follows.
>>>>>
>>>>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job (main):
>>>>> Counters: 46
>>>>>         File System Counters
>>>>>                 FILE: Number of bytes read=456102049355
>>>>>                 FILE: Number of bytes written=897246250517
>>>>>                 FILE: Number of read operations=0
>>>>>                 FILE: Number of large read operations=0
>>>>>                 FILE: Number of write operations=0
>>>>>                 HDFS: Number of bytes read=1000000851200
>>>>>                 HDFS: Number of bytes written=1000000000000
>>>>>                 HDFS: Number of read operations=32131
>>>>>                 HDFS: Number of large read operations=0
>>>>>                 HDFS: Number of write operations=224
>>>>>         Job Counters
>>>>>                 Killed map tasks=1
>>>>>                 Killed reduce tasks=20
>>>>>                 Launched map tasks=7601
>>>>>                 Launched reduce tasks=132
>>>>>                 Data-local map tasks=7591
>>>>>                 Rack-local map tasks=10
>>>>>                 Total time spent by all maps in occupied slots
>>>>> (ms)=1696141311
>>>>>                 Total time spent by all reduces in occupied slots
>>>>> (ms)=2664045096
>>>>>         Map-Reduce Framework
>>>>>                 Map input records=10000000000
>>>>>                 Map output records=10000000000
>>>>>                 Map output bytes=1020000000000
>>>>>                 Map output materialized bytes=440486356802
>>>>>                 Input split bytes=851200
>>>>>                 Combine input records=0
>>>>>                 Combine output records=0
>>>>>                 Reduce input groups=10000000000
>>>>>                 Reduce shuffle bytes=440486356802
>>>>>                 Reduce input records=10000000000
>>>>>                 Reduce output records=10000000000
>>>>>                 Spilled Records=20000000000
>>>>>                 Shuffled Maps =851200
>>>>>                 Failed Shuffles=61
>>>>>                 Merged Map outputs=851200
>>>>>                 GC time elapsed (ms)=4215666
>>>>>                 CPU time spent (ms)=192433000
>>>>>                 Physical memory (bytes) snapshot=3349356380160
>>>>>                 Virtual memory (bytes) snapshot=9665208745984
>>>>>                 Total committed heap usage (bytes)=3636854259712
>>>>>         Shuffle Errors
>>>>>                 BAD_ID=0
>>>>>                 CONNECTION=0
>>>>>                 IO_ERROR=4
>>>>>                 WRONG_LENGTH=0
>>>>>                 WRONG_MAP=0
>>>>>                 WRONG_REDUCE=0
>>>>>         File Input Format Counters
>>>>>                 Bytes Read=1000000000000
>>>>>         File Output Format Counters
>>>>>                 Bytes Written=1000000000000
>>>>>
>>>>> Thanks,
>>>>>
>>>>> John
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop 1.0.3
>>>>>> and it turned out that the terasort for MR2 is 2 times slower than that in
>>>>>> MR1. I cannot really believe it.
>>>>>>
>>>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0 cluster
>>>>>> configurations are as follows.
>>>>>>
>>>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>>>         mapreduce.map.memory.mb = "768";
>>>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>>>
>>>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>>>
>>>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>>>         mapreduce.task.io.sort.factor = "48";
>>>>>>         mapreduce.task.io.sort.mb = "200";
>>>>>>         mapreduce.map.speculative = "true";
>>>>>>         mapreduce.reduce.speculative = "true";
>>>>>>         mapreduce.framework.name = "yarn";
>>>>>>         yarn.app.mapreduce.am.job.task.listener.thread-count = "60";
>>>>>>         mapreduce.map.cpu.vcores = "1";
>>>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>>>
>>>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>>>         mapreduce.map.output.compress = "true";
>>>>>>         mapreduce.map.output.compress.codec =
>>>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>>>
>>>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>>>         yarn.resourcemanager.scheduler.client.thread-count = "64";
>>>>>>         yarn.resourcemanager.resource-tracker.client.thread-count =
>>>>>> "64";
>>>>>>         yarn.resourcemanager.scheduler.class =
>>>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>>>         yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>>>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>>>         yarn.nodemanager.container-executor.class =
>>>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>>>         yarn.nodemanager.container-manager.thread-count = "64";
>>>>>>         yarn.nodemanager.localizer.client.thread-count = "20";
>>>>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>>>>>
>>>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost the same
>>>>>> other configurations. In MR1, the 1TB terasort took about 45 minutes, but
>>>>>> it took around 90 minutes in MR2.
>>>>>>
>>>>>> Does anyone know what is wrong here? Or do I need some special
>>>>>> configurations for terasort to work better in MR2?
>>>>>>
>>>>>> Thanks in advance,
>>>>>>
>>>>>> John
>>>>>>
>>>>>>
>>>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>>>> michael_segel@hotmail.com> wrote:
>>>>>>
>>>>>>> Sam,
>>>>>>> I think your cluster is too small for any meaningful conclusions to
>>>>>>> be made.
>>>>>>>
>>>>>>>
>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>
>>>>>>> Mike Segel
>>>>>>>
>>>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi Harsh,
>>>>>>>
>>>>>>> Thanks for your detailed response! Now, the efficiency of my Yarn
>>>>>>> cluster improved a lot after increasing the reducer
>>>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>>>
>>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and reduce task
>>>>>>> together, with a typical process(map > shuffle > reduce). In Yarn, as I
>>>>>>> know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1 job
>>>>>>> has special execution process(map > shuffle > reduce) in Hadoop 1.x, and
>>>>>>> how Yarn execute a MRv1 job? still include some special MR steps in Hadoop
>>>>>>> 1.x, like map, sort, merge, combine and shuffle?
>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>> - What's the general process for ApplicationMaster of Yarn to
>>>>>>> execute a job?
>>>>>>>
>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>> - For Yarn, above tow parameter do not work any more, as yarn uses
>>>>>>> container instead, right?
>>>>>>> - For Yarn, we can set the whole physical mem for a NodeManager
>>>>>>> using 'yarn.nodemanager.resource.memory-mb'. But how to set the
>>>>>>> default size of physical mem of a container?
>>>>>>> - How to set the maximum size of physical mem of a container? By the
>>>>>>> parameter of 'mapred.child.java.opts'?
>>>>>>>
>>>>>>> Thanks as always!
>>>>>>>
>>>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>>>
>>>>>>>> Hi Sam,
>>>>>>>>
>>>>>>>> > - How to know the container number? Why you say it will be 22
>>>>>>>> containers due to a 22 GB memory?
>>>>>>>>
>>>>>>>> The MR2's default configuration requests 1 GB resource each for Map
>>>>>>>> and Reduce containers. It requests 1.5 GB for the AM container that
>>>>>>>> runs the job, additionally. This is tunable using the properties
>>>>>>>> Sandy's mentioned in his post.
>>>>>>>>
>>>>>>>> > - My machine has 32 GB memory, how many memory is proper to be
>>>>>>>> assigned to containers?
>>>>>>>>
>>>>>>>> This is a general question. You may use the same process you took to
>>>>>>>> decide optimal number of slots in MR1 to decide this here. Every
>>>>>>>> container is a new JVM, and you're limited by the CPUs you have
>>>>>>>> there
>>>>>>>> (if not the memory). Either increase memory requests from jobs, to
>>>>>>>> lower # of concurrent containers at a given time (runtime change),
>>>>>>>> or
>>>>>>>> lower NM's published memory resources to control the same (config
>>>>>>>> change).
>>>>>>>>
>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>>>>>>> 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>
>>>>>>>> Yes, all of these properties will still work. Old properties
>>>>>>>> specific
>>>>>>>> to JobTracker or TaskTracker (usually found as a keyword in the
>>>>>>>> config
>>>>>>>> name) will not apply anymore.
>>>>>>>>
>>>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <sa...@gmail.com>
>>>>>>>> wrote:
>>>>>>>> > Hi Harsh,
>>>>>>>> >
>>>>>>>> > According to above suggestions, I removed the duplication of
>>>>>>>> setting, and
>>>>>>>> > reduce the value of 'yarn.nodemanager.resource.cpu-cores',
>>>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000. Ant
>>>>>>>> then, the
>>>>>>>> > efficiency improved about 18%.  I have questions:
>>>>>>>> >
>>>>>>>> > - How to know the container number? Why you say it will be 22
>>>>>>>> containers due
>>>>>>>> > to a 22 GB memory?
>>>>>>>> > - My machine has 32 GB memory, how many memory is proper to be
>>>>>>>> assigned to
>>>>>>>> > containers?
>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>>>>>>> 'yarn', will
>>>>>>>> > other parameters for mapred-site.xml still work in yarn
>>>>>>>> framework? Like
>>>>>>>> > 'mapreduce.task.io.sort.mb' and 'mapreduce.map.sort.spill.percent'
>>>>>>>> >
>>>>>>>> > Thanks!
>>>>>>>> >
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>>>> >>
>>>>>>>> >> Hey Sam,
>>>>>>>> >>
>>>>>>>> >> Did you get a chance to retry with Sandy's suggestions? The
>>>>>>>> config
>>>>>>>> >> appears to be asking NMs to use roughly 22 total containers (as
>>>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB memory
>>>>>>>> >> resource. This could impact much, given the CPU is still the
>>>>>>>> same for
>>>>>>>> >> both test runs.
>>>>>>>> >>
>>>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>>>> sandy.ryza@cloudera.com>
>>>>>>>> >> wrote:
>>>>>>>> >> > Hey Sam,
>>>>>>>> >> >
>>>>>>>> >> > Thanks for sharing your results.  I'm definitely curious about
>>>>>>>> what's
>>>>>>>> >> > causing the difference.
>>>>>>>> >> >
>>>>>>>> >> > A couple observations:
>>>>>>>> >> > It looks like you've got yarn.nodemanager.resource.memory-mb
>>>>>>>> in there
>>>>>>>> >> > twice
>>>>>>>> >> > with two different values.
>>>>>>>> >> >
>>>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close to the
>>>>>>>> default
>>>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of your
>>>>>>>> tasks getting
>>>>>>>> >> > killed for running over resource limits?
>>>>>>>> >> >
>>>>>>>> >> > -Sandy
>>>>>>>> >> >
>>>>>>>> >> >
>>>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <
>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>> >> >>
>>>>>>>> >> >> The terasort execution log shows that reduce spent about 5.5
>>>>>>>> mins from
>>>>>>>> >> >> 33%
>>>>>>>> >> >> to 35% as below.
>>>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce 31%
>>>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce 32%
>>>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce 33%
>>>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce 35%
>>>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce 40%
>>>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce 43%
>>>>>>>> >> >>
>>>>>>>> >> >> Any way, below are my configurations for your reference.
>>>>>>>> Thanks!
>>>>>>>> >> >> (A) core-site.xml
>>>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>>>> >> >>
>>>>>>>> >> >> (B) hdfs-site.xml
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>dfs.replication</name>
>>>>>>>> >> >>     <value>1</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>>>> >> >>
>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>>>> >> >>
>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>>>> >> >>     <value>64</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>>>> >> >>     <value>10</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >> (C) mapred-site.xml
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>>>> >> >>
>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>>>> >> >>     <description>No description</description>
>>>>>>>> >> >>     <final>true</final>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>>>> >> >>     <description>No description</description>
>>>>>>>> >> >>     <final>true</final>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >> <property>
>>>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>>>> >> >> </property>
>>>>>>>> >> >>
>>>>>>>> >> >> <property>
>>>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>>>> >> >>     <value>yarn</value>
>>>>>>>> >> >>    </property>
>>>>>>>> >> >>
>>>>>>>> >> >>  <property>
>>>>>>>> >> >>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>>>> >> >>     <value>8</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>>>> >> >>     <value>4</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>>>> >> >>     <value>true</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >> (D) yarn-site.xml
>>>>>>>> >> >>  <property>
>>>>>>>> >> >>     <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>>>> >> >>     <value>node1:18025</value>
>>>>>>>> >> >>     <description>host is the hostname of the resource manager
>>>>>>>> and
>>>>>>>> >> >>     port is the port on which the NodeManagers contact the
>>>>>>>> Resource
>>>>>>>> >> >> Manager.
>>>>>>>> >> >>     </description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <description>The address of the RM web
>>>>>>>> application.</description>
>>>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>>>> >> >>     <value>node1:18088</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>>>>>>>> >> >>     <value>node1:18030</value>
>>>>>>>> >> >>     <description>host is the hostname of the resourcemanager
>>>>>>>> and port
>>>>>>>> >> >> is
>>>>>>>> >> >> the port
>>>>>>>> >> >>     on which the Applications in the cluster talk to the
>>>>>>>> Resource
>>>>>>>> >> >> Manager.
>>>>>>>> >> >>     </description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>>>> >> >>     <value>node1:18040</value>
>>>>>>>> >> >>     <description>the host is the hostname of the
>>>>>>>> ResourceManager and
>>>>>>>> >> >> the
>>>>>>>> >> >> port is the port on
>>>>>>>> >> >>     which the clients can talk to the Resource Manager.
>>>>>>>> </description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>>>> >> >>     <description>the local directories used by the
>>>>>>>> >> >> nodemanager</description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>>>> port</description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>> >> >>     <value>10240</value>
>>>>>>>> >> >>     <description>the amount of memory on the NodeManager in
>>>>>>>> >> >> GB</description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>>> >> >>
>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>>> >> >>     <description>directory on hdfs where the application logs
>>>>>>>> are moved
>>>>>>>> >> >> to
>>>>>>>> >> >> </description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>    <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>>> >> >>
>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>>> >> >>     <description>the directories used by Nodemanagers as log
>>>>>>>> >> >> directories</description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>>> >> >>     <description>shuffle service that needs to be set for Map
>>>>>>>> Reduce to
>>>>>>>> >> >> run </description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>>> >> >>     <value>64</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>  <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>>> >> >>     <value>24</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >> <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>>> >> >>     <value>3</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>  <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>> >> >>     <value>22000</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>  <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>>> >> >>     <value>2.1</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>>>> >> >>>
>>>>>>>> >> >>> Not tuning configurations at all is wrong. YARN uses memory
>>>>>>>> resource
>>>>>>>> >> >>> based scheduling and hence MR2 would be requesting 1 GB
>>>>>>>> minimum by
>>>>>>>> >> >>> default, causing, on base configs, to max out at 8 (due to 8
>>>>>>>> GB NM
>>>>>>>> >> >>> memory resource config) total containers. Do share your
>>>>>>>> configs as at
>>>>>>>> >> >>> this point none of us can tell what it is.
>>>>>>>> >> >>>
>>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for users
>>>>>>>> and to not
>>>>>>>> >> >>> care about such things :)
>>>>>>>> >> >>>
>>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>>> samliuhadoop@gmail.com>
>>>>>>>> >> >>> wrote:
>>>>>>>> >> >>> > At the begining, I just want to do a fast comparision of
>>>>>>>> MRv1 and
>>>>>>>> >> >>> > Yarn.
>>>>>>>> >> >>> > But
>>>>>>>> >> >>> > they have many differences, and to be fair for comparison
>>>>>>>> I did not
>>>>>>>> >> >>> > tune
>>>>>>>> >> >>> > their configurations at all.  So I got above test results.
>>>>>>>> After
>>>>>>>> >> >>> > analyzing
>>>>>>>> >> >>> > the test result, no doubt, I will configure them and do
>>>>>>>> comparison
>>>>>>>> >> >>> > again.
>>>>>>>> >> >>> >
>>>>>>>> >> >>> > Do you have any idea on current test result? I think, to
>>>>>>>> compare
>>>>>>>> >> >>> > with
>>>>>>>> >> >>> > MRv1,
>>>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but worse on
>>>>>>>> Reduce
>>>>>>>> >> >>> > phase(terasort test).
>>>>>>>> >> >>> > And any detailed suggestions/comments/materials on Yarn
>>>>>>>> performance
>>>>>>>> >> >>> > tunning?
>>>>>>>> >> >>> >
>>>>>>>> >> >>> > Thanks!
>>>>>>>> >> >>> >
>>>>>>>> >> >>> >
>>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>>> marcosluis2186@gmail.com>
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >> Why not to tune the configurations?
>>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> Hi Experts,
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> We are thinking about whether to use Yarn or not in the
>>>>>>>> near
>>>>>>>> >> >>> >>> future,
>>>>>>>> >> >>> >>> and
>>>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for comprison.
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> My env is three nodes cluster, and each node has similar
>>>>>>>> hardware:
>>>>>>>> >> >>> >>> 2
>>>>>>>> >> >>> >>> cpu(4 core), 32 mem. Both Yarn and MRv1 cluster are set
>>>>>>>> on the
>>>>>>>> >> >>> >>> same
>>>>>>>> >> >>> >>> env. To
>>>>>>>> >> >>> >>> be fair, I did not make any performance tuning on their
>>>>>>>> >> >>> >>> configurations, but
>>>>>>>> >> >>> >>> use the default configuration values.
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> Before testing, I think Yarn will be much better than
>>>>>>>> MRv1, if
>>>>>>>> >> >>> >>> they
>>>>>>>> >> >>> >>> all
>>>>>>>> >> >>> >>> use default configuration, because Yarn is a better
>>>>>>>> framework than
>>>>>>>> >> >>> >>> MRv1.
>>>>>>>> >> >>> >>> However, the test result shows some differences:
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> After a fast analysis, I think the direct cause might be
>>>>>>>> that Yarn
>>>>>>>> >> >>> >>> is
>>>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much worse on
>>>>>>>> Reduce
>>>>>>>> >> >>> >>> phase.
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1 for
>>>>>>>> terasort?
>>>>>>>> >> >>> >>> - What's the stratage for tuning Yarn performance? Is any
>>>>>>>> >> >>> >>> materials?
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> Thanks!
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >> --
>>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >
>>>>>>>> >> >>>
>>>>>>>> >> >>>
>>>>>>>> >> >>>
>>>>>>>> >> >>> --
>>>>>>>> >> >>> Harsh J
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >> --
>>>>>>>> >> Harsh J
>>>>>>>> >
>>>>>>>> >
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Harsh J
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Jian Fang <ji...@gmail.com>.
Here is one run for MR1. Thanks.

2013-10-22 08:59:43,799 INFO org.apache.hadoop.mapred.JobClient (main):
Counters: 31
2013-10-22 08:59:43,800 INFO org.apache.hadoop.mapred.JobClient (main):
Job Counters
2013-10-22 08:59:43,800 INFO org.apache.hadoop.mapred.JobClient (main):
Launched reduce tasks=127
2013-10-22 08:59:43,800 INFO org.apache.hadoop.mapred.JobClient (main):
SLOTS_MILLIS_MAPS=173095295
2013-10-22 08:59:43,800 INFO org.apache.hadoop.mapred.JobClient (main):
Total time spent by all reduces waiting after reserving slots (ms)=0
2013-10-22 08:59:43,800 INFO org.apache.hadoop.mapred.JobClient (main):
Total time spent by all maps waiting after reserving slots (ms)=0
2013-10-22 08:59:43,801 INFO org.apache.hadoop.mapred.JobClient (main):
Rack-local map tasks=13
2013-10-22 08:59:43,801 INFO org.apache.hadoop.mapred.JobClient (main):
Launched map tasks=7630
2013-10-22 08:59:43,801 INFO org.apache.hadoop.mapred.JobClient (main):
Data-local map tasks=7617
2013-10-22 08:59:43,801 INFO org.apache.hadoop.mapred.JobClient (main):
SLOTS_MILLIS_REDUCES=125929437
2013-10-22 08:59:43,801 INFO org.apache.hadoop.mapred.JobClient (main):
File Input Format Counters
2013-10-22 08:59:43,801 INFO org.apache.hadoop.mapred.JobClient (main):
Bytes Read=1000478157952
2013-10-22 08:59:43,801 INFO org.apache.hadoop.mapred.JobClient (main):
File Output Format Counters
2013-10-22 08:59:43,802 INFO org.apache.hadoop.mapred.JobClient (main):
Bytes Written=1000000000000
2013-10-22 08:59:43,802 INFO org.apache.hadoop.mapred.JobClient (main):
FileSystemCounters
2013-10-22 08:59:43,802 INFO org.apache.hadoop.mapred.JobClient (main):
FILE_BYTES_READ=550120627113
2013-10-22 08:59:43,802 INFO org.apache.hadoop.mapred.JobClient (main):
HDFS_BYTES_READ=1000478910352
2013-10-22 08:59:43,802 INFO org.apache.hadoop.mapred.JobClient (main):
FILE_BYTES_WRITTEN=725563889045
2013-10-22 08:59:43,802 INFO org.apache.hadoop.mapred.JobClient (main):
HDFS_BYTES_WRITTEN=1000000000000
2013-10-22 08:59:43,802 INFO org.apache.hadoop.mapred.JobClient (main):
Map-Reduce Framework
2013-10-22 08:59:43,803 INFO org.apache.hadoop.mapred.JobClient (main):
Map output materialized bytes=240948272514
2013-10-22 08:59:43,803 INFO org.apache.hadoop.mapred.JobClient (main):
Map input records=10000000000
2013-10-22 08:59:43,803 INFO org.apache.hadoop.mapred.JobClient (main):
Reduce shuffle bytes=240948272514
2013-10-22 08:59:43,803 INFO org.apache.hadoop.mapred.JobClient (main):
Spilled Records=30000000000
2013-10-22 08:59:43,803 INFO org.apache.hadoop.mapred.JobClient (main):
Map output bytes=1000000000000
2013-10-22 08:59:43,804 INFO org.apache.hadoop.mapred.JobClient (main):
Total committed heap usage (bytes)=4841182068736
2013-10-22 08:59:43,804 INFO org.apache.hadoop.mapred.JobClient (main):
CPU time spent (ms)=118779050
2013-10-22 08:59:43,804 INFO org.apache.hadoop.mapred.JobClient (main):
Map input bytes=1000000000000
2013-10-22 08:59:43,804 INFO org.apache.hadoop.mapred.JobClient (main):
SPLIT_RAW_BYTES=752400
2013-10-22 08:59:43,804 INFO org.apache.hadoop.mapred.JobClient (main):
Combine input records=0
2013-10-22 08:59:43,804 INFO org.apache.hadoop.mapred.JobClient (main):
Reduce input records=10000000000
2013-10-22 08:59:43,804 INFO org.apache.hadoop.mapred.JobClient (main):
Reduce input groups=4294967296
2013-10-22 08:59:43,804 INFO org.apache.hadoop.mapred.JobClient (main):
Combine output records=0
2013-10-22 08:59:43,804 INFO org.apache.hadoop.mapred.JobClient (main):
Physical memory (bytes) snapshot=5092578041856
2013-10-22 08:59:43,804 INFO org.apache.hadoop.mapred.JobClient (main):
Reduce output records=10000000000
2013-10-22 08:59:43,804 INFO org.apache.hadoop.mapred.JobClient (main):
Virtual memory (bytes) snapshot=9391598915584
2013-10-22 08:59:43,804 INFO org.apache.hadoop.mapred.JobClient (main):
Map output records=10000000000



On Wed, Oct 23, 2013 at 8:17 AM, Sandy Ryza <sa...@cloudera.com> wrote:

> How many map and reduce slots are you using per tasktracker in MR1?  How
> do the average map times compare? (MR2 reports this directly on the web UI,
> but you can also get a sense in MR1 by scrolling through the map tasks
> page).  Can you share the counters for MR1?
>
> -Sandy
>
>
> On Wed, Oct 23, 2013 at 12:23 AM, Jian Fang <jian.fang.subscribe@gmail.com
> > wrote:
>
>> Unfortunately, turning off JVM reuse still got the same result, i.e.,
>> about 90 minutes for MR2. I don't think the killed reduces could contribute
>> to 2 times slowness. There should be something very wrong either in
>> configuration or code. Any hints?
>>
>>
>>
>> On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <jian.fang.subscribe@gmail.com
>> > wrote:
>>
>>> Thanks Sandy. I will try to turn JVM reuse off and see what happens.
>>>
>>> Yes, I saw quite some exceptions in the task attempts. For instance.
>>>
>>>
>>> 2013-10-20 03:13:58,751 ERROR [main]
>>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>>> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
>>> 2013-10-20 03:13:58,752 ERROR [Thread-6]
>>> org.apache.hadoop.hdfs.DFSClient: Failed to close file
>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>>> No lease on
>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
>>> File does not exist. Holder
>>> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
>>> any open files.
>>>         at
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>>>         at
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>>>         at
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>>>         at
>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>>>         at
>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>>>         at
>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>>>         at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>> --
>>>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>>>         at
>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>>>         at
>>> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>>>         at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>>>         at
>>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>>>         at
>>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>>>         at
>>> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>>>         at
>>> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>>> 2013-10-20 03:13:58,753 WARN [main] org.apache.hadoop.mapred.YarnChild:
>>> Exception running child : java.nio.channels.ClosedChannelException
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>>>         at
>>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>>>         at
>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>>>         at java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>         at
>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>>>         at
>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>>>         at
>>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>>>         at
>>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>>         at
>>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>>         at
>>> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>>>
>>>
>>>
>>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>
>>>> It looks like many of your reduce tasks were killed.  Do you know why?
>>>>  Also, MR2 doesn't have JVM reuse, so it might make sense to compare it to
>>>> MR1 with JVM reuse turned off.
>>>>
>>>> -Sandy
>>>>
>>>>
>>>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <
>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>
>>>>> The Terasort output for MR2 is as follows.
>>>>>
>>>>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job (main):
>>>>> Counters: 46
>>>>>         File System Counters
>>>>>                 FILE: Number of bytes read=456102049355
>>>>>                 FILE: Number of bytes written=897246250517
>>>>>                 FILE: Number of read operations=0
>>>>>                 FILE: Number of large read operations=0
>>>>>                 FILE: Number of write operations=0
>>>>>                 HDFS: Number of bytes read=1000000851200
>>>>>                 HDFS: Number of bytes written=1000000000000
>>>>>                 HDFS: Number of read operations=32131
>>>>>                 HDFS: Number of large read operations=0
>>>>>                 HDFS: Number of write operations=224
>>>>>         Job Counters
>>>>>                 Killed map tasks=1
>>>>>                 Killed reduce tasks=20
>>>>>                 Launched map tasks=7601
>>>>>                 Launched reduce tasks=132
>>>>>                 Data-local map tasks=7591
>>>>>                 Rack-local map tasks=10
>>>>>                 Total time spent by all maps in occupied slots
>>>>> (ms)=1696141311
>>>>>                 Total time spent by all reduces in occupied slots
>>>>> (ms)=2664045096
>>>>>         Map-Reduce Framework
>>>>>                 Map input records=10000000000
>>>>>                 Map output records=10000000000
>>>>>                 Map output bytes=1020000000000
>>>>>                 Map output materialized bytes=440486356802
>>>>>                 Input split bytes=851200
>>>>>                 Combine input records=0
>>>>>                 Combine output records=0
>>>>>                 Reduce input groups=10000000000
>>>>>                 Reduce shuffle bytes=440486356802
>>>>>                 Reduce input records=10000000000
>>>>>                 Reduce output records=10000000000
>>>>>                 Spilled Records=20000000000
>>>>>                 Shuffled Maps =851200
>>>>>                 Failed Shuffles=61
>>>>>                 Merged Map outputs=851200
>>>>>                 GC time elapsed (ms)=4215666
>>>>>                 CPU time spent (ms)=192433000
>>>>>                 Physical memory (bytes) snapshot=3349356380160
>>>>>                 Virtual memory (bytes) snapshot=9665208745984
>>>>>                 Total committed heap usage (bytes)=3636854259712
>>>>>         Shuffle Errors
>>>>>                 BAD_ID=0
>>>>>                 CONNECTION=0
>>>>>                 IO_ERROR=4
>>>>>                 WRONG_LENGTH=0
>>>>>                 WRONG_MAP=0
>>>>>                 WRONG_REDUCE=0
>>>>>         File Input Format Counters
>>>>>                 Bytes Read=1000000000000
>>>>>         File Output Format Counters
>>>>>                 Bytes Written=1000000000000
>>>>>
>>>>> Thanks,
>>>>>
>>>>> John
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop 1.0.3
>>>>>> and it turned out that the terasort for MR2 is 2 times slower than that in
>>>>>> MR1. I cannot really believe it.
>>>>>>
>>>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0 cluster
>>>>>> configurations are as follows.
>>>>>>
>>>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>>>         mapreduce.map.memory.mb = "768";
>>>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>>>
>>>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>>>
>>>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>>>         mapreduce.task.io.sort.factor = "48";
>>>>>>         mapreduce.task.io.sort.mb = "200";
>>>>>>         mapreduce.map.speculative = "true";
>>>>>>         mapreduce.reduce.speculative = "true";
>>>>>>         mapreduce.framework.name = "yarn";
>>>>>>         yarn.app.mapreduce.am.job.task.listener.thread-count = "60";
>>>>>>         mapreduce.map.cpu.vcores = "1";
>>>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>>>
>>>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>>>         mapreduce.map.output.compress = "true";
>>>>>>         mapreduce.map.output.compress.codec =
>>>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>>>
>>>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>>>         yarn.resourcemanager.scheduler.client.thread-count = "64";
>>>>>>         yarn.resourcemanager.resource-tracker.client.thread-count =
>>>>>> "64";
>>>>>>         yarn.resourcemanager.scheduler.class =
>>>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>>>         yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>>>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>>>         yarn.nodemanager.container-executor.class =
>>>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>>>         yarn.nodemanager.container-manager.thread-count = "64";
>>>>>>         yarn.nodemanager.localizer.client.thread-count = "20";
>>>>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>>>>>
>>>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost the same
>>>>>> other configurations. In MR1, the 1TB terasort took about 45 minutes, but
>>>>>> it took around 90 minutes in MR2.
>>>>>>
>>>>>> Does anyone know what is wrong here? Or do I need some special
>>>>>> configurations for terasort to work better in MR2?
>>>>>>
>>>>>> Thanks in advance,
>>>>>>
>>>>>> John
>>>>>>
>>>>>>
>>>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>>>> michael_segel@hotmail.com> wrote:
>>>>>>
>>>>>>> Sam,
>>>>>>> I think your cluster is too small for any meaningful conclusions to
>>>>>>> be made.
>>>>>>>
>>>>>>>
>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>
>>>>>>> Mike Segel
>>>>>>>
>>>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi Harsh,
>>>>>>>
>>>>>>> Thanks for your detailed response! Now, the efficiency of my Yarn
>>>>>>> cluster improved a lot after increasing the reducer
>>>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>>>
>>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and reduce task
>>>>>>> together, with a typical process(map > shuffle > reduce). In Yarn, as I
>>>>>>> know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1 job
>>>>>>> has special execution process(map > shuffle > reduce) in Hadoop 1.x, and
>>>>>>> how Yarn execute a MRv1 job? still include some special MR steps in Hadoop
>>>>>>> 1.x, like map, sort, merge, combine and shuffle?
>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>> - What's the general process for ApplicationMaster of Yarn to
>>>>>>> execute a job?
>>>>>>>
>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>> - For Yarn, above tow parameter do not work any more, as yarn uses
>>>>>>> container instead, right?
>>>>>>> - For Yarn, we can set the whole physical mem for a NodeManager
>>>>>>> using 'yarn.nodemanager.resource.memory-mb'. But how to set the
>>>>>>> default size of physical mem of a container?
>>>>>>> - How to set the maximum size of physical mem of a container? By the
>>>>>>> parameter of 'mapred.child.java.opts'?
>>>>>>>
>>>>>>> Thanks as always!
>>>>>>>
>>>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>>>
>>>>>>>> Hi Sam,
>>>>>>>>
>>>>>>>> > - How to know the container number? Why you say it will be 22
>>>>>>>> containers due to a 22 GB memory?
>>>>>>>>
>>>>>>>> The MR2's default configuration requests 1 GB resource each for Map
>>>>>>>> and Reduce containers. It requests 1.5 GB for the AM container that
>>>>>>>> runs the job, additionally. This is tunable using the properties
>>>>>>>> Sandy's mentioned in his post.
>>>>>>>>
>>>>>>>> > - My machine has 32 GB memory, how many memory is proper to be
>>>>>>>> assigned to containers?
>>>>>>>>
>>>>>>>> This is a general question. You may use the same process you took to
>>>>>>>> decide optimal number of slots in MR1 to decide this here. Every
>>>>>>>> container is a new JVM, and you're limited by the CPUs you have
>>>>>>>> there
>>>>>>>> (if not the memory). Either increase memory requests from jobs, to
>>>>>>>> lower # of concurrent containers at a given time (runtime change),
>>>>>>>> or
>>>>>>>> lower NM's published memory resources to control the same (config
>>>>>>>> change).
>>>>>>>>
>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>>>>>>> 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>
>>>>>>>> Yes, all of these properties will still work. Old properties
>>>>>>>> specific
>>>>>>>> to JobTracker or TaskTracker (usually found as a keyword in the
>>>>>>>> config
>>>>>>>> name) will not apply anymore.
>>>>>>>>
>>>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <sa...@gmail.com>
>>>>>>>> wrote:
>>>>>>>> > Hi Harsh,
>>>>>>>> >
>>>>>>>> > According to above suggestions, I removed the duplication of
>>>>>>>> setting, and
>>>>>>>> > reduce the value of 'yarn.nodemanager.resource.cpu-cores',
>>>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000. Ant
>>>>>>>> then, the
>>>>>>>> > efficiency improved about 18%.  I have questions:
>>>>>>>> >
>>>>>>>> > - How to know the container number? Why you say it will be 22
>>>>>>>> containers due
>>>>>>>> > to a 22 GB memory?
>>>>>>>> > - My machine has 32 GB memory, how many memory is proper to be
>>>>>>>> assigned to
>>>>>>>> > containers?
>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>>>>>>> 'yarn', will
>>>>>>>> > other parameters for mapred-site.xml still work in yarn
>>>>>>>> framework? Like
>>>>>>>> > 'mapreduce.task.io.sort.mb' and 'mapreduce.map.sort.spill.percent'
>>>>>>>> >
>>>>>>>> > Thanks!
>>>>>>>> >
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>>>> >>
>>>>>>>> >> Hey Sam,
>>>>>>>> >>
>>>>>>>> >> Did you get a chance to retry with Sandy's suggestions? The
>>>>>>>> config
>>>>>>>> >> appears to be asking NMs to use roughly 22 total containers (as
>>>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB memory
>>>>>>>> >> resource. This could impact much, given the CPU is still the
>>>>>>>> same for
>>>>>>>> >> both test runs.
>>>>>>>> >>
>>>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>>>> sandy.ryza@cloudera.com>
>>>>>>>> >> wrote:
>>>>>>>> >> > Hey Sam,
>>>>>>>> >> >
>>>>>>>> >> > Thanks for sharing your results.  I'm definitely curious about
>>>>>>>> what's
>>>>>>>> >> > causing the difference.
>>>>>>>> >> >
>>>>>>>> >> > A couple observations:
>>>>>>>> >> > It looks like you've got yarn.nodemanager.resource.memory-mb
>>>>>>>> in there
>>>>>>>> >> > twice
>>>>>>>> >> > with two different values.
>>>>>>>> >> >
>>>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close to the
>>>>>>>> default
>>>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of your
>>>>>>>> tasks getting
>>>>>>>> >> > killed for running over resource limits?
>>>>>>>> >> >
>>>>>>>> >> > -Sandy
>>>>>>>> >> >
>>>>>>>> >> >
>>>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <
>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>> >> >>
>>>>>>>> >> >> The terasort execution log shows that reduce spent about 5.5
>>>>>>>> mins from
>>>>>>>> >> >> 33%
>>>>>>>> >> >> to 35% as below.
>>>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce 31%
>>>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce 32%
>>>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce 33%
>>>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce 35%
>>>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce 40%
>>>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce 43%
>>>>>>>> >> >>
>>>>>>>> >> >> Any way, below are my configurations for your reference.
>>>>>>>> Thanks!
>>>>>>>> >> >> (A) core-site.xml
>>>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>>>> >> >>
>>>>>>>> >> >> (B) hdfs-site.xml
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>dfs.replication</name>
>>>>>>>> >> >>     <value>1</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>>>> >> >>
>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>>>> >> >>
>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>>>> >> >>     <value>64</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>>>> >> >>     <value>10</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >> (C) mapred-site.xml
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>>>> >> >>
>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>>>> >> >>     <description>No description</description>
>>>>>>>> >> >>     <final>true</final>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>>>> >> >>     <description>No description</description>
>>>>>>>> >> >>     <final>true</final>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >> <property>
>>>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>>>> >> >> </property>
>>>>>>>> >> >>
>>>>>>>> >> >> <property>
>>>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>>>> >> >>     <value>yarn</value>
>>>>>>>> >> >>    </property>
>>>>>>>> >> >>
>>>>>>>> >> >>  <property>
>>>>>>>> >> >>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>>>> >> >>     <value>8</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>>>> >> >>     <value>4</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>>>> >> >>     <value>true</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >> (D) yarn-site.xml
>>>>>>>> >> >>  <property>
>>>>>>>> >> >>     <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>>>> >> >>     <value>node1:18025</value>
>>>>>>>> >> >>     <description>host is the hostname of the resource manager
>>>>>>>> and
>>>>>>>> >> >>     port is the port on which the NodeManagers contact the
>>>>>>>> Resource
>>>>>>>> >> >> Manager.
>>>>>>>> >> >>     </description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <description>The address of the RM web
>>>>>>>> application.</description>
>>>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>>>> >> >>     <value>node1:18088</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>>>>>>>> >> >>     <value>node1:18030</value>
>>>>>>>> >> >>     <description>host is the hostname of the resourcemanager
>>>>>>>> and port
>>>>>>>> >> >> is
>>>>>>>> >> >> the port
>>>>>>>> >> >>     on which the Applications in the cluster talk to the
>>>>>>>> Resource
>>>>>>>> >> >> Manager.
>>>>>>>> >> >>     </description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>>>> >> >>     <value>node1:18040</value>
>>>>>>>> >> >>     <description>the host is the hostname of the
>>>>>>>> ResourceManager and
>>>>>>>> >> >> the
>>>>>>>> >> >> port is the port on
>>>>>>>> >> >>     which the clients can talk to the Resource Manager.
>>>>>>>> </description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>>>> >> >>     <description>the local directories used by the
>>>>>>>> >> >> nodemanager</description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>>>> port</description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>> >> >>     <value>10240</value>
>>>>>>>> >> >>     <description>the amount of memory on the NodeManager in
>>>>>>>> >> >> GB</description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>>> >> >>
>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>>> >> >>     <description>directory on hdfs where the application logs
>>>>>>>> are moved
>>>>>>>> >> >> to
>>>>>>>> >> >> </description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>    <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>>> >> >>
>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>>> >> >>     <description>the directories used by Nodemanagers as log
>>>>>>>> >> >> directories</description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>>> >> >>     <description>shuffle service that needs to be set for Map
>>>>>>>> Reduce to
>>>>>>>> >> >> run </description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>>> >> >>     <value>64</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>  <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>>> >> >>     <value>24</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >> <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>>> >> >>     <value>3</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>  <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>> >> >>     <value>22000</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>  <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>>> >> >>     <value>2.1</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>>>> >> >>>
>>>>>>>> >> >>> Not tuning configurations at all is wrong. YARN uses memory
>>>>>>>> resource
>>>>>>>> >> >>> based scheduling and hence MR2 would be requesting 1 GB
>>>>>>>> minimum by
>>>>>>>> >> >>> default, causing, on base configs, to max out at 8 (due to 8
>>>>>>>> GB NM
>>>>>>>> >> >>> memory resource config) total containers. Do share your
>>>>>>>> configs as at
>>>>>>>> >> >>> this point none of us can tell what it is.
>>>>>>>> >> >>>
>>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for users
>>>>>>>> and to not
>>>>>>>> >> >>> care about such things :)
>>>>>>>> >> >>>
>>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>>> samliuhadoop@gmail.com>
>>>>>>>> >> >>> wrote:
>>>>>>>> >> >>> > At the begining, I just want to do a fast comparision of
>>>>>>>> MRv1 and
>>>>>>>> >> >>> > Yarn.
>>>>>>>> >> >>> > But
>>>>>>>> >> >>> > they have many differences, and to be fair for comparison
>>>>>>>> I did not
>>>>>>>> >> >>> > tune
>>>>>>>> >> >>> > their configurations at all.  So I got above test results.
>>>>>>>> After
>>>>>>>> >> >>> > analyzing
>>>>>>>> >> >>> > the test result, no doubt, I will configure them and do
>>>>>>>> comparison
>>>>>>>> >> >>> > again.
>>>>>>>> >> >>> >
>>>>>>>> >> >>> > Do you have any idea on current test result? I think, to
>>>>>>>> compare
>>>>>>>> >> >>> > with
>>>>>>>> >> >>> > MRv1,
>>>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but worse on
>>>>>>>> Reduce
>>>>>>>> >> >>> > phase(terasort test).
>>>>>>>> >> >>> > And any detailed suggestions/comments/materials on Yarn
>>>>>>>> performance
>>>>>>>> >> >>> > tunning?
>>>>>>>> >> >>> >
>>>>>>>> >> >>> > Thanks!
>>>>>>>> >> >>> >
>>>>>>>> >> >>> >
>>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>>> marcosluis2186@gmail.com>
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >> Why not to tune the configurations?
>>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> Hi Experts,
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> We are thinking about whether to use Yarn or not in the
>>>>>>>> near
>>>>>>>> >> >>> >>> future,
>>>>>>>> >> >>> >>> and
>>>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for comprison.
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> My env is three nodes cluster, and each node has similar
>>>>>>>> hardware:
>>>>>>>> >> >>> >>> 2
>>>>>>>> >> >>> >>> cpu(4 core), 32 mem. Both Yarn and MRv1 cluster are set
>>>>>>>> on the
>>>>>>>> >> >>> >>> same
>>>>>>>> >> >>> >>> env. To
>>>>>>>> >> >>> >>> be fair, I did not make any performance tuning on their
>>>>>>>> >> >>> >>> configurations, but
>>>>>>>> >> >>> >>> use the default configuration values.
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> Before testing, I think Yarn will be much better than
>>>>>>>> MRv1, if
>>>>>>>> >> >>> >>> they
>>>>>>>> >> >>> >>> all
>>>>>>>> >> >>> >>> use default configuration, because Yarn is a better
>>>>>>>> framework than
>>>>>>>> >> >>> >>> MRv1.
>>>>>>>> >> >>> >>> However, the test result shows some differences:
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> After a fast analysis, I think the direct cause might be
>>>>>>>> that Yarn
>>>>>>>> >> >>> >>> is
>>>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much worse on
>>>>>>>> Reduce
>>>>>>>> >> >>> >>> phase.
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1 for
>>>>>>>> terasort?
>>>>>>>> >> >>> >>> - What's the stratage for tuning Yarn performance? Is any
>>>>>>>> >> >>> >>> materials?
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> Thanks!
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >> --
>>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >
>>>>>>>> >> >>>
>>>>>>>> >> >>>
>>>>>>>> >> >>>
>>>>>>>> >> >>> --
>>>>>>>> >> >>> Harsh J
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >> --
>>>>>>>> >> Harsh J
>>>>>>>> >
>>>>>>>> >
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Harsh J
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Sandy Ryza <sa...@cloudera.com>.
Unfortunately yes.  I agree that it should be documented.  The only way to
compare fairly is to use the same terasort jar against both.  When I've
done comparisons, I've used the MR2 examples jar against both MR1 and MR2.
 This works on CDH because it has a version of MR1 that's compatible with
MR2, but may require some recompilation if using the apache releases.
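
For what it's worth, a minimal sketch of that kind of same-jar run, assuming
the MR2 examples jar (hadoop-mapreduce-examples-2.2.0.jar) can be submitted to
both clusters and that the output path /1-tb-sorted below is just a
placeholder:

  # Generate 1 TB of input (10 billion 100-byte rows), then sort it,
  # submitting the same jar to the MR1 cluster and to the MR2 cluster.
  hadoop jar hadoop-mapreduce-examples-2.2.0.jar teragen 10000000000 /1-tb-data
  hadoop jar hadoop-mapreduce-examples-2.2.0.jar terasort /1-tb-data /1-tb-sorted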


On Wed, Oct 23, 2013 at 5:01 PM, Jian Fang <ji...@gmail.com>wrote:

> Really? Does that mean I cannot compare terasort in MR2 with MR1
> directly?  If yes, this really should be documented.
>
> Thanks.
>
>
> On Wed, Oct 23, 2013 at 4:54 PM, Sandy Ryza <sa...@cloudera.com>wrote:
>
>> I should have brought this up earlier - the Terasort benchmark
>> requirements changed recently to make data less compressible.  The MR2
>> version of Terasort has this change, but the MR1 version does not.    So
>> Snappy should be working fine, but the data that MR2 is using is less
>> compressible.
>>
>>
>> On Wed, Oct 23, 2013 at 4:48 PM, Jian Fang <jian.fang.subscribe@gmail.com
>> > wrote:
>>
>>> Ok, I think I know where the problem is now. I used snappy to compress
>>> map output.
>>>
>>> Calculate the compression ratio = "Map output materialized bytes"/"Map
>>> output bytes"
>>>
>>> MR2
>>> 440519309909/1020000000000 = 0.431881676
>>>
>>> MR1
>>> 240948272514/1000000000000 = 0.240948273
>>>
>>>
>>> It seems something is wrong with Snappy in my Hadoop 2 and, as a result,
>>> MR2 needs to fetch much more data during the shuffle phase. Is there any
>>> way to check Snappy in MR2?
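>>>
>>> One thing I plan to try (assuming the stock Hadoop 2.2 CLI ships it) is to
>>> list the native codec support on a node:
>>>
>>>     hadoop checknative -a
>>>
>>> If the snappy line there reports false, the native Snappy library is not
>>> being loaded on that node.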
>>>
>>>
>>>
>>>
>>> On Wed, Oct 23, 2013 at 4:26 PM, Jian Fang <
>>> jian.fang.subscribe@gmail.com> wrote:
>>>
>>>> Looked at the results for MR1 and MR2. Reduce input groups for MR1 is
>>>> 4,294,967,296, but 10,000,000,000 for MR2.
>>>> That is to say the number of unique keys fed into the reducers is
>>>> 10000000000 in MR2, which is really weird. Any problem in the terasort code?
>>>>
>>>>
>>>> On Wed, Oct 23, 2013 at 2:16 PM, Jian Fang <
>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>
>>>>> Reducing the number of map containers to 8 slightly improved the total
>>>>> time to 84 minutes. Here is the output. Also, from the log, there is no
>>>>> clear reason why the containers were killed other than messages such as
>>>>> "Container killed by the
>>>>> ApplicationMaster../container_1382237301855_0001_01_000001/syslog:Container
>>>>> killed on request. Exit code is 143"
>>>>>
>>>>>
>>>>> 2013-10-23 21:09:34,325 INFO org.apache.hadoop.mapreduce.Job (main):
>>>>> Counters: 46
>>>>>         File System Counters
>>>>>                 FILE: Number of bytes read=455484809066
>>>>>                 FILE: Number of bytes written=896642344343
>>>>>
>>>>>                 FILE: Number of read operations=0
>>>>>                 FILE: Number of large read operations=0
>>>>>                 FILE: Number of write operations=0
>>>>>                 HDFS: Number of bytes read=1000000841624
>>>>>
>>>>>                 HDFS: Number of bytes written=1000000000000
>>>>>                 HDFS: Number of read operations=25531
>>>>>
>>>>>                 HDFS: Number of large read operations=0
>>>>>                 HDFS: Number of write operations=150
>>>>>
>>>>>         Job Counters
>>>>>                 Killed map tasks=1
>>>>>                 Killed reduce tasks=11
>>>>>                 Launched map tasks=7449
>>>>>                 Launched reduce tasks=86
>>>>>                 Data-local map tasks=7434
>>>>>                 Rack-local map tasks=15
>>>>>                 Total time spent by all maps in occupied slots
>>>>> (ms)=1030941232
>>>>>                 Total time spent by all reduces in occupied slots
>>>>> (ms)=1574732272
>>>>>
>>>>>         Map-Reduce Framework
>>>>>                 Map input records=10000000000
>>>>>                 Map output records=10000000000
>>>>>                 Map output bytes=1020000000000
>>>>>                 Map output materialized bytes=440519309909
>>>>>                 Input split bytes=841624
>>>>>
>>>>>                 Combine input records=0
>>>>>                 Combine output records=0
>>>>>                 Reduce input groups=10000000000
>>>>>                 Reduce shuffle bytes=440519309909
>>>>>
>>>>>                 Reduce input records=10000000000
>>>>>                 Reduce output records=10000000000
>>>>>                 Spilled Records=20000000000
>>>>>                 Shuffled Maps =558600
>>>>>                 Failed Shuffles=0
>>>>>                 Merged Map outputs=558600
>>>>>                 GC time elapsed (ms)=2855534
>>>>>                 CPU time spent (ms)=170381420
>>>>>                 Physical memory (bytes) snapshot=3704938954752
>>>>>                 Virtual memory (bytes) snapshot=11455208308736
>>>>>                 Total committed heap usage (bytes)=4523248582656
>>>>>         Shuffle Errors
>>>>>                 BAD_ID=0
>>>>>                 CONNECTION=0
>>>>>                 IO_ERROR=0
>>>>>
>>>>>                 WRONG_LENGTH=0
>>>>>                 WRONG_MAP=0
>>>>>                 WRONG_REDUCE=0
>>>>>         File Input Format Counters
>>>>>                 Bytes Read=1000000000000
>>>>>         File Output Format Counters
>>>>>                 Bytes Written=1000000000000
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Oct 23, 2013 at 1:46 PM, Jian Fang <
>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>
>>>>>> Already started the tests with only 8 containers for map in MR2,
>>>>>> still running.
>>>>>>
>>>>>> But shouldn't YARN make better use of the memory? If I map YARN
>>>>>> containers in MR2 to exactly 8 map and 3 reduce slots in MR1, what is the
>>>>>> real advantage of YARN then? I remember one goal of YARN is to solve the
>>>>>> issue that map slots cannot be used for reduce and reduce slots cannot be
>>>>>> used for map. Shouldn't YARN be smart enough to handle the concurrent tasks?
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Oct 23, 2013 at 1:17 PM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>>>>
>>>>>>> Increasing the slowstart is not meant to increase performance, but
>>>>>>> should make for a fairer comparison.  Have you tried making sure that in
>>>>>>> MR2 only 8 map tasks are running concurrently, or boosting MR1 up to 16?
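>>>>>>>
>>>>>>> One way to cap MR2 at 8 concurrent maps per node (just a sketch, assuming
>>>>>>> terasort honors generic -D options) is to double the per-map container
>>>>>>> request, since 12288 / 1536 = 8:
>>>>>>>
>>>>>>>     hadoop jar hadoop-mapreduce-examples-2.2.0.jar terasort \
>>>>>>>       -Dmapreduce.map.memory.mb=1536 /1-tb-data /1-tb-sorted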
>>>>>>>
>>>>>>> -Sandy
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Oct 23, 2013 at 12:55 PM, Jian Fang <
>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>
>>>>>>>> Changing mapreduce.job.reduce.slowstart.completedmaps to 0.99 does not
>>>>>>>> look good. The map phase alone took 48 minutes and the total time seems
>>>>>>>> to be even longer. Is there any way to make the map phase run faster?
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Oct 23, 2013 at 10:05 AM, Jian Fang <
>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Thanks Sandy.
>>>>>>>>>
>>>>>>>>> io.sort.record.percent is at its default value of 0.05 for both MR1 and
>>>>>>>>> MR2. mapreduce.job.reduce.slowstart.completedmaps in MR2 and
>>>>>>>>> mapred.reduce.slowstart.completed.maps in MR1 both use the default value of 0.05.
>>>>>>>>>
>>>>>>>>> I tried to allocate 1536MB and 1024MB to map container some time
>>>>>>>>> ago, but the changes did not give me a better result, thus, I changed it
>>>>>>>>> back to 768MB.
>>>>>>>>>
>>>>>>>>> Will try mapred.reduce.slowstart.completed.maps=.99 to see what
>>>>>>>>> happens. BTW, I should use
>>>>>>>>> mapreduce.job.reduce.slowstart.completedmaps in MR2, right?
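>>>>>>>>>
>>>>>>>>> If so, a sketch of how I plan to pass it (assuming terasort accepts
>>>>>>>>> generic -D options; the output path is just a placeholder):
>>>>>>>>>
>>>>>>>>>     hadoop jar hadoop-mapreduce-examples-2.2.0.jar terasort \
>>>>>>>>>       -Dmapreduce.job.reduce.slowstart.completedmaps=0.99 \
>>>>>>>>>       /1-tb-data /1-tb-sorted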
>>>>>>>>>
>>>>>>>>> Also, in MR1 I can specify tasktracker.http.threads, but I could not
>>>>>>>>> find the counterpart in MR2. Which property should I tune for the HTTP
>>>>>>>>> threads?
>>>>>>>>>
>>>>>>>>> Thanks again.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Oct 23, 2013 at 9:40 AM, Sandy Ryza <
>>>>>>>>> sandy.ryza@cloudera.com> wrote:
>>>>>>>>>
>>>>>>>>>> Based on SLOTS_MILLIS_MAPS, it looks like your map tasks are
>>>>>>>>>> taking about three times as long in MR2 as they are in MR1.  This is
>>>>>>>>>> probably because you allow twice as many map tasks to run at a time in MR2 (12288/768
>>>>>>>>>> = 16).  Being able to use all the containers isn't necessarily a good thing
>>>>>>>>>> if you are oversubscribing your node's resources.  Because of the different
>>>>>>>>>> way that MR1 and MR2 view resources, I think it's better to test with
>>>>>>>>>> mapred.reduce.slowstart.completed.maps=.99 so that the map and reduce
>>>>>>>>>> phases will run separately.
>>>>>>>>>>
>>>>>>>>>> On the other side, it looks like your MR1 has more spilled records than
>>>>>>>>>> MR2. To make the comparison fairer, you should set io.sort.record.percent
>>>>>>>>>> in MR1 to .13; this should also improve MR1 performance (MR2 automatically
>>>>>>>>>> does this tuning for you).
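>>>>>>>>>>
>>>>>>>>>> For instance (just a sketch, assuming the MR1 terasort accepts generic
>>>>>>>>>> -D options and the jar name matches your install):
>>>>>>>>>>
>>>>>>>>>>     hadoop jar hadoop-examples-1.0.3.jar terasort \
>>>>>>>>>>       -Dio.sort.record.percent=0.13 /1-tb-data /1-tb-sorted-mr1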
>>>>>>>>>>
>>>>>>>>>> -Sandy
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Oct 23, 2013 at 9:22 AM, Jian Fang <
>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> The number of map slots and reduce slots on each data node for
>>>>>>>>>>> MR1 are 8 and 3, respectively. Since MR2 could use all containers for
>>>>>>>>>>> either map or reduce, I would expect that MR2 is faster.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Oct 23, 2013 at 8:17 AM, Sandy Ryza <
>>>>>>>>>>> sandy.ryza@cloudera.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> How many map and reduce slots are you using per tasktracker in
>>>>>>>>>>>> MR1?  How do the average map times compare? (MR2 reports this directly on
>>>>>>>>>>>> the web UI, but you can also get a sense in MR1 by scrolling through the
>>>>>>>>>>>> map tasks page).  Can you share the counters for MR1?
>>>>>>>>>>>>
>>>>>>>>>>>> -Sandy
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Oct 23, 2013 at 12:23 AM, Jian Fang <
>>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Unfortunately, turning off JVM reuse still got the same
>>>>>>>>>>>>> result, i.e., about 90 minutes for MR2. I don't think the killed reduces
>>>>>>>>>>>>> could contribute to 2 times slowness. There should be something very wrong
>>>>>>>>>>>>> either in configuration or code. Any hints?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <
>>>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks Sandy. I will try to turn JVM reuse off and see what
>>>>>>>>>>>>>> happens.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes, I saw quite some exceptions in the task attempts. For
>>>>>>>>>>>>>> instance.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2013-10-20 03:13:58,751 ERROR [main]
>>>>>>>>>>>>>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>>>>>>>>>>>>>> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
>>>>>>>>>>>>>> 2013-10-20 03:13:58,752 ERROR [Thread-6]
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient: Failed to close file
>>>>>>>>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
>>>>>>>>>>>>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>>>>>>>>>>>>>> No lease on
>>>>>>>>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
>>>>>>>>>>>>>> File does not exist. Holder
>>>>>>>>>>>>>> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
>>>>>>>>>>>>>> any open files.
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>>>>>>>>>>>>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>>>>>>>>>>>>>> 2013-10-20 03:13:58,753 WARN [main]
>>>>>>>>>>>>>> org.apache.hadoop.mapred.YarnChild: Exception running child :
>>>>>>>>>>>>>> java.nio.channels.ClosedChannelException
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <
>>>>>>>>>>>>>> sandy.ryza@cloudera.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It looks like many of your reduce tasks were killed.  Do you
>>>>>>>>>>>>>>> know why?  Also, MR2 doesn't have JVM reuse, so it might make sense to
>>>>>>>>>>>>>>> compare it to MR1 with JVM reuse turned off.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -Sandy
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <
>>>>>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The Terasort output for MR2 is as follows.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2013-10-22 21:40:16,261 INFO
>>>>>>>>>>>>>>>> org.apache.hadoop.mapreduce.Job (main): Counters: 46
>>>>>>>>>>>>>>>>         File System Counters
>>>>>>>>>>>>>>>>                 FILE: Number of bytes read=456102049355
>>>>>>>>>>>>>>>>                 FILE: Number of bytes written=897246250517
>>>>>>>>>>>>>>>>                 FILE: Number of read operations=0
>>>>>>>>>>>>>>>>                 FILE: Number of large read operations=0
>>>>>>>>>>>>>>>>                 FILE: Number of write operations=0
>>>>>>>>>>>>>>>>                 HDFS: Number of bytes read=1000000851200
>>>>>>>>>>>>>>>>                 HDFS: Number of bytes written=1000000000000
>>>>>>>>>>>>>>>>                 HDFS: Number of read operations=32131
>>>>>>>>>>>>>>>>                 HDFS: Number of large read operations=0
>>>>>>>>>>>>>>>>                 HDFS: Number of write operations=224
>>>>>>>>>>>>>>>>         Job Counters
>>>>>>>>>>>>>>>>                 Killed map tasks=1
>>>>>>>>>>>>>>>>                 Killed reduce tasks=20
>>>>>>>>>>>>>>>>                 Launched map tasks=7601
>>>>>>>>>>>>>>>>                 Launched reduce tasks=132
>>>>>>>>>>>>>>>>                 Data-local map tasks=7591
>>>>>>>>>>>>>>>>                 Rack-local map tasks=10
>>>>>>>>>>>>>>>>                 Total time spent by all maps in occupied
>>>>>>>>>>>>>>>> slots (ms)=1696141311
>>>>>>>>>>>>>>>>                 Total time spent by all reduces in occupied
>>>>>>>>>>>>>>>> slots (ms)=2664045096
>>>>>>>>>>>>>>>>         Map-Reduce Framework
>>>>>>>>>>>>>>>>                 Map input records=10000000000
>>>>>>>>>>>>>>>>                 Map output records=10000000000
>>>>>>>>>>>>>>>>                 Map output bytes=1020000000000
>>>>>>>>>>>>>>>>                 Map output materialized bytes=440486356802
>>>>>>>>>>>>>>>>                 Input split bytes=851200
>>>>>>>>>>>>>>>>                 Combine input records=0
>>>>>>>>>>>>>>>>                 Combine output records=0
>>>>>>>>>>>>>>>>                 Reduce input groups=10000000000
>>>>>>>>>>>>>>>>                 Reduce shuffle bytes=440486356802
>>>>>>>>>>>>>>>>                 Reduce input records=10000000000
>>>>>>>>>>>>>>>>                 Reduce output records=10000000000
>>>>>>>>>>>>>>>>                 Spilled Records=20000000000
>>>>>>>>>>>>>>>>                 Shuffled Maps =851200
>>>>>>>>>>>>>>>>                 Failed Shuffles=61
>>>>>>>>>>>>>>>>                 Merged Map outputs=851200
>>>>>>>>>>>>>>>>                 GC time elapsed (ms)=4215666
>>>>>>>>>>>>>>>>                 CPU time spent (ms)=192433000
>>>>>>>>>>>>>>>>                 Physical memory (bytes)
>>>>>>>>>>>>>>>> snapshot=3349356380160
>>>>>>>>>>>>>>>>                 Virtual memory (bytes)
>>>>>>>>>>>>>>>> snapshot=9665208745984
>>>>>>>>>>>>>>>>                 Total committed heap usage
>>>>>>>>>>>>>>>> (bytes)=3636854259712
>>>>>>>>>>>>>>>>         Shuffle Errors
>>>>>>>>>>>>>>>>                 BAD_ID=0
>>>>>>>>>>>>>>>>                 CONNECTION=0
>>>>>>>>>>>>>>>>                 IO_ERROR=4
>>>>>>>>>>>>>>>>                 WRONG_LENGTH=0
>>>>>>>>>>>>>>>>                 WRONG_MAP=0
>>>>>>>>>>>>>>>>                 WRONG_REDUCE=0
>>>>>>>>>>>>>>>>         File Input Format Counters
>>>>>>>>>>>>>>>>                 Bytes Read=1000000000000
>>>>>>>>>>>>>>>>         File Output Format Counters
>>>>>>>>>>>>>>>>                 Bytes Written=1000000000000
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> John
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>>>>>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I have the same problem. I compared Hadoop 2.2.0 with
>>>>>>>>>>>>>>>>> Hadoop 1.0.3 and it turned out that the terasort for MR2 is 2 times slower
>>>>>>>>>>>>>>>>> than that in MR1. I cannot really believe it.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop
>>>>>>>>>>>>>>>>> 2.2.0 cluster configurations are as follows.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>>>>>>>>>>>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>>>>>>>>>>>>>>         mapreduce.map.memory.mb = "768";
>>>>>>>>>>>>>>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>>>>>>>>>>>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>>>>>>>>>>>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>>>>>>>>>>>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>>>>>>>>>>>>>>         mapreduce.task.io.sort.factor = "48";
>>>>>>>>>>>>>>>>>         mapreduce.task.io.sort.mb = "200";
>>>>>>>>>>>>>>>>>         mapreduce.map.speculative = "true";
>>>>>>>>>>>>>>>>>         mapreduce.reduce.speculative = "true";
>>>>>>>>>>>>>>>>>         mapreduce.framework.name = "yarn";
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> yarn.app.mapreduce.am.job.task.listener.thread-count = "60";
>>>>>>>>>>>>>>>>>         mapreduce.map.cpu.vcores = "1";
>>>>>>>>>>>>>>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>>>>>>>>>>>>>>         mapreduce.map.output.compress = "true";
>>>>>>>>>>>>>>>>>         mapreduce.map.output.compress.codec =
>>>>>>>>>>>>>>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>>>>>>>>>>>>>>         yarn.resourcemanager.scheduler.client.thread-count
>>>>>>>>>>>>>>>>> = "64";
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> yarn.resourcemanager.resource-tracker.client.thread-count = "64";
>>>>>>>>>>>>>>>>>         yarn.resourcemanager.scheduler.class =
>>>>>>>>>>>>>>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>>>>>>>>>>>>>>         yarn.nodemanager.aux-services =
>>>>>>>>>>>>>>>>> "mapreduce_shuffle";
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>>>>>>>>>>>>>>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>>>>>>>>>>>>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>>>>>>>>>>>>>>         yarn.nodemanager.container-executor.class =
>>>>>>>>>>>>>>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>>>>>>>>>>>>>>         yarn.nodemanager.container-manager.thread-count =
>>>>>>>>>>>>>>>>> "64";
>>>>>>>>>>>>>>>>>         yarn.nodemanager.localizer.client.thread-count =
>>>>>>>>>>>>>>>>> "20";
>>>>>>>>>>>>>>>>>         yarn.nodemanager.localizer.fetch.thread-count =
>>>>>>>>>>>>>>>>> "20";
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost
>>>>>>>>>>>>>>>>> the same other configurations. In MR1, the 1TB terasort took about 45
>>>>>>>>>>>>>>>>> minutes, but it took around 90 minutes in MR2.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Does anyone know what is wrong here? Or do I need some
>>>>>>>>>>>>>>>>> special configurations for terasort to work better in MR2?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> John
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>>>>>>>>>>>>>>> michael_segel@hotmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Sam,
>>>>>>>>>>>>>>>>>> I think your cluster is too small for any meaningful
>>>>>>>>>>>>>>>>>> conclusions to be made.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Mike Segel
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <
>>>>>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi Harsh,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks for your detailed response! Now, the efficiency of
>>>>>>>>>>>>>>>>>> my Yarn cluster improved a lot after increasing the reducer
>>>>>>>>>>>>>>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>>>>>>>>>>>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and
>>>>>>>>>>>>>>>>>> reduce task together, with a typical process(map > shuffle > reduce). In
>>>>>>>>>>>>>>>>>> Yarn, as I know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>>>>>>>>>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...),
>>>>>>>>>>>>>>>>>> but, MRv1 job has special execution process(map > shuffle > reduce) in
>>>>>>>>>>>>>>>>>> Hadoop 1.x, and how Yarn execute a MRv1 job? still include some special MR
>>>>>>>>>>>>>>>>>> steps in Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>>>>>>>>>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>>>>>>>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>>>>>>>>>>>>> - What's the general process for ApplicationMaster of
>>>>>>>>>>>>>>>>>> Yarn to execute a job?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by
>>>>>>>>>>>>>>>>>> setting 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>>>>>>>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>>>>>>>>>>>>> - For Yarn, above tow parameter do not work any more, as
>>>>>>>>>>>>>>>>>> yarn uses container instead, right?
>>>>>>>>>>>>>>>>>> - For Yarn, we can set the whole physical mem for a
>>>>>>>>>>>>>>>>>> NodeManager using 'yarn.nodemanager.resource.memory-mb'.
>>>>>>>>>>>>>>>>>> But how to set the default size of physical mem of a container?
>>>>>>>>>>>>>>>>>> - How to set the maximum size of physical mem of a
>>>>>>>>>>>>>>>>>> container? By the parameter of 'mapred.child.java.opts'?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks as always!
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi Sam,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> > - How to know the container number? Why you say it
>>>>>>>>>>>>>>>>>>> will be 22 containers due to a 22 GB memory?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The MR2's default configuration requests 1 GB resource
>>>>>>>>>>>>>>>>>>> each for Map
>>>>>>>>>>>>>>>>>>> and Reduce containers. It requests 1.5 GB for the AM
>>>>>>>>>>>>>>>>>>> container that
>>>>>>>>>>>>>>>>>>> runs the job, additionally. This is tunable using the
>>>>>>>>>>>>>>>>>>> properties
>>>>>>>>>>>>>>>>>>> Sandy's mentioned in his post.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> > - My machine has 32 GB memory, how many memory is
>>>>>>>>>>>>>>>>>>> proper to be assigned to containers?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> This is a general question. You may use the same process
>>>>>>>>>>>>>>>>>>> you took to
>>>>>>>>>>>>>>>>>>> decide optimal number of slots in MR1 to decide this
>>>>>>>>>>>>>>>>>>> here. Every
>>>>>>>>>>>>>>>>>>> container is a new JVM, and you're limited by the CPUs
>>>>>>>>>>>>>>>>>>> you have there
>>>>>>>>>>>>>>>>>>> (if not the memory). Either increase memory requests
>>>>>>>>>>>>>>>>>>> from jobs, to
>>>>>>>>>>>>>>>>>>> lower # of concurrent containers at a given time
>>>>>>>>>>>>>>>>>>> (runtime change), or
>>>>>>>>>>>>>>>>>>> lower NM's published memory resources to control the
>>>>>>>>>>>>>>>>>>> same (config
>>>>>>>>>>>>>>>>>>> change).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> > - In mapred-site.xml, if I set '
>>>>>>>>>>>>>>>>>>> mapreduce.framework.name' to be 'yarn', will other
>>>>>>>>>>>>>>>>>>> parameters for mapred-site.xml still work in yarn framework? Like
>>>>>>>>>>>>>>>>>>> 'mapreduce.task.io.sort.mb' and 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Yes, all of these properties will still work. Old
>>>>>>>>>>>>>>>>>>> properties specific
>>>>>>>>>>>>>>>>>>> to JobTracker or TaskTracker (usually found as a keyword
>>>>>>>>>>>>>>>>>>> in the config
>>>>>>>>>>>>>>>>>>> name) will not apply anymore.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <
>>>>>>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>> > Hi Harsh,
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>> > According to above suggestions, I removed the
>>>>>>>>>>>>>>>>>>> duplication of setting, and
>>>>>>>>>>>>>>>>>>> > reduce the value of
>>>>>>>>>>>>>>>>>>> 'yarn.nodemanager.resource.cpu-cores',
>>>>>>>>>>>>>>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>>>>>>>>>>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and
>>>>>>>>>>>>>>>>>>> 12000. Ant then, the
>>>>>>>>>>>>>>>>>>> > efficiency improved about 18%.  I have questions:
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>> > - How to know the container number? Why you say it
>>>>>>>>>>>>>>>>>>> will be 22 containers due
>>>>>>>>>>>>>>>>>>> > to a 22 GB memory?
>>>>>>>>>>>>>>>>>>> > - My machine has 32 GB memory, how many memory is
>>>>>>>>>>>>>>>>>>> proper to be assigned to
>>>>>>>>>>>>>>>>>>> > containers?
>>>>>>>>>>>>>>>>>>> > - In mapred-site.xml, if I set '
>>>>>>>>>>>>>>>>>>> mapreduce.framework.name' to be 'yarn', will
>>>>>>>>>>>>>>>>>>> > other parameters for mapred-site.xml still work in
>>>>>>>>>>>>>>>>>>> yarn framework? Like
>>>>>>>>>>>>>>>>>>> > 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>> > Thanks!
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>> >> Hey Sam,
>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>> >> Did you get a chance to retry with Sandy's
>>>>>>>>>>>>>>>>>>> suggestions? The config
>>>>>>>>>>>>>>>>>>> >> appears to be asking NMs to use roughly 22 total
>>>>>>>>>>>>>>>>>>> containers (as
>>>>>>>>>>>>>>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22
>>>>>>>>>>>>>>>>>>> GB memory
>>>>>>>>>>>>>>>>>>> >> resource. This could impact much, given the CPU is
>>>>>>>>>>>>>>>>>>> still the same for
>>>>>>>>>>>>>>>>>>> >> both test runs.
>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>>>>>>>>>>>>>>> sandy.ryza@cloudera.com>
>>>>>>>>>>>>>>>>>>> >> wrote:
>>>>>>>>>>>>>>>>>>> >> > Hey Sam,
>>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>>> >> > Thanks for sharing your results.  I'm definitely
>>>>>>>>>>>>>>>>>>> curious about what's
>>>>>>>>>>>>>>>>>>> >> > causing the difference.
>>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>>> >> > A couple observations:
>>>>>>>>>>>>>>>>>>> >> > It looks like you've got
>>>>>>>>>>>>>>>>>>> yarn.nodemanager.resource.memory-mb in there
>>>>>>>>>>>>>>>>>>> >> > twice
>>>>>>>>>>>>>>>>>>> >> > with two different values.
>>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?)
>>>>>>>>>>>>>>>>>>> close to the default
>>>>>>>>>>>>>>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any
>>>>>>>>>>>>>>>>>>> of your tasks getting
>>>>>>>>>>>>>>>>>>> >> > killed for running over resource limits?
>>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>>> >> > -Sandy
>>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <
>>>>>>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >> The terasort execution log shows that reduce spent
>>>>>>>>>>>>>>>>>>> about 5.5 mins from
>>>>>>>>>>>>>>>>>>> >> >> 33%
>>>>>>>>>>>>>>>>>>> >> >> to 35% as below.
>>>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>>>> reduce 31%
>>>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>>>> reduce 32%
>>>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>>>> reduce 33%
>>>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>>>> reduce 35%
>>>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>>>> reduce 40%
>>>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>>>> reduce 43%
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >> Any way, below are my configurations for your
>>>>>>>>>>>>>>>>>>> reference. Thanks!
>>>>>>>>>>>>>>>>>>> >> >> (A) core-site.xml
>>>>>>>>>>>>>>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >> (B) hdfs-site.xml
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>dfs.replication</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>1</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>10</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >> (C) mapred-site.xml
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>>>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>>>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>>>>>>>>>>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>>>>>>>>>>>>>>> >> >> </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>yarn</value>
>>>>>>>>>>>>>>>>>>> >> >>    </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>8</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>4</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>true</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >> (D) yarn-site.xml
>>>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>node1:18025</value>
>>>>>>>>>>>>>>>>>>> >> >>     <description>host is the hostname of the
>>>>>>>>>>>>>>>>>>> resource manager and
>>>>>>>>>>>>>>>>>>> >> >>     port is the port on which the NodeManagers
>>>>>>>>>>>>>>>>>>> contact the Resource
>>>>>>>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>     <description>The address of the RM web
>>>>>>>>>>>>>>>>>>> application.</description>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <name>yarn.resourcemanager.webapp.address</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>node1:18088</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <name>yarn.resourcemanager.scheduler.address</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>node1:18030</value>
>>>>>>>>>>>>>>>>>>> >> >>     <description>host is the hostname of the
>>>>>>>>>>>>>>>>>>> resourcemanager and port
>>>>>>>>>>>>>>>>>>> >> >> is
>>>>>>>>>>>>>>>>>>> >> >> the port
>>>>>>>>>>>>>>>>>>> >> >>     on which the Applications in the cluster talk
>>>>>>>>>>>>>>>>>>> to the Resource
>>>>>>>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>node1:18040</value>
>>>>>>>>>>>>>>>>>>> >> >>     <description>the host is the hostname of the
>>>>>>>>>>>>>>>>>>> ResourceManager and
>>>>>>>>>>>>>>>>>>> >> >> the
>>>>>>>>>>>>>>>>>>> >> >> port is the port on
>>>>>>>>>>>>>>>>>>> >> >>     which the clients can talk to the Resource
>>>>>>>>>>>>>>>>>>> Manager. </description>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>>>>>>>>>>>>>>> >> >>     <description>the local directories used by the
>>>>>>>>>>>>>>>>>>> >> >> nodemanager</description>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>>>>>>>>>>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>>>>>>>>>>>>>>> port</description>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>10240</value>
>>>>>>>>>>>>>>>>>>> >> >>     <description>the amount of memory on the
>>>>>>>>>>>>>>>>>>> NodeManager in
>>>>>>>>>>>>>>>>>>> >> >> GB</description>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>>>>>>>>>>>>>> >> >>     <description>directory on hdfs where the
>>>>>>>>>>>>>>>>>>> application logs are moved
>>>>>>>>>>>>>>>>>>> >> >> to
>>>>>>>>>>>>>>>>>>> >> >> </description>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>    <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>>>>>>>>>>>>>> >> >>     <description>the directories used by
>>>>>>>>>>>>>>>>>>> Nodemanagers as log
>>>>>>>>>>>>>>>>>>> >> >> directories</description>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>>>>>>>>>>>>>> >> >>     <description>shuffle service that needs to be
>>>>>>>>>>>>>>>>>>> set for Map Reduce to
>>>>>>>>>>>>>>>>>>> >> >> run </description>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>24</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>3</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>22000</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>2.1</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>>>> >> >>> Not tuning configurations at all is wrong. YARN
>>>>>>>>>>>>>>>>>>> uses memory resource
>>>>>>>>>>>>>>>>>>> >> >>> based scheduling and hence MR2 would be
>>>>>>>>>>>>>>>>>>> requesting 1 GB minimum by
>>>>>>>>>>>>>>>>>>> >> >>> default, causing, on base configs, to max out at
>>>>>>>>>>>>>>>>>>> 8 (due to 8 GB NM
>>>>>>>>>>>>>>>>>>> >> >>> memory resource config) total containers. Do
>>>>>>>>>>>>>>>>>>> share your configs as at
>>>>>>>>>>>>>>>>>>> >> >>> this point none of us can tell what it is.
>>>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower
>>>>>>>>>>>>>>>>>>> for users and to not
>>>>>>>>>>>>>>>>>>> >> >>> care about such things :)
>>>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>>>>>>>>>>>>>> samliuhadoop@gmail.com>
>>>>>>>>>>>>>>>>>>> >> >>> wrote:
>>>>>>>>>>>>>>>>>>> >> >>> > At the begining, I just want to do a fast
>>>>>>>>>>>>>>>>>>> comparision of MRv1 and
>>>>>>>>>>>>>>>>>>> >> >>> > Yarn.
>>>>>>>>>>>>>>>>>>> >> >>> > But
>>>>>>>>>>>>>>>>>>> >> >>> > they have many differences, and to be fair for
>>>>>>>>>>>>>>>>>>> comparison I did not
>>>>>>>>>>>>>>>>>>> >> >>> > tune
>>>>>>>>>>>>>>>>>>> >> >>> > their configurations at all.  So I got above
>>>>>>>>>>>>>>>>>>> test results. After
>>>>>>>>>>>>>>>>>>> >> >>> > analyzing
>>>>>>>>>>>>>>>>>>> >> >>> > the test result, no doubt, I will configure
>>>>>>>>>>>>>>>>>>> them and do comparison
>>>>>>>>>>>>>>>>>>> >> >>> > again.
>>>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>>>> >> >>> > Do you have any idea on current test result? I
>>>>>>>>>>>>>>>>>>> think, to compare
>>>>>>>>>>>>>>>>>>> >> >>> > with
>>>>>>>>>>>>>>>>>>> >> >>> > MRv1,
>>>>>>>>>>>>>>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but
>>>>>>>>>>>>>>>>>>> worse on Reduce
>>>>>>>>>>>>>>>>>>> >> >>> > phase(terasort test).
>>>>>>>>>>>>>>>>>>> >> >>> > And any detailed suggestions/comments/materials
>>>>>>>>>>>>>>>>>>> on Yarn performance
>>>>>>>>>>>>>>>>>>> >> >>> > tunning?
>>>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>>>> >> >>> > Thanks!
>>>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>>>>>>>>>>>>>> marcosluis2186@gmail.com>
>>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>>> >> >>> >> Why not to tune the configurations?
>>>>>>>>>>>>>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>>>>>>>>>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size,
>>>>>>>>>>>>>>>>>>> etc
>>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>>> >> >>> >>> Hi Experts,
>>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>>> >> >>> >>> We are thinking about whether to use Yarn or
>>>>>>>>>>>>>>>>>>> not in the near
>>>>>>>>>>>>>>>>>>> >> >>> >>> future,
>>>>>>>>>>>>>>>>>>> >> >>> >>> and
>>>>>>>>>>>>>>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for
>>>>>>>>>>>>>>>>>>> comprison.
>>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>>> >> >>> >>> My env is three nodes cluster, and each node
>>>>>>>>>>>>>>>>>>> has similar hardware:
>>>>>>>>>>>>>>>>>>> >> >>> >>> 2
>>>>>>>>>>>>>>>>>>> >> >>> >>> cpu(4 core), 32 mem. Both Yarn and MRv1
>>>>>>>>>>>>>>>>>>> cluster are set on the
>>>>>>>>>>>>>>>>>>> >> >>> >>> same
>>>>>>>>>>>>>>>>>>> >> >>> >>> env. To
>>>>>>>>>>>>>>>>>>> >> >>> >>> be fair, I did not make any performance
>>>>>>>>>>>>>>>>>>> tuning on their
>>>>>>>>>>>>>>>>>>> >> >>> >>> configurations, but
>>>>>>>>>>>>>>>>>>> >> >>> >>> use the default configuration values.
>>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>>> >> >>> >>> Before testing, I think Yarn will be much
>>>>>>>>>>>>>>>>>>> better than MRv1, if
>>>>>>>>>>>>>>>>>>> >> >>> >>> they
>>>>>>>>>>>>>>>>>>> >> >>> >>> all
>>>>>>>>>>>>>>>>>>> >> >>> >>> use default configuration, because Yarn is a
>>>>>>>>>>>>>>>>>>> better framework than
>>>>>>>>>>>>>>>>>>> >> >>> >>> MRv1.
>>>>>>>>>>>>>>>>>>> >> >>> >>> However, the test result shows some
>>>>>>>>>>>>>>>>>>> differences:
>>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>>>>>>>>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>>>>>>>>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>>>>>>>>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>>>>>>>>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>>>>>>>>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>>>>>>>>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>>>>>>>>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>>> >> >>> >>> After a fast analysis, I think the direct
>>>>>>>>>>>>>>>>>>> cause might be that Yarn
>>>>>>>>>>>>>>>>>>> >> >>> >>> is
>>>>>>>>>>>>>>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much
>>>>>>>>>>>>>>>>>>> worse on Reduce
>>>>>>>>>>>>>>>>>>> >> >>> >>> phase.
>>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>>>>>>>>>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1
>>>>>>>>>>>>>>>>>>> for terasort?
>>>>>>>>>>>>>>>>>>> >> >>> >>> - What's the stratage for tuning Yarn
>>>>>>>>>>>>>>>>>>> performance? Is any
>>>>>>>>>>>>>>>>>>> >> >>> >>> materials?
>>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>>> >> >>> >>> Thanks!
>>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>>> >> >>> >> --
>>>>>>>>>>>>>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>>>>>>>>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>>>>>>>>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>>>> >> >>> --
>>>>>>>>>>>>>>>>>>> >> >>> Harsh J
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>> >> --
>>>>>>>>>>>>>>>>>>> >> Harsh J
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>> Harsh J
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Sandy Ryza <sa...@cloudera.com>.
Unfortunately yes.  I agree that it should be documented.  The only way to
compare fairly is to use the same terasort jar against both.  When I've
done comparisons, I've used the MR2 examples jar against both MR1 and MR2.
 This works on CDH because it has a version of MR1 that's compatible with
MR2, but may require some recompilation if using the apache releases.
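
In practice that means generating the input once with the MR2 examples jar and
then sorting it with that same jar on each cluster. A rough sketch, assuming
the stock Apache 2.2.0 tarball layout and the 10,000,000,000 hundred-byte rows
(1 TB) used in this thread, with HDFS paths chosen only for illustration:

        hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar teragen 10000000000 /terasort-input
        hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar terasort /terasort-input /terasort-output

Pointing the identical jar at the MR1 cluster is the step that may need the
recompilation mentioned above when the two clusters come from different Apache
releases.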


On Wed, Oct 23, 2013 at 5:01 PM, Jian Fang <ji...@gmail.com>wrote:

> Really? Does that mean I cannot compare terasort in MR2 with MR1
> directly?  If yes, this really should be documented.
>
> Thanks.
>
>
> On Wed, Oct 23, 2013 at 4:54 PM, Sandy Ryza <sa...@cloudera.com>wrote:
>
>> I should have brought this up earlier - the Terasort benchmark
>> requirements changed recently to make data less compressible.  The MR2
>> version of Terasort has this change, but the MR1 version does not.    So
>> Snappy should be working fine, but the data that MR2 is using is less
>> compressible.
>>
>>
>> On Wed, Oct 23, 2013 at 4:48 PM, Jian Fang <jian.fang.subscribe@gmail.com
>> > wrote:
>>
>>> Ok, I think I know where the problem is now. I used snappy to compress
>>> map output.
>>>
>>> Calculate the compression ratio = "Map output materialized bytes"/"Map
>>> output bytes"
>>>
>>> MR2
>>> 440519309909/1020000000000 = 0.431881676
>>>
>>> MR1
>>> 240948272514/1000000000000 = 0.240948273
>>>
>>>
>>> Seems something is wrong with the snappy in my Hadoop 2 and as a result,
>>> MR2 needs to fetch much more data during shuffle phase. Any way to check
>>> snappy in MR2?
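
As a quick sanity check on the codec itself (rather than on the data), recent
2.x builds ship a native-library checker; assuming the 2.2.0 build in use
includes it, running

        hadoop checknative -a

on a worker node reports whether the native hadoop and snappy libraries were
actually loaded. The ratios computed just above already show the codec is
engaged (materialized bytes are well under the raw map output bytes); the gap
between the two ratios is what the reply at the top of the thread attributes
to the less compressible MR2 terasort input.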
>>>
>>>
>>>
>>>
>>> On Wed, Oct 23, 2013 at 4:26 PM, Jian Fang <
>>> jian.fang.subscribe@gmail.com> wrote:
>>>
>>>> Looked at the results for MR1 and MR2. Reduce input groups for MR1 is
>>>> 4,294,967,296, but 10,000,000,000 for MR2.
>>>> That is to say the number of unique keys fed into the reducers is
>>>> 10000000000 in MR2, which is really weird. Any problem in the terasort code?
>>>>
>>>>
>>>> On Wed, Oct 23, 2013 at 2:16 PM, Jian Fang <
>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>
>>>>> Reducing the number of map containers to 8 slightly improved the total
>>>>> time to 84 minutes. Here is the output. Also, from the log, there is no clear
>>>>> reason why the containers are killed other than the message such as
>>>>> "Container killed by the
>>>>> ApplicationMaster../container_1382237301855_0001_01_000001/syslog:Container
>>>>> killed on request. Exit code is 143"
>>>>>
>>>>>
>>>>> 2013-10-23 21:09:34,325 INFO org.apache.hadoop.mapreduce.Job (main):
>>>>> Counters: 46
>>>>>         File System Counters
>>>>>                 FILE: Number of bytes read=455484809066
>>>>>                 FILE: Number of bytes written=896642344343
>>>>>
>>>>>                 FILE: Number of read operations=0
>>>>>                 FILE: Number of large read operations=0
>>>>>                 FILE: Number of write operations=0
>>>>>                 HDFS: Number of bytes read=1000000841624
>>>>>
>>>>>                 HDFS: Number of bytes written=1000000000000
>>>>>                 HDFS: Number of read operations=25531
>>>>>
>>>>>                 HDFS: Number of large read operations=0
>>>>>                 HDFS: Number of write operations=150
>>>>>
>>>>>         Job Counters
>>>>>                 Killed map tasks=1
>>>>>                 Killed reduce tasks=11
>>>>>                 Launched map tasks=7449
>>>>>                 Launched reduce tasks=86
>>>>>                 Data-local map tasks=7434
>>>>>                 Rack-local map tasks=15
>>>>>                 Total time spent by all maps in occupied slots
>>>>> (ms)=1030941232
>>>>>                 Total time spent by all reduces in occupied slots
>>>>> (ms)=1574732272
>>>>>
>>>>>         Map-Reduce Framework
>>>>>                 Map input records=10000000000
>>>>>                 Map output records=10000000000
>>>>>                 Map output bytes=1020000000000
>>>>>                 Map output materialized bytes=440519309909
>>>>>                 Input split bytes=841624
>>>>>
>>>>>                 Combine input records=0
>>>>>                 Combine output records=0
>>>>>                 Reduce input groups=10000000000
>>>>>                 Reduce shuffle bytes=440519309909
>>>>>
>>>>>                 Reduce input records=10000000000
>>>>>                 Reduce output records=10000000000
>>>>>                 Spilled Records=20000000000
>>>>>                 Shuffled Maps =558600
>>>>>                 Failed Shuffles=0
>>>>>                 Merged Map outputs=558600
>>>>>                 GC time elapsed (ms)=2855534
>>>>>                 CPU time spent (ms)=170381420
>>>>>                 Physical memory (bytes) snapshot=3704938954752
>>>>>                 Virtual memory (bytes) snapshot=11455208308736
>>>>>                 Total committed heap usage (bytes)=4523248582656
>>>>>         Shuffle Errors
>>>>>                 BAD_ID=0
>>>>>                 CONNECTION=0
>>>>>                 IO_ERROR=0
>>>>>
>>>>>                 WRONG_LENGTH=0
>>>>>                 WRONG_MAP=0
>>>>>                 WRONG_REDUCE=0
>>>>>         File Input Format Counters
>>>>>                 Bytes Read=1000000000000
>>>>>         File Output Format Counters
>>>>>                 Bytes Written=1000000000000
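
On the killed containers: exit code 143 is 128 + SIGTERM, i.e. the attempt was
terminated on purpose rather than crashing, and a memory overrun would normally
be accompanied by an explicit NodeManager message about running beyond physical
or virtual memory limits. With map and reduce speculation enabled, as in the
configuration quoted further down the thread, the likeliest reading is that the
AM is reaping duplicate speculative attempts. A hedged way to confirm that is
to rerun with speculation off and watch whether the "Killed ... tasks" counters
drop; the values below are illustrative only:

        mapreduce.map.speculative = "false";
        mapreduce.reduce.speculative = "false";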
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Oct 23, 2013 at 1:46 PM, Jian Fang <
>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>
>>>>>> Already started the tests with only 8 containers for map in MR2,
>>>>>> still running.
>>>>>>
>>>>>> But shouldn't YARN make better use of the memory? If I map YARN
>>>>>> containers in MR2 to exactly 8 map and 3 reduce slots as in MR1, what is the
>>>>>> real advantage of YARN then? I remember one goal of YARN is to solve the
>>>>>> issue that map slots cannot be used for reduce and reduce slots cannot be
>>>>>> used for map. Shouldn't YARN be smart enough to handle the concurrent tasks?
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Oct 23, 2013 at 1:17 PM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>>>>
>>>>>>> Increasing the slowstart is not meant to increase performance, but
>>>>>>> should make for a fairer comparison.  Have you tried making sure that in
>>>>>>> MR2 only 8 map tasks are running concurrently, or boosting MR1 up to 16?
>>>>>>>
>>>>>>> -Sandy
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Oct 23, 2013 at 12:55 PM, Jian Fang <
>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>
>>>>>>>> Changing mapreduce.job.reduce.slowstart.completedmaps to 0.99 does not
>>>>>>>> look good. The map phase
>>>>>>>> alone took 48 minutes and total time seems to be even longer. Any way to
>>>>>>>> let map phase run faster?
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Oct 23, 2013 at 10:05 AM, Jian Fang <
>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Thanks Sandy.
>>>>>>>>>
>>>>>>>>> io.sort.record.percent is the default value 0.05 for both MR1 and
>>>>>>>>> MR2.  mapreduce.job.reduce.slowstart.completedmaps in MR2 and mapred.reduce.slowstart.completed.maps
>>>>>>>>> in MR1 both use the default value 0.05.
>>>>>>>>>
>>>>>>>>> I tried to allocate 1536MB and 1024MB to map container some time
>>>>>>>>> ago, but the changes did not give me a better result, thus, I changed it
>>>>>>>>> back to 768MB.
>>>>>>>>>
>>>>>>>>> Will try mapred.reduce.slowstart.completed.maps=.99 to see what
>>>>>>>>> happens. BTW, I should use
>>>>>>>>> mapreduce.job.reduce.slowstart.completedmaps in MR2, right?
>>>>>>>>>
>>>>>>>>> Also, in MR1 I can specify tasktracker.http.threads, but I could
>>>>>>>>> not find the counterpart for MR2. Which one should I tune for the HTTP
>>>>>>>>> threads?
>>>>>>>>>
>>>>>>>>> Thanks again.
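
On the tasktracker.http.threads question: in MR2 the map outputs are served by
the ShuffleHandler auxiliary service inside each NodeManager (the
mapreduce_shuffle / org.apache.hadoop.mapred.ShuffleHandler entries in
yarn-site.xml) rather than by a TaskTracker HTTP server, so there is no direct
counterpart. The closest knob appears to be mapreduce.shuffle.max.threads,
where 0 is said to fall back to Netty's default of twice the number of cores;
treat the property name and default as assumptions to verify against
mapred-default.xml for the release in use. An illustrative, non-authoritative
example:

        mapreduce.shuffle.max.threads = "64";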
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Oct 23, 2013 at 9:40 AM, Sandy Ryza <
>>>>>>>>> sandy.ryza@cloudera.com> wrote:
>>>>>>>>>
>>>>>>>>>> Based on SLOTS_MILLIS_MAPS, it looks like your map tasks are
>>>>>>>>>> taking about three times as long in MR2 as they are in MR1.  This is
>>>>>>>>>> probably because you allow twice as many map tasks to run at a time in MR2 (12288/768
>>>>>>>>>> = 16).  Being able to use all the containers isn't necessarily a good thing
>>>>>>>>>> if you are oversubscribing your node's resources.  Because of the different
>>>>>>>>>> way that MR1 and MR2 view resources, I think it's better to test with
>>>>>>>>>> mapred.reduce.slowstart.completed.maps=.99 so that the map and reduce
>>>>>>>>>> phases will run separately.
>>>>>>>>>>
>>>>>>>>>> On the other side, it looks like your MR1 has more spilled
>>>>>>>>>> records than MR2.  For a fairer comparison, you should set
>>>>>>>>>> io.sort.record.percent in MR1 to .13, which should improve MR1 performance,
>>>>>>>>>> but will provide a fairer comparison (MR2 automatically does this tuning
>>>>>>>>>> for you).
>>>>>>>>>>
>>>>>>>>>> -Sandy
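
Pulling the two suggestions above together with the container arithmetic
(12288/768 = 16 concurrent maps per node versus 8 map slots in MR1), a minimal
sketch of the settings involved; the values are illustrative rather than
recommendations:

MR1 side (mapred-site.xml):

        io.sort.record.percent = "0.13";

MR2 side (mapred-site.xml):

        mapreduce.job.reduce.slowstart.completedmaps = "0.99";
        mapreduce.map.memory.mb = "1536";

Raising the per-map container request to 1536 MB caps concurrency at
12288/1536 = 8 maps per node, matching the MR1 slot count; the other lever,
as noted elsewhere in the thread, is lowering the NodeManager's published
yarn.nodemanager.resource.memory-mb.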
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Oct 23, 2013 at 9:22 AM, Jian Fang <
>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> The number of map slots and reduce slots on each data node for
>>>>>>>>>>> MR1 are 8 and 3, respectively. Since MR2 could use all containers for
>>>>>>>>>>> either map or reduce, I would expect that MR2 is faster.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Oct 23, 2013 at 8:17 AM, Sandy Ryza <
>>>>>>>>>>> sandy.ryza@cloudera.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> How many map and reduce slots are you using per tasktracker in
>>>>>>>>>>>> MR1?  How do the average map times compare? (MR2 reports this directly on
>>>>>>>>>>>> the web UI, but you can also get a sense in MR1 by scrolling through the
>>>>>>>>>>>> map tasks page).  Can you share the counters for MR1?
>>>>>>>>>>>>
>>>>>>>>>>>> -Sandy
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Oct 23, 2013 at 12:23 AM, Jian Fang <
>>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Unfortunately, turning off JVM reuse still got the same
>>>>>>>>>>>>> result, i.e., about 90 minutes for MR2. I don't think the killed reduces
>>>>>>>>>>>>> could account for a 2x slowdown. There should be something very wrong
>>>>>>>>>>>>> either in configuration or code. Any hints?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <
>>>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks Sandy. I will try to turn JVM reuse off and see what
>>>>>>>>>>>>>> happens.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes, I saw quite a few exceptions in the task attempts. For
>>>>>>>>>>>>>> instance:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2013-10-20 03:13:58,751 ERROR [main]
>>>>>>>>>>>>>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>>>>>>>>>>>>>> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
>>>>>>>>>>>>>> 2013-10-20 03:13:58,752 ERROR [Thread-6]
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient: Failed to close file
>>>>>>>>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
>>>>>>>>>>>>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>>>>>>>>>>>>>> No lease on
>>>>>>>>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
>>>>>>>>>>>>>> File does not exist. Holder
>>>>>>>>>>>>>> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
>>>>>>>>>>>>>> any open files.
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>>>>>>>>>>>>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>>>>>>>>>>>>>> 2013-10-20 03:13:58,753 WARN [main]
>>>>>>>>>>>>>> org.apache.hadoop.mapred.YarnChild: Exception running child :
>>>>>>>>>>>>>> java.nio.channels.ClosedChannelException
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <
>>>>>>>>>>>>>> sandy.ryza@cloudera.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It looks like many of your reduce tasks were killed.  Do you
>>>>>>>>>>>>>>> know why?  Also, MR2 doesn't have JVM reuse, so it might make sense to
>>>>>>>>>>>>>>> compare it to MR1 with JVM reuse turned off.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -Sandy
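
For the MR1 side of that comparison, JVM reuse is governed by
mapred.job.reuse.jvm.num.tasks (1 means no reuse, -1 means unlimited), so a
sketch of disabling it for a fair run is simply:

        mapred.job.reuse.jvm.num.tasks = "1";

The mapreduce.job.jvm.numtasks = "20" setting quoted elsewhere in the thread
only has an effect on an MR1-style runtime; as noted above, MR2 has no JVM
reuse to configure.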
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <
>>>>>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The Terasort output for MR2 is as follows.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2013-10-22 21:40:16,261 INFO
>>>>>>>>>>>>>>>> org.apache.hadoop.mapreduce.Job (main): Counters: 46
>>>>>>>>>>>>>>>>         File System Counters
>>>>>>>>>>>>>>>>                 FILE: Number of bytes read=456102049355
>>>>>>>>>>>>>>>>                 FILE: Number of bytes written=897246250517
>>>>>>>>>>>>>>>>                 FILE: Number of read operations=0
>>>>>>>>>>>>>>>>                 FILE: Number of large read operations=0
>>>>>>>>>>>>>>>>                 FILE: Number of write operations=0
>>>>>>>>>>>>>>>>                 HDFS: Number of bytes read=1000000851200
>>>>>>>>>>>>>>>>                 HDFS: Number of bytes written=1000000000000
>>>>>>>>>>>>>>>>                 HDFS: Number of read operations=32131
>>>>>>>>>>>>>>>>                 HDFS: Number of large read operations=0
>>>>>>>>>>>>>>>>                 HDFS: Number of write operations=224
>>>>>>>>>>>>>>>>         Job Counters
>>>>>>>>>>>>>>>>                 Killed map tasks=1
>>>>>>>>>>>>>>>>                 Killed reduce tasks=20
>>>>>>>>>>>>>>>>                 Launched map tasks=7601
>>>>>>>>>>>>>>>>                 Launched reduce tasks=132
>>>>>>>>>>>>>>>>                 Data-local map tasks=7591
>>>>>>>>>>>>>>>>                 Rack-local map tasks=10
>>>>>>>>>>>>>>>>                 Total time spent by all maps in occupied
>>>>>>>>>>>>>>>> slots (ms)=1696141311
>>>>>>>>>>>>>>>>                 Total time spent by all reduces in occupied
>>>>>>>>>>>>>>>> slots (ms)=2664045096
>>>>>>>>>>>>>>>>         Map-Reduce Framework
>>>>>>>>>>>>>>>>                 Map input records=10000000000
>>>>>>>>>>>>>>>>                 Map output records=10000000000
>>>>>>>>>>>>>>>>                 Map output bytes=1020000000000
>>>>>>>>>>>>>>>>                 Map output materialized bytes=440486356802
>>>>>>>>>>>>>>>>                 Input split bytes=851200
>>>>>>>>>>>>>>>>                 Combine input records=0
>>>>>>>>>>>>>>>>                 Combine output records=0
>>>>>>>>>>>>>>>>                 Reduce input groups=10000000000
>>>>>>>>>>>>>>>>                 Reduce shuffle bytes=440486356802
>>>>>>>>>>>>>>>>                 Reduce input records=10000000000
>>>>>>>>>>>>>>>>                 Reduce output records=10000000000
>>>>>>>>>>>>>>>>                 Spilled Records=20000000000
>>>>>>>>>>>>>>>>                 Shuffled Maps =851200
>>>>>>>>>>>>>>>>                 Failed Shuffles=61
>>>>>>>>>>>>>>>>                 Merged Map outputs=851200
>>>>>>>>>>>>>>>>                 GC time elapsed (ms)=4215666
>>>>>>>>>>>>>>>>                 CPU time spent (ms)=192433000
>>>>>>>>>>>>>>>>                 Physical memory (bytes)
>>>>>>>>>>>>>>>> snapshot=3349356380160
>>>>>>>>>>>>>>>>                 Virtual memory (bytes)
>>>>>>>>>>>>>>>> snapshot=9665208745984
>>>>>>>>>>>>>>>>                 Total committed heap usage
>>>>>>>>>>>>>>>> (bytes)=3636854259712
>>>>>>>>>>>>>>>>         Shuffle Errors
>>>>>>>>>>>>>>>>                 BAD_ID=0
>>>>>>>>>>>>>>>>                 CONNECTION=0
>>>>>>>>>>>>>>>>                 IO_ERROR=4
>>>>>>>>>>>>>>>>                 WRONG_LENGTH=0
>>>>>>>>>>>>>>>>                 WRONG_MAP=0
>>>>>>>>>>>>>>>>                 WRONG_REDUCE=0
>>>>>>>>>>>>>>>>         File Input Format Counters
>>>>>>>>>>>>>>>>                 Bytes Read=1000000000000
>>>>>>>>>>>>>>>>         File Output Format Counters
>>>>>>>>>>>>>>>>                 Bytes Written=1000000000000
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> John
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>>>>>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I have the same problem. I compared Hadoop 2.2.0 with
>>>>>>>>>>>>>>>>> Hadoop 1.0.3 and it turned out that the terasort for MR2 is 2 times slower
>>>>>>>>>>>>>>>>> than that in MR1. I cannot really believe it.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop
>>>>>>>>>>>>>>>>> 2.2.0 cluster configurations are as follows.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>>>>>>>>>>>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>>>>>>>>>>>>>>         mapreduce.map.memory.mb = "768";
>>>>>>>>>>>>>>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>>>>>>>>>>>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>>>>>>>>>>>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>>>>>>>>>>>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>>>>>>>>>>>>>>         mapreduce.task.io.sort.factor = "48";
>>>>>>>>>>>>>>>>>         mapreduce.task.io.sort.mb = "200";
>>>>>>>>>>>>>>>>>         mapreduce.map.speculative = "true";
>>>>>>>>>>>>>>>>>         mapreduce.reduce.speculative = "true";
>>>>>>>>>>>>>>>>>         mapreduce.framework.name = "yarn";
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> yarn.app.mapreduce.am.job.task.listener.thread-count = "60";
>>>>>>>>>>>>>>>>>         mapreduce.map.cpu.vcores = "1";
>>>>>>>>>>>>>>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>>>>>>>>>>>>>>         mapreduce.map.output.compress = "true";
>>>>>>>>>>>>>>>>>         mapreduce.map.output.compress.codec =
>>>>>>>>>>>>>>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>>>>>>>>>>>>>>         yarn.resourcemanager.scheduler.client.thread-count
>>>>>>>>>>>>>>>>> = "64";
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> yarn.resourcemanager.resource-tracker.client.thread-count = "64";
>>>>>>>>>>>>>>>>>         yarn.resourcemanager.scheduler.class =
>>>>>>>>>>>>>>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>>>>>>>>>>>>>>         yarn.nodemanager.aux-services =
>>>>>>>>>>>>>>>>> "mapreduce_shuffle";
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>>>>>>>>>>>>>>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>>>>>>>>>>>>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>>>>>>>>>>>>>>         yarn.nodemanager.container-executor.class =
>>>>>>>>>>>>>>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>>>>>>>>>>>>>>         yarn.nodemanager.container-manager.thread-count =
>>>>>>>>>>>>>>>>> "64";
>>>>>>>>>>>>>>>>>         yarn.nodemanager.localizer.client.thread-count =
>>>>>>>>>>>>>>>>> "20";
>>>>>>>>>>>>>>>>>         yarn.nodemanager.localizer.fetch.thread-count =
>>>>>>>>>>>>>>>>> "20";
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost
>>>>>>>>>>>>>>>>> the same other configurations. In MR1, the 1TB terasort took about 45
>>>>>>>>>>>>>>>>> minutes, but it took around 90 minutes in MR2.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Does anyone know what is wrong here? Or do I need some
>>>>>>>>>>>>>>>>> special configurations for terasort to work better in MR2?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> John
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>>>>>>>>>>>>>>> michael_segel@hotmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Sam,
>>>>>>>>>>>>>>>>>> I think your cluster is too small for any meaningful
>>>>>>>>>>>>>>>>>> conclusions to be made.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Mike Segel
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <
>>>>>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi Harsh,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks for your detailed response! Now, the efficiency of
>>>>>>>>>>>>>>>>>> my Yarn cluster improved a lot after increasing the reducer
>>>>>>>>>>>>>>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>>>>>>>>>>>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and
>>>>>>>>>>>>>>>>>> reduce task together, with a typical process(map > shuffle > reduce). In
>>>>>>>>>>>>>>>>>> Yarn, as I know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>>>>>>>>>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...),
>>>>>>>>>>>>>>>>>> but, MRv1 job has special execution process(map > shuffle > reduce) in
>>>>>>>>>>>>>>>>>> Hadoop 1.x, and how Yarn execute a MRv1 job? still include some special MR
>>>>>>>>>>>>>>>>>> steps in Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>>>>>>>>>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>>>>>>>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>>>>>>>>>>>>> - What's the general process for ApplicationMaster of
>>>>>>>>>>>>>>>>>> Yarn to execute a job?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by
>>>>>>>>>>>>>>>>>> setting 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>>>>>>>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>>>>>>>>>>>>> - For Yarn, above tow parameter do not work any more, as
>>>>>>>>>>>>>>>>>> yarn uses container instead, right?
>>>>>>>>>>>>>>>>>> - For Yarn, we can set the whole physical mem for a
>>>>>>>>>>>>>>>>>> NodeManager using 'yarn.nodemanager.resource.memory-mb'.
>>>>>>>>>>>>>>>>>> But how to set the default size of physical mem of a container?
>>>>>>>>>>>>>>>>>> - How to set the maximum size of physical mem of a
>>>>>>>>>>>>>>>>>> container? By the parameter of 'mapred.child.java.opts'?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks as always!
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi Sam,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> > - How to know the container number? Why you say it
>>>>>>>>>>>>>>>>>>> will be 22 containers due to a 22 GB memory?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The MR2's default configuration requests 1 GB resource
>>>>>>>>>>>>>>>>>>> each for Map
>>>>>>>>>>>>>>>>>>> and Reduce containers. It requests 1.5 GB for the AM
>>>>>>>>>>>>>>>>>>> container that
>>>>>>>>>>>>>>>>>>> runs the job, additionally. This is tunable using the
>>>>>>>>>>>>>>>>>>> properties
>>>>>>>>>>>>>>>>>>> Sandy's mentioned in his post.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> > - My machine has 32 GB memory, how many memory is
>>>>>>>>>>>>>>>>>>> proper to be assigned to containers?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> This is a general question. You may use the same process
>>>>>>>>>>>>>>>>>>> you took to
>>>>>>>>>>>>>>>>>>> decide optimal number of slots in MR1 to decide this
>>>>>>>>>>>>>>>>>>> here. Every
>>>>>>>>>>>>>>>>>>> container is a new JVM, and you're limited by the CPUs
>>>>>>>>>>>>>>>>>>> you have there
>>>>>>>>>>>>>>>>>>> (if not the memory). Either increase memory requests
>>>>>>>>>>>>>>>>>>> from jobs, to
>>>>>>>>>>>>>>>>>>> lower # of concurrent containers at a given time
>>>>>>>>>>>>>>>>>>> (runtime change), or
>>>>>>>>>>>>>>>>>>> lower NM's published memory resources to control the
>>>>>>>>>>>>>>>>>>> same (config
>>>>>>>>>>>>>>>>>>> change).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> > - In mapred-site.xml, if I set '
>>>>>>>>>>>>>>>>>>> mapreduce.framework.name' to be 'yarn', will other
>>>>>>>>>>>>>>>>>>> parameters for mapred-site.xml still work in yarn framework? Like
>>>>>>>>>>>>>>>>>>> 'mapreduce.task.io.sort.mb' and 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Yes, all of these properties will still work. Old
>>>>>>>>>>>>>>>>>>> properties specific
>>>>>>>>>>>>>>>>>>> to JobTracker or TaskTracker (usually found as a keyword
>>>>>>>>>>>>>>>>>>> in the config
>>>>>>>>>>>>>>>>>>> name) will not apply anymore.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <
>>>>>>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>> > Hi Harsh,
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>> > According to above suggestions, I removed the
>>>>>>>>>>>>>>>>>>> duplication of setting, and
>>>>>>>>>>>>>>>>>>> > reduce the value of
>>>>>>>>>>>>>>>>>>> 'yarn.nodemanager.resource.cpu-cores',
>>>>>>>>>>>>>>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>>>>>>>>>>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and
>>>>>>>>>>>>>>>>>>> 12000. Ant then, the
>>>>>>>>>>>>>>>>>>> > efficiency improved about 18%.  I have questions:
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>> > - How to know the container number? Why you say it
>>>>>>>>>>>>>>>>>>> will be 22 containers due
>>>>>>>>>>>>>>>>>>> > to a 22 GB memory?
>>>>>>>>>>>>>>>>>>> > - My machine has 32 GB memory, how many memory is
>>>>>>>>>>>>>>>>>>> proper to be assigned to
>>>>>>>>>>>>>>>>>>> > containers?
>>>>>>>>>>>>>>>>>>> > - In mapred-site.xml, if I set '
>>>>>>>>>>>>>>>>>>> mapreduce.framework.name' to be 'yarn', will
>>>>>>>>>>>>>>>>>>> > other parameters for mapred-site.xml still work in
>>>>>>>>>>>>>>>>>>> yarn framework? Like
>>>>>>>>>>>>>>>>>>> > 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>> > Thanks!
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>> >> Hey Sam,
>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>> >> Did you get a chance to retry with Sandy's
>>>>>>>>>>>>>>>>>>> suggestions? The config
>>>>>>>>>>>>>>>>>>> >> appears to be asking NMs to use roughly 22 total
>>>>>>>>>>>>>>>>>>> containers (as
>>>>>>>>>>>>>>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22
>>>>>>>>>>>>>>>>>>> GB memory
>>>>>>>>>>>>>>>>>>> >> resource. This could impact much, given the CPU is
>>>>>>>>>>>>>>>>>>> still the same for
>>>>>>>>>>>>>>>>>>> >> both test runs.
>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>>>>>>>>>>>>>>> sandy.ryza@cloudera.com>
>>>>>>>>>>>>>>>>>>> >> wrote:
>>>>>>>>>>>>>>>>>>> >> > Hey Sam,
>>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>>> >> > Thanks for sharing your results.  I'm definitely
>>>>>>>>>>>>>>>>>>> curious about what's
>>>>>>>>>>>>>>>>>>> >> > causing the difference.
>>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>>> >> > A couple observations:
>>>>>>>>>>>>>>>>>>> >> > It looks like you've got
>>>>>>>>>>>>>>>>>>> yarn.nodemanager.resource.memory-mb in there
>>>>>>>>>>>>>>>>>>> >> > twice
>>>>>>>>>>>>>>>>>>> >> > with two different values.
>>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?)
>>>>>>>>>>>>>>>>>>> close to the default
>>>>>>>>>>>>>>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any
>>>>>>>>>>>>>>>>>>> of your tasks getting
>>>>>>>>>>>>>>>>>>> >> > killed for running over resource limits?
>>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>>> >> > -Sandy
>>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <
>>>>>>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >> The terasort execution log shows that reduce spent
>>>>>>>>>>>>>>>>>>> about 5.5 mins from
>>>>>>>>>>>>>>>>>>> >> >> 33%
>>>>>>>>>>>>>>>>>>> >> >> to 35% as below.
>>>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>>>> reduce 31%
>>>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>>>> reduce 32%
>>>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>>>> reduce 33%
>>>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>>>> reduce 35%
>>>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>>>> reduce 40%
>>>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>>>> reduce 43%
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >> Any way, below are my configurations for your
>>>>>>>>>>>>>>>>>>> reference. Thanks!
>>>>>>>>>>>>>>>>>>> >> >> (A) core-site.xml
>>>>>>>>>>>>>>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >> (B) hdfs-site.xml
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>dfs.replication</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>1</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>10</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >> (C) mapred-site.xml
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>>>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>>>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>>>>>>>>>>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>>>>>>>>>>>>>>> >> >> </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>yarn</value>
>>>>>>>>>>>>>>>>>>> >> >>    </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>8</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>4</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>true</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >> (D) yarn-site.xml
>>>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>node1:18025</value>
>>>>>>>>>>>>>>>>>>> >> >>     <description>host is the hostname of the
>>>>>>>>>>>>>>>>>>> resource manager and
>>>>>>>>>>>>>>>>>>> >> >>     port is the port on which the NodeManagers
>>>>>>>>>>>>>>>>>>> contact the Resource
>>>>>>>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>     <description>The address of the RM web
>>>>>>>>>>>>>>>>>>> application.</description>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <name>yarn.resourcemanager.webapp.address</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>node1:18088</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <name>yarn.resourcemanager.scheduler.address</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>node1:18030</value>
>>>>>>>>>>>>>>>>>>> >> >>     <description>host is the hostname of the
>>>>>>>>>>>>>>>>>>> resourcemanager and port
>>>>>>>>>>>>>>>>>>> >> >> is
>>>>>>>>>>>>>>>>>>> >> >> the port
>>>>>>>>>>>>>>>>>>> >> >>     on which the Applications in the cluster talk
>>>>>>>>>>>>>>>>>>> to the Resource
>>>>>>>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>node1:18040</value>
>>>>>>>>>>>>>>>>>>> >> >>     <description>the host is the hostname of the
>>>>>>>>>>>>>>>>>>> ResourceManager and
>>>>>>>>>>>>>>>>>>> >> >> the
>>>>>>>>>>>>>>>>>>> >> >> port is the port on
>>>>>>>>>>>>>>>>>>> >> >>     which the clients can talk to the Resource
>>>>>>>>>>>>>>>>>>> Manager. </description>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>>>>>>>>>>>>>>> >> >>     <description>the local directories used by the
>>>>>>>>>>>>>>>>>>> >> >> nodemanager</description>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>>>>>>>>>>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>>>>>>>>>>>>>>> port</description>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>10240</value>
>>>>>>>>>>>>>>>>>>> >> >>     <description>the amount of memory on the
>>>>>>>>>>>>>>>>>>> NodeManager in
>>>>>>>>>>>>>>>>>>> >> >> MB</description>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>>>>>>>>>>>>>> >> >>     <description>directory on hdfs where the
>>>>>>>>>>>>>>>>>>> application logs are moved
>>>>>>>>>>>>>>>>>>> >> >> to
>>>>>>>>>>>>>>>>>>> >> >> </description>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>    <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>>>>>>>>>>>>>> >> >>     <description>the directories used by
>>>>>>>>>>>>>>>>>>> Nodemanagers as log
>>>>>>>>>>>>>>>>>>> >> >> directories</description>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>>>>>>>>>>>>>> >> >>     <description>shuffle service that needs to be
>>>>>>>>>>>>>>>>>>> set for Map Reduce to
>>>>>>>>>>>>>>>>>>> >> >> run </description>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>24</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>3</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>22000</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>2.1</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>>>> >> >>> Not tuning configurations at all is wrong. YARN
>>>>>>>>>>>>>>>>>>> uses memory resource
>>>>>>>>>>>>>>>>>>> >> >>> based scheduling and hence MR2 would be
>>>>>>>>>>>>>>>>>>> requesting 1 GB minimum by
>>>>>>>>>>>>>>>>>>> >> >>> default, causing, on base configs, to max out at
>>>>>>>>>>>>>>>>>>> 8 (due to 8 GB NM
>>>>>>>>>>>>>>>>>>> >> >>> memory resource config) total containers. Do
>>>>>>>>>>>>>>>>>>> share your configs as at
>>>>>>>>>>>>>>>>>>> >> >>> this point none of us can tell what it is.
>>>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower
>>>>>>>>>>>>>>>>>>> for users and to not
>>>>>>>>>>>>>>>>>>> >> >>> care about such things :)
>>>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>>>>>>>>>>>>>> samliuhadoop@gmail.com>
>>>>>>>>>>>>>>>>>>> >> >>> wrote:
>>>>>>>>>>>>>>>>>>> >> >>> > At the beginning, I just want to do a fast
>>>>>>>>>>>>>>>>>>> comparison of MRv1 and
>>>>>>>>>>>>>>>>>>> >> >>> > Yarn.
>>>>>>>>>>>>>>>>>>> >> >>> > But
>>>>>>>>>>>>>>>>>>> >> >>> > they have many differences, and to be fair for
>>>>>>>>>>>>>>>>>>> comparison I did not
>>>>>>>>>>>>>>>>>>> >> >>> > tune
>>>>>>>>>>>>>>>>>>> >> >>> > their configurations at all.  So I got above
>>>>>>>>>>>>>>>>>>> test results. After
>>>>>>>>>>>>>>>>>>> >> >>> > analyzing
>>>>>>>>>>>>>>>>>>> >> >>> > the test result, no doubt, I will configure
>>>>>>>>>>>>>>>>>>> them and do comparison
>>>>>>>>>>>>>>>>>>> >> >>> > again.
>>>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>>>> >> >>> > Do you have any idea on current test result? I
>>>>>>>>>>>>>>>>>>> think, to compare
>>>>>>>>>>>>>>>>>>> >> >>> > with
>>>>>>>>>>>>>>>>>>> >> >>> > MRv1,
>>>>>>>>>>>>>>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but
>>>>>>>>>>>>>>>>>>> worse on Reduce
>>>>>>>>>>>>>>>>>>> >> >>> > phase(terasort test).
>>>>>>>>>>>>>>>>>>> >> >>> > And any detailed suggestions/comments/materials
>>>>>>>>>>>>>>>>>>> on Yarn performance
>>>>>>>>>>>>>>>>>>> >> >>> > tuning?
>>>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>>>> >> >>> > Thanks!
>>>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>>>>>>>>>>>>>> marcosluis2186@gmail.com>
>>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>>> >> >>> >> Why not to tune the configurations?
>>>>>>>>>>>>>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>>>>>>>>>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size,
>>>>>>>>>>>>>>>>>>> etc
>>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>>> >> >>> >>> Hi Experts,
>>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>>> >> >>> >>> We are thinking about whether to use Yarn or
>>>>>>>>>>>>>>>>>>> not in the near
>>>>>>>>>>>>>>>>>>> >> >>> >>> future,
>>>>>>>>>>>>>>>>>>> >> >>> >>> and
>>>>>>>>>>>>>>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for
>>>>>>>>>>>>>>>>>>> comparison.
>>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>>> >> >>> >>> My env is three nodes cluster, and each node
>>>>>>>>>>>>>>>>>>> has similar hardware:
>>>>>>>>>>>>>>>>>>> >> >>> >>> 2
>>>>>>>>>>>>>>>>>>> >> >>> >>> cpu(4 core), 32 mem. Both Yarn and MRv1
>>>>>>>>>>>>>>>>>>> cluster are set on the
>>>>>>>>>>>>>>>>>>> >> >>> >>> same
>>>>>>>>>>>>>>>>>>> >> >>> >>> env. To
>>>>>>>>>>>>>>>>>>> >> >>> >>> be fair, I did not make any performance
>>>>>>>>>>>>>>>>>>> tuning on their
>>>>>>>>>>>>>>>>>>> >> >>> >>> configurations, but
>>>>>>>>>>>>>>>>>>> >> >>> >>> use the default configuration values.
>>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>>> >> >>> >>> Before testing, I think Yarn will be much
>>>>>>>>>>>>>>>>>>> better than MRv1, if
>>>>>>>>>>>>>>>>>>> >> >>> >>> they
>>>>>>>>>>>>>>>>>>> >> >>> >>> all
>>>>>>>>>>>>>>>>>>> >> >>> >>> use default configuration, because Yarn is a
>>>>>>>>>>>>>>>>>>> better framework than
>>>>>>>>>>>>>>>>>>> >> >>> >>> MRv1.
>>>>>>>>>>>>>>>>>>> >> >>> >>> However, the test result shows some
>>>>>>>>>>>>>>>>>>> differences:
>>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>>>>>>>>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>>>>>>>>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>>>>>>>>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>>>>>>>>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>>>>>>>>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>>>>>>>>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>>>>>>>>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>>> >> >>> >>> After a fast analysis, I think the direct
>>>>>>>>>>>>>>>>>>> cause might be that Yarn
>>>>>>>>>>>>>>>>>>> >> >>> >>> is
>>>>>>>>>>>>>>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much
>>>>>>>>>>>>>>>>>>> worse on Reduce
>>>>>>>>>>>>>>>>>>> >> >>> >>> phase.
>>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>>>>>>>>>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1
>>>>>>>>>>>>>>>>>>> for terasort?
>>>>>>>>>>>>>>>>>>> >> >>> >>> - What's the strategy for tuning Yarn
>>>>>>>>>>>>>>>>>>> performance? Are there any
>>>>>>>>>>>>>>>>>>> >> >>> >>> materials?
>>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>>> >> >>> >>> Thanks!
>>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>>> >> >>> >> --
>>>>>>>>>>>>>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>>>>>>>>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>>>>>>>>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>>>> >> >>> --
>>>>>>>>>>>>>>>>>>> >> >>> Harsh J
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>> >> --
>>>>>>>>>>>>>>>>>>> >> Harsh J
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>> Harsh J
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Sandy Ryza <sa...@cloudera.com>.
Unfortunately yes.  I agree that it should be documented.  The only way to
compare fairly is to use the same terasort jar against both.  When I've
done comparisons, I've used the MR2 examples jar against both MR1 and MR2.
 This works on CDH because it has a version of MR1 that's compatible with
MR2, but may require some recompilation if using the Apache releases.
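
For anyone repeating that comparison, a minimal sketch of the commands (jar name
and HDFS paths here are placeholders; adjust them to your release and layout):

    # Generate 1 TB (10 billion 100-byte rows) and sort it with the MR2 examples jar.
    # Run the same jar against the MR1 cluster as well, so both tests use identical job code.
    hadoop jar hadoop-mapreduce-examples-2.2.0.jar teragen 10000000000 /user/hadoop/tera-in
    hadoop jar hadoop-mapreduce-examples-2.2.0.jar terasort /user/hadoop/tera-in /user/hadoop/tera-out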


On Wed, Oct 23, 2013 at 5:01 PM, Jian Fang <ji...@gmail.com>wrote:

> Really? Does that mean I cannot compare terasort in MR2 with MR1
> directly?  If yes, this really should be documented.
>
> Thanks.
>
>
> On Wed, Oct 23, 2013 at 4:54 PM, Sandy Ryza <sa...@cloudera.com>wrote:
>
>> I should have brought this up earlier - the Terasort benchmark
>> requirements changed recently to make data less compressible.  The MR2
>> version of Terasort has this change, but the MR1 version does not.    So
>> Snappy should be working fine, but the data that MR2 is using is less
>> compressible.
>>
>>
>> On Wed, Oct 23, 2013 at 4:48 PM, Jian Fang <jian.fang.subscribe@gmail.com
>> > wrote:
>>
>>> Ok, I think I know where the problem is now. I used snappy to compress
>>> map output.
>>>
>>> Calculate the compression ratio = "Map output materialized bytes"/"Map
>>> output bytes"
>>>
>>> MR2
>>> 440519309909/1020000000000 = 0.431881676
>>>
>>> MR1
>>> 240948272514/1000000000000 = 0.240948273
>>>
>>>
>>> It seems something is wrong with Snappy in my Hadoop 2 and, as a result,
>>> MR2 needs to fetch much more data during the shuffle phase. Is there any
>>> way to check Snappy in MR2?
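
A quick way to sanity-check native Snappy on the Hadoop 2 side, assuming the
checknative tool that ships with 2.x builds:

    # Lists the native libraries the hadoop binary can load; look for "snappy: true"
    # with a path to libsnappy. A "false" here would point to a native-library problem
    # rather than a difference in the data being compressed.
    hadoop checknative -a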
>>>
>>>
>>>
>>>
>>> On Wed, Oct 23, 2013 at 4:26 PM, Jian Fang <
>>> jian.fang.subscribe@gmail.com> wrote:
>>>
>>>> Looked at the results for MR1 and MR2. Reduce input groups for MR1 is
>>>> 4,294,967,296, but 10,000,000,000 for MR2.
>>>> That is to say the number of unique keys fed into the reducers is
>>>> 10000000000 in MR2, which is really weird. Any problem in the terasort code?
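
As a side note, 4,294,967,296 is exactly 2^32, so the MR1 figure looks more like
a 32-bit counter artifact than a real difference in the number of unique keys; a
one-liner confirms the value:

    # 2^32 equals the Reduce input groups value reported by MR1.
    echo $((2**32))    # prints 4294967296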
>>>>
>>>>
>>>> On Wed, Oct 23, 2013 at 2:16 PM, Jian Fang <
>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>
>>>>> Reducing the number of map containers to 8 slightly improved the total
>>>>> time to 84 minutes. Here is the output. Also, from the log, there is no clear
>>>>> reason why the containers are killed other than messages such as
>>>>> "Container killed by the
>>>>> ApplicationMaster../container_1382237301855_0001_01_000001/syslog:Container
>>>>> killed on request. Exit code is 143"
>>>>>
>>>>>
>>>>> 2013-10-23 21:09:34,325 INFO org.apache.hadoop.mapreduce.Job (main):
>>>>> Counters: 46
>>>>>         File System Counters
>>>>>                 FILE: Number of bytes read=455484809066
>>>>>                 FILE: Number of bytes written=896642344343
>>>>>
>>>>>                 FILE: Number of read operations=0
>>>>>                 FILE: Number of large read operations=0
>>>>>                 FILE: Number of write operations=0
>>>>>                 HDFS: Number of bytes read=1000000841624
>>>>>
>>>>>                 HDFS: Number of bytes written=1000000000000
>>>>>                 HDFS: Number of read operations=25531
>>>>>
>>>>>                 HDFS: Number of large read operations=0
>>>>>                 HDFS: Number of write operations=150
>>>>>
>>>>>         Job Counters
>>>>>                 Killed map tasks=1
>>>>>                 Killed reduce tasks=11
>>>>>                 Launched map tasks=7449
>>>>>                 Launched reduce tasks=86
>>>>>                 Data-local map tasks=7434
>>>>>                 Rack-local map tasks=15
>>>>>                 Total time spent by all maps in occupied slots
>>>>> (ms)=1030941232
>>>>>                 Total time spent by all reduces in occupied slots
>>>>> (ms)=1574732272
>>>>>
>>>>>         Map-Reduce Framework
>>>>>                 Map input records=10000000000
>>>>>                 Map output records=10000000000
>>>>>                 Map output bytes=1020000000000
>>>>>                 Map output materialized bytes=440519309909
>>>>>                 Input split bytes=841624
>>>>>
>>>>>                 Combine input records=0
>>>>>                 Combine output records=0
>>>>>                 Reduce input groups=10000000000
>>>>>                 Reduce shuffle bytes=440519309909
>>>>>
>>>>>                 Reduce input records=10000000000
>>>>>                 Reduce output records=10000000000
>>>>>                 Spilled Records=20000000000
>>>>>                 Shuffled Maps =558600
>>>>>                 Failed Shuffles=0
>>>>>                 Merged Map outputs=558600
>>>>>                 GC time elapsed (ms)=2855534
>>>>>                 CPU time spent (ms)=170381420
>>>>>                 Physical memory (bytes) snapshot=3704938954752
>>>>>                 Virtual memory (bytes) snapshot=11455208308736
>>>>>                 Total committed heap usage (bytes)=4523248582656
>>>>>         Shuffle Errors
>>>>>                 BAD_ID=0
>>>>>                 CONNECTION=0
>>>>>                 IO_ERROR=0
>>>>>
>>>>>                 WRONG_LENGTH=0
>>>>>                 WRONG_MAP=0
>>>>>                 WRONG_REDUCE=0
>>>>>         File Input Format Counters
>>>>>                 Bytes Read=1000000000000
>>>>>         File Output Format Counters
>>>>>                 Bytes Written=1000000000000
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Oct 23, 2013 at 1:46 PM, Jian Fang <
>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>
>>>>>> Already started the tests with only 8 containers for map in MR2,
>>>>>> still running.
>>>>>>
>>>>>> But shouldn't YARN make better use of the memory? If I map YARN
>>>>>> containers in MR2 to exactly 8 map and 3 reduce slots as in MR1, what is the
>>>>>> real advantage of YARN then? I remember one goal of YARN is to solve the
>>>>>> issue that map slots cannot be used for reduce and reduce slots cannot be
>>>>>> used for map. Shouldn't YARN be smart enough to handle the concurrent tasks?
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Oct 23, 2013 at 1:17 PM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>>>>
>>>>>>> Increasing the slowstart is not meant to increase performance, but
>>>>>>> should make for a fairer comparison.  Have you tried making sure that in
>>>>>>> MR2 only 8 map tasks are running concurrently, or boosting MR1 up to 16?
>>>>>>>
>>>>>>> -Sandy
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Oct 23, 2013 at 12:55 PM, Jian Fang <
>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>
>>>>>>>> Changing mapreduce.job.reduce.
>>>>>>>> slowstart.completedmaps to 0.99 does not look good. The map phase
>>>>>>>> alone took 48 minutes and total time seems to be even longer. Any way to
>>>>>>>> let map phase run faster?
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Oct 23, 2013 at 10:05 AM, Jian Fang <
>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Thanks Sandy.
>>>>>>>>>
>>>>>>>>> io.sort.record.percent is the default value 0.05 for both MR1 and
>>>>>>>>> MR2.  mapreduce.job.reduce.slowstart.completedmaps in MR2 and mapred.reduce.slowstart.completed.maps
>>>>>>>>> in MR1 both use the default value 0.05.
>>>>>>>>>
>>>>>>>>> I tried to allocate 1536MB and 1024MB to map container some time
>>>>>>>>> ago, but the changes did not give me a better result, thus, I changed it
>>>>>>>>> back to 768MB.
>>>>>>>>>
>>>>>>>>> Will try mapred.reduce.slowstart.completed.maps=.99 to see what
>>>>>>>>> happens. BTW, I should use
>>>>>>>>> mapreduce.job.reduce.slowstart.completedmaps in MR2, right?
>>>>>>>>>
>>>>>>>>> Also, in MR1 I can specify tasktracker.http.threads, but I could
>>>>>>>>> not find the counterpart for MR2. Which one should I tune for the http
>>>>>>>>> thread?
>>>>>>>>>
>>>>>>>>> Thanks again.
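
For reference, a sketch of how the MR2-side property could be passed straight on
the terasort command line (jar name and paths are placeholders):

    # Delay reducer launch until the maps finish, so the two phases are measured separately.
    hadoop jar hadoop-mapreduce-examples-2.2.0.jar terasort \
        -Dmapreduce.job.reduce.slowstart.completedmaps=0.99 \
        /user/hadoop/tera-in /user/hadoop/tera-out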
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Oct 23, 2013 at 9:40 AM, Sandy Ryza <
>>>>>>>>> sandy.ryza@cloudera.com> wrote:
>>>>>>>>>
>>>>>>>>>> Based on SLOTS_MILLIS_MAPS, it looks like your map tasks are
>>>>>>>>>> taking about three times as long in MR2 as they are in MR1.  This is
>>>>>>>>>> probably because you allow twice as many map tasks to run at a time in MR2 (12288/768
>>>>>>>>>> = 16).  Being able to use all the containers isn't necessarily a good thing
>>>>>>>>>> if you are oversubscribing your node's resources.  Because of the different
>>>>>>>>>> way that MR1 and MR2 view resources, I think it's better to test with
>>>>>>>>>> mapred.reduce.slowstart.completed.maps=.99 so that the map and reduce
>>>>>>>>>> phases will run separately.
>>>>>>>>>>
>>>>>>>>>> On the other side, it looks like your MR1 has more spilled
>>>>>>>>>> records than MR2.  For a fairer comparison, you should set
>>>>>>>>>> io.sort.record.percent in MR1 to .13, which should improve MR1 performance,
>>>>>>>>>> but will provide a fairer comparison (MR2 automatically does this tuning
>>>>>>>>>> for you).
>>>>>>>>>>
>>>>>>>>>> -Sandy
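
A matching sketch for the MR1 side, with the two settings suggested above (MR1
property names; jar name and paths are placeholders):

    # Same phase separation on MR1, plus the spill-buffer tuning mentioned above.
    hadoop jar hadoop-examples-1.0.3.jar terasort \
        -Dmapred.reduce.slowstart.completed.maps=0.99 \
        -Dio.sort.record.percent=0.13 \
        /user/hadoop/tera-in /user/hadoop/tera-out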
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Oct 23, 2013 at 9:22 AM, Jian Fang <
>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> The number of map slots and reduce slots on each data node for
>>>>>>>>>>> MR1 are 8 and 3, respectively. Since MR2 could use all containers for
>>>>>>>>>>> either map or reduce, I would expect that MR2 is faster.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Oct 23, 2013 at 8:17 AM, Sandy Ryza <
>>>>>>>>>>> sandy.ryza@cloudera.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> How many map and reduce slots are you using per tasktracker in
>>>>>>>>>>>> MR1?  How do the average map times compare? (MR2 reports this directly on
>>>>>>>>>>>> the web UI, but you can also get a sense in MR1 by scrolling through the
>>>>>>>>>>>> map tasks page).  Can you share the counters for MR1?
>>>>>>>>>>>>
>>>>>>>>>>>> -Sandy
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Oct 23, 2013 at 12:23 AM, Jian Fang <
>>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Unfortunately, turning off JVM reuse still got the same
>>>>>>>>>>>>> result, i.e., about 90 minutes for MR2. I don't think the killed reduces
>>>>>>>>>>>>> could contribute to 2 times slowness. There should be something very wrong
>>>>>>>>>>>>> either in configuration or code. Any hints?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <
>>>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks Sandy. I will try to turn JVM reuse off and see what
>>>>>>>>>>>>>> happens.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes, I saw quite some exceptions in the task attempts. For
>>>>>>>>>>>>>> instance.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2013-10-20 03:13:58,751 ERROR [main]
>>>>>>>>>>>>>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>>>>>>>>>>>>>> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
>>>>>>>>>>>>>> 2013-10-20 03:13:58,752 ERROR [Thread-6]
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient: Failed to close file
>>>>>>>>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
>>>>>>>>>>>>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>>>>>>>>>>>>>> No lease on
>>>>>>>>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
>>>>>>>>>>>>>> File does not exist. Holder
>>>>>>>>>>>>>> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
>>>>>>>>>>>>>> any open files.
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>>>>>>>>>>>>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>>>>>>>>>>>>>> 2013-10-20 03:13:58,753 WARN [main]
>>>>>>>>>>>>>> org.apache.hadoop.mapred.YarnChild: Exception running child :
>>>>>>>>>>>>>> java.nio.channels.ClosedChannelException
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <
>>>>>>>>>>>>>> sandy.ryza@cloudera.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It looks like many of your reduce tasks were killed.  Do you
>>>>>>>>>>>>>>> know why?  Also, MR2 doesn't have JVM reuse, so it might make sense to
>>>>>>>>>>>>>>> compare it to MR1 with JVM reuse turned off.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -Sandy
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <
>>>>>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The Terasort output for MR2 is as follows.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2013-10-22 21:40:16,261 INFO
>>>>>>>>>>>>>>>> org.apache.hadoop.mapreduce.Job (main): Counters: 46
>>>>>>>>>>>>>>>>         File System Counters
>>>>>>>>>>>>>>>>                 FILE: Number of bytes read=456102049355
>>>>>>>>>>>>>>>>                 FILE: Number of bytes written=897246250517
>>>>>>>>>>>>>>>>                 FILE: Number of read operations=0
>>>>>>>>>>>>>>>>                 FILE: Number of large read operations=0
>>>>>>>>>>>>>>>>                 FILE: Number of write operations=0
>>>>>>>>>>>>>>>>                 HDFS: Number of bytes read=1000000851200
>>>>>>>>>>>>>>>>                 HDFS: Number of bytes written=1000000000000
>>>>>>>>>>>>>>>>                 HDFS: Number of read operations=32131
>>>>>>>>>>>>>>>>                 HDFS: Number of large read operations=0
>>>>>>>>>>>>>>>>                 HDFS: Number of write operations=224
>>>>>>>>>>>>>>>>         Job Counters
>>>>>>>>>>>>>>>>                 Killed map tasks=1
>>>>>>>>>>>>>>>>                 Killed reduce tasks=20
>>>>>>>>>>>>>>>>                 Launched map tasks=7601
>>>>>>>>>>>>>>>>                 Launched reduce tasks=132
>>>>>>>>>>>>>>>>                 Data-local map tasks=7591
>>>>>>>>>>>>>>>>                 Rack-local map tasks=10
>>>>>>>>>>>>>>>>                 Total time spent by all maps in occupied
>>>>>>>>>>>>>>>> slots (ms)=1696141311
>>>>>>>>>>>>>>>>                 Total time spent by all reduces in occupied
>>>>>>>>>>>>>>>> slots (ms)=2664045096
>>>>>>>>>>>>>>>>         Map-Reduce Framework
>>>>>>>>>>>>>>>>                 Map input records=10000000000
>>>>>>>>>>>>>>>>                 Map output records=10000000000
>>>>>>>>>>>>>>>>                 Map output bytes=1020000000000
>>>>>>>>>>>>>>>>                 Map output materialized bytes=440486356802
>>>>>>>>>>>>>>>>                 Input split bytes=851200
>>>>>>>>>>>>>>>>                 Combine input records=0
>>>>>>>>>>>>>>>>                 Combine output records=0
>>>>>>>>>>>>>>>>                 Reduce input groups=10000000000
>>>>>>>>>>>>>>>>                 Reduce shuffle bytes=440486356802
>>>>>>>>>>>>>>>>                 Reduce input records=10000000000
>>>>>>>>>>>>>>>>                 Reduce output records=10000000000
>>>>>>>>>>>>>>>>                 Spilled Records=20000000000
>>>>>>>>>>>>>>>>                 Shuffled Maps =851200
>>>>>>>>>>>>>>>>                 Failed Shuffles=61
>>>>>>>>>>>>>>>>                 Merged Map outputs=851200
>>>>>>>>>>>>>>>>                 GC time elapsed (ms)=4215666
>>>>>>>>>>>>>>>>                 CPU time spent (ms)=192433000
>>>>>>>>>>>>>>>>                 Physical memory (bytes)
>>>>>>>>>>>>>>>> snapshot=3349356380160
>>>>>>>>>>>>>>>>                 Virtual memory (bytes)
>>>>>>>>>>>>>>>> snapshot=9665208745984
>>>>>>>>>>>>>>>>                 Total committed heap usage
>>>>>>>>>>>>>>>> (bytes)=3636854259712
>>>>>>>>>>>>>>>>         Shuffle Errors
>>>>>>>>>>>>>>>>                 BAD_ID=0
>>>>>>>>>>>>>>>>                 CONNECTION=0
>>>>>>>>>>>>>>>>                 IO_ERROR=4
>>>>>>>>>>>>>>>>                 WRONG_LENGTH=0
>>>>>>>>>>>>>>>>                 WRONG_MAP=0
>>>>>>>>>>>>>>>>                 WRONG_REDUCE=0
>>>>>>>>>>>>>>>>         File Input Format Counters
>>>>>>>>>>>>>>>>                 Bytes Read=1000000000000
>>>>>>>>>>>>>>>>         File Output Format Counters
>>>>>>>>>>>>>>>>                 Bytes Written=1000000000000
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> John
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>>>>>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I have the same problem. I compared Hadoop 2.2.0 with
>>>>>>>>>>>>>>>>> Hadoop 1.0.3 and it turned out that the terasort for MR2 is 2 times slower
>>>>>>>>>>>>>>>>> than that in MR1. I cannot really believe it.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop
>>>>>>>>>>>>>>>>> 2.2.0 cluster configurations are as follows.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>>>>>>>>>>>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>>>>>>>>>>>>>>         mapreduce.map.memory.mb = "768";
>>>>>>>>>>>>>>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>>>>>>>>>>>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>>>>>>>>>>>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>>>>>>>>>>>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>>>>>>>>>>>>>>         mapreduce.task.io.sort.factor = "48";
>>>>>>>>>>>>>>>>>         mapreduce.task.io.sort.mb = "200";
>>>>>>>>>>>>>>>>>         mapreduce.map.speculative = "true";
>>>>>>>>>>>>>>>>>         mapreduce.reduce.speculative = "true";
>>>>>>>>>>>>>>>>>         mapreduce.framework.name = "yarn";
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> yarn.app.mapreduce.am.job.task.listener.thread-count = "60";
>>>>>>>>>>>>>>>>>         mapreduce.map.cpu.vcores = "1";
>>>>>>>>>>>>>>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>>>>>>>>>>>>>>         mapreduce.map.output.compress = "true";
>>>>>>>>>>>>>>>>>         mapreduce.map.output.compress.codec =
>>>>>>>>>>>>>>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>>>>>>>>>>>>>>         yarn.resourcemanager.scheduler.client.thread-count
>>>>>>>>>>>>>>>>> = "64";
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> yarn.resourcemanager.resource-tracker.client.thread-count = "64";
>>>>>>>>>>>>>>>>>         yarn.resourcemanager.scheduler.class =
>>>>>>>>>>>>>>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>>>>>>>>>>>>>>         yarn.nodemanager.aux-services =
>>>>>>>>>>>>>>>>> "mapreduce_shuffle";
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>>>>>>>>>>>>>>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>>>>>>>>>>>>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>>>>>>>>>>>>>>         yarn.nodemanager.container-executor.class =
>>>>>>>>>>>>>>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>>>>>>>>>>>>>>         yarn.nodemanager.container-manager.thread-count =
>>>>>>>>>>>>>>>>> "64";
>>>>>>>>>>>>>>>>>         yarn.nodemanager.localizer.client.thread-count =
>>>>>>>>>>>>>>>>> "20";
>>>>>>>>>>>>>>>>>         yarn.nodemanager.localizer.fetch.thread-count =
>>>>>>>>>>>>>>>>> "20";
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost
>>>>>>>>>>>>>>>>> the same other configurations. In MR1, the 1TB terasort took about 45
>>>>>>>>>>>>>>>>> minutes, but it took around 90 minutes in MR2.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Does anyone know what is wrong here? Or do I need some
>>>>>>>>>>>>>>>>> special configurations for terasort to work better in MR2?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> John
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>>>>>>>>>>>>>>> michael_segel@hotmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Sam,
>>>>>>>>>>>>>>>>>> I think your cluster is too small for any meaningful
>>>>>>>>>>>>>>>>>> conclusions to be made.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Mike Segel
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <
>>>>>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi Harsh,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks for your detailed response! Now, the efficiency of
>>>>>>>>>>>>>>>>>> my Yarn cluster improved a lot after increasing the reducer
>>>>>>>>>>>>>>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>>>>>>>>>>>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and
>>>>>>>>>>>>>>>>>> reduce task together, with a typical process(map > shuffle > reduce). In
>>>>>>>>>>>>>>>>>> Yarn, as I know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>>>>>>>>>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...),
>>>>>>>>>>>>>>>>>> but, MRv1 job has special execution process(map > shuffle > reduce) in
>>>>>>>>>>>>>>>>>> Hadoop 1.x, and how Yarn execute a MRv1 job? still include some special MR
>>>>>>>>>>>>>>>>>> steps in Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>>>>>>>>>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>>>>>>>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>>>>>>>>>>>>> - What's the general process for ApplicationMaster of
>>>>>>>>>>>>>>>>>> Yarn to execute a job?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by
>>>>>>>>>>>>>>>>>> setting 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>>>>>>>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>>>>>>>>>>>>> - For Yarn, the above two parameters do not work anymore, as
>>>>>>>>>>>>>>>>>> yarn uses container instead, right?
>>>>>>>>>>>>>>>>>> - For Yarn, we can set the whole physical mem for a
>>>>>>>>>>>>>>>>>> NodeManager using 'yarn.nodemanager.resource.memory-mb'.
>>>>>>>>>>>>>>>>>> But how to set the default size of physical mem of a container?
>>>>>>>>>>>>>>>>>> - How to set the maximum size of physical mem of a
>>>>>>>>>>>>>>>>>> container? By the parameter of 'mapred.child.java.opts'?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks as always!
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi Sam,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> > - How to know the container number? Why you say it
>>>>>>>>>>>>>>>>>>> will be 22 containers due to a 22 GB memory?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The MR2's default configuration requests 1 GB resource
>>>>>>>>>>>>>>>>>>> each for Map
>>>>>>>>>>>>>>>>>>> and Reduce containers. It requests 1.5 GB for the AM
>>>>>>>>>>>>>>>>>>> container that
>>>>>>>>>>>>>>>>>>> runs the job, additionally. This is tunable using the
>>>>>>>>>>>>>>>>>>> properties
>>>>>>>>>>>>>>>>>>> Sandy's mentioned in his post.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> > - My machine has 32 GB memory, how much memory is
>>>>>>>>>>>>>>>>>>> proper to be assigned to containers?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> This is a general question. You may use the same process
>>>>>>>>>>>>>>>>>>> you took to
>>>>>>>>>>>>>>>>>>> decide optimal number of slots in MR1 to decide this
>>>>>>>>>>>>>>>>>>> here. Every
>>>>>>>>>>>>>>>>>>> container is a new JVM, and you're limited by the CPUs
>>>>>>>>>>>>>>>>>>> you have there
>>>>>>>>>>>>>>>>>>> (if not the memory). Either increase memory requests
>>>>>>>>>>>>>>>>>>> from jobs, to
>>>>>>>>>>>>>>>>>>> lower # of concurrent containers at a given time
>>>>>>>>>>>>>>>>>>> (runtime change), or
>>>>>>>>>>>>>>>>>>> lower NM's published memory resources to control the
>>>>>>>>>>>>>>>>>>> same (config
>>>>>>>>>>>>>>>>>>> change).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> > - In mapred-site.xml, if I set '
>>>>>>>>>>>>>>>>>>> mapreduce.framework.name' to be 'yarn', will other
>>>>>>>>>>>>>>>>>>> parameters for mapred-site.xml still work in yarn framework? Like
>>>>>>>>>>>>>>>>>>> 'mapreduce.task.io.sort.mb' and 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Yes, all of these properties will still work. Old
>>>>>>>>>>>>>>>>>>> properties specific
>>>>>>>>>>>>>>>>>>> to JobTracker or TaskTracker (usually found as a keyword
>>>>>>>>>>>>>>>>>>> in the config
>>>>>>>>>>>>>>>>>>> name) will not apply anymore.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <
>>>>>>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>> > Hi Harsh,
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>> > According to above suggestions, I removed the
>>>>>>>>>>>>>>>>>>> duplication of setting, and
>>>>>>>>>>>>>>>>>>> > reduce the value of
>>>>>>>>>>>>>>>>>>> 'yarn.nodemanager.resource.cpu-cores',
>>>>>>>>>>>>>>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>>>>>>>>>>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and
>>>>>>>>>>>>>>>>>>> 12000. And then, the
>>>>>>>>>>>>>>>>>>> > efficiency improved about 18%.  I have questions:
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>> > - How to know the container number? Why you say it
>>>>>>>>>>>>>>>>>>> will be 22 containers due
>>>>>>>>>>>>>>>>>>> > to a 22 GB memory?
>>>>>>>>>>>>>>>>>>> > - My machine has 32 GB memory, how much memory is
>>>>>>>>>>>>>>>>>>> proper to be assigned to
>>>>>>>>>>>>>>>>>>> > containers?
>>>>>>>>>>>>>>>>>>> > - In mapred-site.xml, if I set '
>>>>>>>>>>>>>>>>>>> mapreduce.framework.name' to be 'yarn', will
>>>>>>>>>>>>>>>>>>> > other parameters for mapred-site.xml still work in
>>>>>>>>>>>>>>>>>>> yarn framework? Like
>>>>>>>>>>>>>>>>>>> > 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>> > Thanks!
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>> >> Hey Sam,
>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>> >> Did you get a chance to retry with Sandy's
>>>>>>>>>>>>>>>>>>> suggestions? The config
>>>>>>>>>>>>>>>>>>> >> appears to be asking NMs to use roughly 22 total
>>>>>>>>>>>>>>>>>>> containers (as
>>>>>>>>>>>>>>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22
>>>>>>>>>>>>>>>>>>> GB memory
>>>>>>>>>>>>>>>>>>> >> resource. This could impact much, given the CPU is
>>>>>>>>>>>>>>>>>>> still the same for
>>>>>>>>>>>>>>>>>>> >> both test runs.
>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>>>>>>>>>>>>>>> sandy.ryza@cloudera.com>
>>>>>>>>>>>>>>>>>>> >> wrote:
>>>>>>>>>>>>>>>>>>> >> > Hey Sam,
>>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>>> >> > Thanks for sharing your results.  I'm definitely
>>>>>>>>>>>>>>>>>>> curious about what's
>>>>>>>>>>>>>>>>>>> >> > causing the difference.
>>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>>> >> > A couple observations:
>>>>>>>>>>>>>>>>>>> >> > It looks like you've got
>>>>>>>>>>>>>>>>>>> yarn.nodemanager.resource.memory-mb in there
>>>>>>>>>>>>>>>>>>> >> > twice
>>>>>>>>>>>>>>>>>>> >> > with two different values.
>>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?)
>>>>>>>>>>>>>>>>>>> close to the default
>>>>>>>>>>>>>>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any
>>>>>>>>>>>>>>>>>>> of your tasks getting
>>>>>>>>>>>>>>>>>>> >> > killed for running over resource limits?
>>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>>> >> > -Sandy
>>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <
>>>>>>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >> The terasort execution log shows that reduce spent
>>>>>>>>>>>>>>>>>>> about 5.5 mins from
>>>>>>>>>>>>>>>>>>> >> >> 33%
>>>>>>>>>>>>>>>>>>> >> >> to 35% as below.
>>>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>>>> reduce 31%
>>>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>>>> reduce 32%
>>>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>>>> reduce 33%
>>>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>>>> reduce 35%
>>>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>>>> reduce 40%
>>>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>>>> reduce 43%
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >> Anyway, below are my configurations for your
>>>>>>>>>>>>>>>>>>> reference. Thanks!
>>>>>>>>>>>>>>>>>>> >> >> (A) core-site.xml
>>>>>>>>>>>>>>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >> (B) hdfs-site.xml
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>dfs.replication</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>1</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>10</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >> (C) mapred-site.xml
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>>>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>>>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>>>>>>>>>>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>>>>>>>>>>>>>>> >> >> </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>yarn</value>
>>>>>>>>>>>>>>>>>>> >> >>    </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>8</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>4</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>true</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >> (D) yarn-site.xml
>>>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>node1:18025</value>
>>>>>>>>>>>>>>>>>>> >> >>     <description>host is the hostname of the
>>>>>>>>>>>>>>>>>>> resource manager and
>>>>>>>>>>>>>>>>>>> >> >>     port is the port on which the NodeManagers
>>>>>>>>>>>>>>>>>>> contact the Resource
>>>>>>>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>     <description>The address of the RM web
>>>>>>>>>>>>>>>>>>> application.</description>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <name>yarn.resourcemanager.webapp.address</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>node1:18088</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <name>yarn.resourcemanager.scheduler.address</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>node1:18030</value>
>>>>>>>>>>>>>>>>>>> >> >>     <description>host is the hostname of the
>>>>>>>>>>>>>>>>>>> resourcemanager and port
>>>>>>>>>>>>>>>>>>> >> >> is
>>>>>>>>>>>>>>>>>>> >> >> the port
>>>>>>>>>>>>>>>>>>> >> >>     on which the Applications in the cluster talk
>>>>>>>>>>>>>>>>>>> to the Resource
>>>>>>>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>node1:18040</value>
>>>>>>>>>>>>>>>>>>> >> >>     <description>the host is the hostname of the
>>>>>>>>>>>>>>>>>>> ResourceManager and
>>>>>>>>>>>>>>>>>>> >> >> the
>>>>>>>>>>>>>>>>>>> >> >> port is the port on
>>>>>>>>>>>>>>>>>>> >> >>     which the clients can talk to the Resource
>>>>>>>>>>>>>>>>>>> Manager. </description>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>>>>>>>>>>>>>>> >> >>     <description>the local directories used by the
>>>>>>>>>>>>>>>>>>> >> >> nodemanager</description>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>>>>>>>>>>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>>>>>>>>>>>>>>> port</description>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>10240</value>
>>>>>>>>>>>>>>>>>>> >> >>     <description>the amount of memory on the
>>>>>>>>>>>>>>>>>>> NodeManager in
>>>>>>>>>>>>>>>>>>> >> >> MB</description>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>>>>>>>>>>>>>> >> >>     <description>directory on hdfs where the
>>>>>>>>>>>>>>>>>>> application logs are moved
>>>>>>>>>>>>>>>>>>> >> >> to
>>>>>>>>>>>>>>>>>>> >> >> </description>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>    <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>>>>>>>>>>>>>> >> >>     <description>the directories used by
>>>>>>>>>>>>>>>>>>> Nodemanagers as log
>>>>>>>>>>>>>>>>>>> >> >> directories</description>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>>>>>>>>>>>>>> >> >>     <description>shuffle service that needs to be
>>>>>>>>>>>>>>>>>>> set for Map Reduce to
>>>>>>>>>>>>>>>>>>> >> >> run </description>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>24</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>3</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>22000</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>2.1</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>>>> >> >>> Not tuning configurations at all is wrong. YARN
>>>>>>>>>>>>>>>>>>> uses memory resource
>>>>>>>>>>>>>>>>>>> >> >>> based scheduling and hence MR2 would be
>>>>>>>>>>>>>>>>>>> requesting 1 GB minimum by
>>>>>>>>>>>>>>>>>>> >> >>> default, causing, on base configs, to max out at
>>>>>>>>>>>>>>>>>>> 8 (due to 8 GB NM
>>>>>>>>>>>>>>>>>>> >> >>> memory resource config) total containers. Do
>>>>>>>>>>>>>>>>>>> share your configs as at
>>>>>>>>>>>>>>>>>>> >> >>> this point none of us can tell what it is.
>>>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower
>>>>>>>>>>>>>>>>>>> for users and to not
>>>>>>>>>>>>>>>>>>> >> >>> care about such things :)
>>>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>>>>>>>>>>>>>> samliuhadoop@gmail.com>
>>>>>>>>>>>>>>>>>>> >> >>> wrote:
>>>>>>>>>>>>>>>>>>> >> >>> > At the beginning, I just wanted to do a fast
>>>>>>>>>>>>>>>>>>> comparison of MRv1 and
>>>>>>>>>>>>>>>>>>> >> >>> > Yarn.
>>>>>>>>>>>>>>>>>>> >> >>> > But
>>>>>>>>>>>>>>>>>>> >> >>> > they have many differences, and to be fair for
>>>>>>>>>>>>>>>>>>> comparison I did not
>>>>>>>>>>>>>>>>>>> >> >>> > tune
>>>>>>>>>>>>>>>>>>> >> >>> > their configurations at all.  So I got above
>>>>>>>>>>>>>>>>>>> test results. After
>>>>>>>>>>>>>>>>>>> >> >>> > analyzing
>>>>>>>>>>>>>>>>>>> >> >>> > the test result, no doubt, I will configure
>>>>>>>>>>>>>>>>>>> them and do comparison
>>>>>>>>>>>>>>>>>>> >> >>> > again.
>>>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>>>> >> >>> > Do you have any idea on current test result? I
>>>>>>>>>>>>>>>>>>> think, to compare
>>>>>>>>>>>>>>>>>>> >> >>> > with
>>>>>>>>>>>>>>>>>>> >> >>> > MRv1,
>>>>>>>>>>>>>>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but
>>>>>>>>>>>>>>>>>>> worse on Reduce
>>>>>>>>>>>>>>>>>>> >> >>> > phase(terasort test).
>>>>>>>>>>>>>>>>>>> >> >>> > And any detailed suggestions/comments/materials
>>>>>>>>>>>>>>>>>>> on Yarn performance
>>>>>>>>>>>>>>>>>>> >> >>> > tuning?
>>>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>>>> >> >>> > Thanks!
>>>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>>>>>>>>>>>>>> marcosluis2186@gmail.com>
>>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>>> >> >>> >> Why not to tune the configurations?
>>>>>>>>>>>>>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>>>>>>>>>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size,
>>>>>>>>>>>>>>>>>>> etc
>>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>>> >> >>> >>> Hi Experts,
>>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>>> >> >>> >>> We are thinking about whether to use Yarn or
>>>>>>>>>>>>>>>>>>> not in the near
>>>>>>>>>>>>>>>>>>> >> >>> >>> future,
>>>>>>>>>>>>>>>>>>> >> >>> >>> and
>>>>>>>>>>>>>>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for
>>>>>>>>>>>>>>>>>>> comparison.
>>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>>> >> >>> >>> My env is three nodes cluster, and each node
>>>>>>>>>>>>>>>>>>> has similar hardware:
>>>>>>>>>>>>>>>>>>> >> >>> >>> 2
>>>>>>>>>>>>>>>>>>> >> >>> >>> cpu(4 core), 32 GB mem. Both Yarn and MRv1
>>>>>>>>>>>>>>>>>>> cluster are set on the
>>>>>>>>>>>>>>>>>>> >> >>> >>> same
>>>>>>>>>>>>>>>>>>> >> >>> >>> env. To
>>>>>>>>>>>>>>>>>>> >> >>> >>> be fair, I did not make any performance
>>>>>>>>>>>>>>>>>>> tuning on their
>>>>>>>>>>>>>>>>>>> >> >>> >>> configurations, but
>>>>>>>>>>>>>>>>>>> >> >>> >>> use the default configuration values.
>>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>>> >> >>> >>> Before testing, I think Yarn will be much
>>>>>>>>>>>>>>>>>>> better than MRv1, if
>>>>>>>>>>>>>>>>>>> >> >>> >>> they
>>>>>>>>>>>>>>>>>>> >> >>> >>> all
>>>>>>>>>>>>>>>>>>> >> >>> >>> use default configuration, because Yarn is a
>>>>>>>>>>>>>>>>>>> better framework than
>>>>>>>>>>>>>>>>>>> >> >>> >>> MRv1.
>>>>>>>>>>>>>>>>>>> >> >>> >>> However, the test result shows some
>>>>>>>>>>>>>>>>>>> differences:
>>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>>>>>>>>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>>>>>>>>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>>>>>>>>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>>>>>>>>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>>>>>>>>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>>>>>>>>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>>>>>>>>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>>> >> >>> >>> After a fast analysis, I think the direct
>>>>>>>>>>>>>>>>>>> cause might be that Yarn
>>>>>>>>>>>>>>>>>>> >> >>> >>> is
>>>>>>>>>>>>>>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much
>>>>>>>>>>>>>>>>>>> worse on Reduce
>>>>>>>>>>>>>>>>>>> >> >>> >>> phase.
>>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>>>>>>>>>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1
>>>>>>>>>>>>>>>>>>> for terasort?
>>>>>>>>>>>>>>>>>>> >> >>> >>> - What's the strategy for tuning Yarn
>>>>>>>>>>>>>>>>>>> performance? Are there any
>>>>>>>>>>>>>>>>>>> >> >>> >>> materials?
>>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>>> >> >>> >>> Thanks!
>>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>>> >> >>> >> --
>>>>>>>>>>>>>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>>>>>>>>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>>>>>>>>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>>>> >> >>> --
>>>>>>>>>>>>>>>>>>> >> >>> Harsh J
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>> >> --
>>>>>>>>>>>>>>>>>>> >> Harsh J
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>> Harsh J
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Sandy Ryza <sa...@cloudera.com>.
Unfortunately yes.  I agree that it should be documented.  The only way to
compare fairly is to use the same terasort jar against both.  When I've
done comparisons, I've used the MR2 examples jar against both MR1 and MR2.
 This works on CDH because it has a version of MR1 that's compatible with
MR2, but may require some recompilation if using the apache releases.
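
For example, a minimal sketch of that approach (jar paths are the stock ones for
the releases discussed in this thread and may need adjusting; the row count of
10000000000 matches the 1 TB runs above):

    # Use the same (MR2) examples jar on both clusters so the data generator
    # and the sort job are identical.
    EXAMPLES=$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar
    hadoop jar $EXAMPLES teragen 10000000000 /terasort/in
    hadoop jar $EXAMPLES terasort /terasort/in /terasort/out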


On Wed, Oct 23, 2013 at 5:01 PM, Jian Fang <ji...@gmail.com>wrote:

> Really? Does that mean I cannot compare terasort in MR2 with MR1
> directly?  If yes, this really should be documented.
>
> Thanks.
>
>
> On Wed, Oct 23, 2013 at 4:54 PM, Sandy Ryza <sa...@cloudera.com>wrote:
>
>> I should have brought this up earlier - the Terasort benchmark
>> requirements changed recently to make data less compressible.  The MR2
>> version of Terasort has this change, but the MR1 version does not.    So
>> Snappy should be working fine, but the data that MR2 is using is less
>> compressible.
>>
>>
>> On Wed, Oct 23, 2013 at 4:48 PM, Jian Fang <jian.fang.subscribe@gmail.com
>> > wrote:
>>
>>> Ok, I think I know where the problem is now. I used snappy to compress
>>> map output.
>>>
>>> Calculate the compression ratio = "Map output materialized bytes"/"Map
>>> output bytes"
>>>
>>> MR2
>>> 440519309909/1020000000000 = 0.431881676
>>>
>>> MR1
>>> 240948272514/1000000000000 = 0.240948273
>>>
>>>
>>> Seems something is wrong with the snappy in my Hadoop 2 and as a result,
>>> MR2 needs to fetch much more data during shuffle phase. Any way to check
>>> snappy in MR2?
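
A hedged way to sanity-check the native Snappy library on a Hadoop 2 node,
assuming the checknative subcommand is present in your build:

    # Reports whether the native hadoop library and the compression codecs
    # (including snappy) were found and loaded on this node.
    hadoop checknative -a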
>>>
>>>
>>>
>>>
>>> On Wed, Oct 23, 2013 at 4:26 PM, Jian Fang <
>>> jian.fang.subscribe@gmail.com> wrote:
>>>
>>>> Looked at the results for MR1 and MR2. Reduce input groups for MR1 is
>>>> 4,294,967,296, but 10,000,000,000 for MR2.
>>>> That is to say the number of unique keys fed into the reducers is
>>>> 10000000000 in MR2, which is really weird. Any problem in the terasort code?
>>>>
>>>>
>>>> On Wed, Oct 23, 2013 at 2:16 PM, Jian Fang <
>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>
>>>>> Reducing the number of map containers to 8 slightly improved the total
>>>>> time to 84 minutes. Here is the output. Also, from the log, there is no clear
>>>>> reason why the containers are killed other than the message such as
>>>>> "Container killed by the
>>>>> ApplicationMaster../container_1382237301855_0001_01_000001/syslog:Container
>>>>> killed on request. Exit code is 143"
>>>>>
>>>>>
>>>>> 2013-10-23 21:09:34,325 INFO org.apache.hadoop.mapreduce.Job (main):
>>>>> Counters: 46
>>>>>         File System Counters
>>>>>                 FILE: Number of bytes read=455484809066
>>>>>                 FILE: Number of bytes written=896642344343
>>>>>
>>>>>                 FILE: Number of read operations=0
>>>>>                 FILE: Number of large read operations=0
>>>>>                 FILE: Number of write operations=0
>>>>>                 HDFS: Number of bytes read=1000000841624
>>>>>
>>>>>                 HDFS: Number of bytes written=1000000000000
>>>>>                 HDFS: Number of read operations=25531
>>>>>
>>>>>                 HDFS: Number of large read operations=0
>>>>>                 HDFS: Number of write operations=150
>>>>>
>>>>>         Job Counters
>>>>>                 Killed map tasks=1
>>>>>                 Killed reduce tasks=11
>>>>>                 Launched map tasks=7449
>>>>>                 Launched reduce tasks=86
>>>>>                 Data-local map tasks=7434
>>>>>                 Rack-local map tasks=15
>>>>>                 Total time spent by all maps in occupied slots
>>>>> (ms)=1030941232
>>>>>                 Total time spent by all reduces in occupied slots
>>>>> (ms)=1574732272
>>>>>
>>>>>         Map-Reduce Framework
>>>>>                 Map input records=10000000000
>>>>>                 Map output records=10000000000
>>>>>                 Map output bytes=1020000000000
>>>>>                 Map output materialized bytes=440519309909
>>>>>                 Input split bytes=841624
>>>>>
>>>>>                 Combine input records=0
>>>>>                 Combine output records=0
>>>>>                 Reduce input groups=10000000000
>>>>>                 Reduce shuffle bytes=440519309909
>>>>>
>>>>>                 Reduce input records=10000000000
>>>>>                 Reduce output records=10000000000
>>>>>                 Spilled Records=20000000000
>>>>>                 Shuffled Maps =558600
>>>>>                 Failed Shuffles=0
>>>>>                 Merged Map outputs=558600
>>>>>                 GC time elapsed (ms)=2855534
>>>>>                 CPU time spent (ms)=170381420
>>>>>                 Physical memory (bytes) snapshot=3704938954752
>>>>>                 Virtual memory (bytes) snapshot=11455208308736
>>>>>                 Total committed heap usage (bytes)=4523248582656
>>>>>         Shuffle Errors
>>>>>                 BAD_ID=0
>>>>>                 CONNECTION=0
>>>>>                 IO_ERROR=0
>>>>>
>>>>>                 WRONG_LENGTH=0
>>>>>                 WRONG_MAP=0
>>>>>                 WRONG_REDUCE=0
>>>>>         File Input Format Counters
>>>>>                 Bytes Read=1000000000000
>>>>>         File Output Format Counters
>>>>>                 Bytes Written=1000000000000
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Oct 23, 2013 at 1:46 PM, Jian Fang <
>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>
>>>>>> Already started the tests with only 8 containers for map in MR2,
>>>>>> still running.
>>>>>>
>>>>>> But shouldn't YARN make better use of the memory? If I map YARN
>>>>>> containers in MR2 to exactly 8 map and 3 reduce slots as in MR1, what is the
>>>>>> real advantage of YARN then? I remember one goal of YARN is to solve the
>>>>>> issue that map slots cannot be used for reduce and reduce slots cannot be
>>>>>> used for map. Shouldn't YARN be smart enough to handle the concurrent tasks?
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Oct 23, 2013 at 1:17 PM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>>>>
>>>>>>> Increasing the slowstart is not meant to increase performance, but
>>>>>>> should make for a fairer comparison.  Have you tried making sure that in
>>>>>>> MR2 only 8 map tasks are running concurrently, or boosting MR1 up to 16?
>>>>>>>
>>>>>>> -Sandy
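
One sketch of holding MR2 to roughly 8 concurrent map containers per node with
the 12288 MB NodeManager setting quoted earlier (values are illustrative, not
something verified in this thread):

    # 12288 MB per NodeManager / 1536 MB per map task ~= 8 concurrent maps
    hadoop jar hadoop-mapreduce-examples-2.2.0.jar terasort \
        -Dmapreduce.map.memory.mb=1536 \
        -Dmapreduce.map.java.opts=-Xmx1152m \
        /terasort/in /terasort/out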
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Oct 23, 2013 at 12:55 PM, Jian Fang <
>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>
>>>>>>>> Changing mapreduce.job.reduce.slowstart.completedmaps to 0.99 does not
>>>>>>>> look good. The map phase alone took 48 minutes and the total time seems
>>>>>>>> to be even longer. Any way to make the map phase run faster?
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Oct 23, 2013 at 10:05 AM, Jian Fang <
>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Thanks Sandy.
>>>>>>>>>
>>>>>>>>> io.sort.record.percent is the default value 0.05 for both MR1 and
>>>>>>>>> MR2.  mapreduce.job.reduce.slowstart.completedmaps in MR2 and mapred.reduce.slowstart.completed.maps
>>>>>>>>> in MR1 both use the default value 0.05.
>>>>>>>>>
>>>>>>>>> I tried to allocate 1536MB and 1024MB to the map container some time
>>>>>>>>> ago, but the changes did not give me a better result, so I changed it
>>>>>>>>> back to 768MB.
>>>>>>>>>
>>>>>>>>> Will try mapred.reduce.slowstart.completed.maps=.99 to see what
>>>>>>>>> happens. BTW, I should use
>>>>>>>>> mapreduce.job.reduce.slowstart.completedmaps in MR2, right?
>>>>>>>>>
>>>>>>>>> Also, in MR1 I can specify tasktracker.http.threads, but I could
>>>>>>>>> not find the counterpart in MR2. Which property should I tune for the HTTP
>>>>>>>>> threads?
>>>>>>>>>
>>>>>>>>> Thanks again.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Oct 23, 2013 at 9:40 AM, Sandy Ryza <
>>>>>>>>> sandy.ryza@cloudera.com> wrote:
>>>>>>>>>
>>>>>>>>>> Based on SLOTS_MILLIS_MAPS, it looks like your map tasks are
>>>>>>>>>> taking about three times as long in MR2 as they are in MR1.  This is
>>>>>>>>>> probably because you allow twice as many map tasks to run at a time in MR2 (12288/768
>>>>>>>>>> = 16).  Being able to use all the containers isn't necessarily a good thing
>>>>>>>>>> if you are oversubscribing your node's resources.  Because of the different
>>>>>>>>>> way that MR1 and MR2 view resources, I think it's better to test with
>>>>>>>>>> mapred.reduce.slowstart.completed.maps=.99 so that the map and reduce
>>>>>>>>>> phases will run separately.
>>>>>>>>>>
>>>>>>>>>> On the other side, it looks like your MR1 has more spilled
>>>>>>>>>> records than MR2.  For a fairer comparison, you should set
>>>>>>>>>> io.sort.record.percent in MR1 to .13, which should improve MR1 performance,
>>>>>>>>>> but will provide a fairer comparison (MR2 automatically does this tuning
>>>>>>>>>> for you).
>>>>>>>>>>
>>>>>>>>>> -Sandy
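
A hedged sketch of applying both suggestions from the command line (the property
names are the standard MR1/MR2 ones; the jar names are illustrative):

    # MR2: hold the reduce phase back until the maps finish
    hadoop jar hadoop-mapreduce-examples-2.2.0.jar terasort \
        -Dmapreduce.job.reduce.slowstart.completedmaps=0.99 \
        /terasort/in /terasort/out

    # MR1: same slowstart, plus the sort-buffer split that MR2 tunes automatically
    hadoop jar hadoop-examples-1.0.3.jar terasort \
        -Dmapred.reduce.slowstart.completed.maps=0.99 \
        -Dio.sort.record.percent=0.13 \
        /terasort/in /terasort/out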
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Oct 23, 2013 at 9:22 AM, Jian Fang <
>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> The number of map slots and reduce slots on each data node for
>>>>>>>>>>> MR1 are 8 and 3, respectively. Since MR2 could use all containers for
>>>>>>>>>>> either map or reduce, I would expect that MR2 is faster.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Oct 23, 2013 at 8:17 AM, Sandy Ryza <
>>>>>>>>>>> sandy.ryza@cloudera.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> How many map and reduce slots are you using per tasktracker in
>>>>>>>>>>>> MR1?  How do the average map times compare? (MR2 reports this directly on
>>>>>>>>>>>> the web UI, but you can also get a sense in MR1 by scrolling through the
>>>>>>>>>>>> map tasks page).  Can you share the counters for MR1?
>>>>>>>>>>>>
>>>>>>>>>>>> -Sandy
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Oct 23, 2013 at 12:23 AM, Jian Fang <
>>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Unfortunately, turning off JVM reuse still got the same
>>>>>>>>>>>>> result, i.e., about 90 minutes for MR2. I don't think the killed reduces
>>>>>>>>>>>>> could account for a 2x slowdown. There should be something very wrong
>>>>>>>>>>>>> either in the configuration or the code. Any hints?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <
>>>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks Sandy. I will try to turn JVM reuse off and see what
>>>>>>>>>>>>>> happens.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes, I saw quite some exceptions in the task attempts. For
>>>>>>>>>>>>>> instance.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2013-10-20 03:13:58,751 ERROR [main]
>>>>>>>>>>>>>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>>>>>>>>>>>>>> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
>>>>>>>>>>>>>> 2013-10-20 03:13:58,752 ERROR [Thread-6]
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient: Failed to close file
>>>>>>>>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
>>>>>>>>>>>>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>>>>>>>>>>>>>> No lease on
>>>>>>>>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
>>>>>>>>>>>>>> File does not exist. Holder
>>>>>>>>>>>>>> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
>>>>>>>>>>>>>> any open files.
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>>>>>>>>>>>>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>>>>>>>>>>>>>> 2013-10-20 03:13:58,753 WARN [main]
>>>>>>>>>>>>>> org.apache.hadoop.mapred.YarnChild: Exception running child :
>>>>>>>>>>>>>> java.nio.channels.ClosedChannelException
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <
>>>>>>>>>>>>>> sandy.ryza@cloudera.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It looks like many of your reduce tasks were killed.  Do you
>>>>>>>>>>>>>>> know why?  Also, MR2 doesn't have JVM reuse, so it might make sense to
>>>>>>>>>>>>>>> compare it to MR1 with JVM reuse turned off.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -Sandy
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <
>>>>>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The Terasort output for MR2 is as follows.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2013-10-22 21:40:16,261 INFO
>>>>>>>>>>>>>>>> org.apache.hadoop.mapreduce.Job (main): Counters: 46
>>>>>>>>>>>>>>>>         File System Counters
>>>>>>>>>>>>>>>>                 FILE: Number of bytes read=456102049355
>>>>>>>>>>>>>>>>                 FILE: Number of bytes written=897246250517
>>>>>>>>>>>>>>>>                 FILE: Number of read operations=0
>>>>>>>>>>>>>>>>                 FILE: Number of large read operations=0
>>>>>>>>>>>>>>>>                 FILE: Number of write operations=0
>>>>>>>>>>>>>>>>                 HDFS: Number of bytes read=1000000851200
>>>>>>>>>>>>>>>>                 HDFS: Number of bytes written=1000000000000
>>>>>>>>>>>>>>>>                 HDFS: Number of read operations=32131
>>>>>>>>>>>>>>>>                 HDFS: Number of large read operations=0
>>>>>>>>>>>>>>>>                 HDFS: Number of write operations=224
>>>>>>>>>>>>>>>>         Job Counters
>>>>>>>>>>>>>>>>                 Killed map tasks=1
>>>>>>>>>>>>>>>>                 Killed reduce tasks=20
>>>>>>>>>>>>>>>>                 Launched map tasks=7601
>>>>>>>>>>>>>>>>                 Launched reduce tasks=132
>>>>>>>>>>>>>>>>                 Data-local map tasks=7591
>>>>>>>>>>>>>>>>                 Rack-local map tasks=10
>>>>>>>>>>>>>>>>                 Total time spent by all maps in occupied
>>>>>>>>>>>>>>>> slots (ms)=1696141311
>>>>>>>>>>>>>>>>                 Total time spent by all reduces in occupied
>>>>>>>>>>>>>>>> slots (ms)=2664045096
>>>>>>>>>>>>>>>>         Map-Reduce Framework
>>>>>>>>>>>>>>>>                 Map input records=10000000000
>>>>>>>>>>>>>>>>                 Map output records=10000000000
>>>>>>>>>>>>>>>>                 Map output bytes=1020000000000
>>>>>>>>>>>>>>>>                 Map output materialized bytes=440486356802
>>>>>>>>>>>>>>>>                 Input split bytes=851200
>>>>>>>>>>>>>>>>                 Combine input records=0
>>>>>>>>>>>>>>>>                 Combine output records=0
>>>>>>>>>>>>>>>>                 Reduce input groups=10000000000
>>>>>>>>>>>>>>>>                 Reduce shuffle bytes=440486356802
>>>>>>>>>>>>>>>>                 Reduce input records=10000000000
>>>>>>>>>>>>>>>>                 Reduce output records=10000000000
>>>>>>>>>>>>>>>>                 Spilled Records=20000000000
>>>>>>>>>>>>>>>>                 Shuffled Maps =851200
>>>>>>>>>>>>>>>>                 Failed Shuffles=61
>>>>>>>>>>>>>>>>                 Merged Map outputs=851200
>>>>>>>>>>>>>>>>                 GC time elapsed (ms)=4215666
>>>>>>>>>>>>>>>>                 CPU time spent (ms)=192433000
>>>>>>>>>>>>>>>>                 Physical memory (bytes)
>>>>>>>>>>>>>>>> snapshot=3349356380160
>>>>>>>>>>>>>>>>                 Virtual memory (bytes)
>>>>>>>>>>>>>>>> snapshot=9665208745984
>>>>>>>>>>>>>>>>                 Total committed heap usage
>>>>>>>>>>>>>>>> (bytes)=3636854259712
>>>>>>>>>>>>>>>>         Shuffle Errors
>>>>>>>>>>>>>>>>                 BAD_ID=0
>>>>>>>>>>>>>>>>                 CONNECTION=0
>>>>>>>>>>>>>>>>                 IO_ERROR=4
>>>>>>>>>>>>>>>>                 WRONG_LENGTH=0
>>>>>>>>>>>>>>>>                 WRONG_MAP=0
>>>>>>>>>>>>>>>>                 WRONG_REDUCE=0
>>>>>>>>>>>>>>>>         File Input Format Counters
>>>>>>>>>>>>>>>>                 Bytes Read=1000000000000
>>>>>>>>>>>>>>>>         File Output Format Counters
>>>>>>>>>>>>>>>>                 Bytes Written=1000000000000
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> John
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>>>>>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I have the same problem. I compared Hadoop 2.2.0 with
>>>>>>>>>>>>>>>>> Hadoop 1.0.3 and it turned out that the terasort for MR2 is 2 times slower
>>>>>>>>>>>>>>>>> than that in MR1. I cannot really believe it.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop
>>>>>>>>>>>>>>>>> 2.2.0 cluster configurations are as follows.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>>>>>>>>>>>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>>>>>>>>>>>>>>         mapreduce.map.memory.mb = "768";
>>>>>>>>>>>>>>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>>>>>>>>>>>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>>>>>>>>>>>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>>>>>>>>>>>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>>>>>>>>>>>>>>         mapreduce.task.io.sort.factor = "48";
>>>>>>>>>>>>>>>>>         mapreduce.task.io.sort.mb = "200";
>>>>>>>>>>>>>>>>>         mapreduce.map.speculative = "true";
>>>>>>>>>>>>>>>>>         mapreduce.reduce.speculative = "true";
>>>>>>>>>>>>>>>>>         mapreduce.framework.name = "yarn";
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> yarn.app.mapreduce.am.job.task.listener.thread-count = "60";
>>>>>>>>>>>>>>>>>         mapreduce.map.cpu.vcores = "1";
>>>>>>>>>>>>>>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>>>>>>>>>>>>>>         mapreduce.map.output.compress = "true";
>>>>>>>>>>>>>>>>>         mapreduce.map.output.compress.codec =
>>>>>>>>>>>>>>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>>>>>>>>>>>>>>         yarn.resourcemanager.scheduler.client.thread-count
>>>>>>>>>>>>>>>>> = "64";
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> yarn.resourcemanager.resource-tracker.client.thread-count = "64";
>>>>>>>>>>>>>>>>>         yarn.resourcemanager.scheduler.class =
>>>>>>>>>>>>>>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>>>>>>>>>>>>>>         yarn.nodemanager.aux-services =
>>>>>>>>>>>>>>>>> "mapreduce_shuffle";
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>>>>>>>>>>>>>>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>>>>>>>>>>>>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>>>>>>>>>>>>>>         yarn.nodemanager.container-executor.class =
>>>>>>>>>>>>>>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>>>>>>>>>>>>>>         yarn.nodemanager.container-manager.thread-count =
>>>>>>>>>>>>>>>>> "64";
>>>>>>>>>>>>>>>>>         yarn.nodemanager.localizer.client.thread-count =
>>>>>>>>>>>>>>>>> "20";
>>>>>>>>>>>>>>>>>         yarn.nodemanager.localizer.fetch.thread-count =
>>>>>>>>>>>>>>>>> "20";
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost
>>>>>>>>>>>>>>>>> the same other configurations. In MR1, the 1TB terasort took about 45
>>>>>>>>>>>>>>>>> minutes, but it took around 90 minutes in MR2.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Does anyone know what is wrong here? Or do I need some
>>>>>>>>>>>>>>>>> special configurations for terasort to work better in MR2?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> John
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>>>>>>>>>>>>>>> michael_segel@hotmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Sam,
>>>>>>>>>>>>>>>>>> I think your cluster is too small for any meaningful
>>>>>>>>>>>>>>>>>> conclusions to be made.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Mike Segel
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <
>>>>>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi Harsh,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks for your detailed response! Now, the efficiency of
>>>>>>>>>>>>>>>>>> my Yarn cluster improved a lot after increasing the reducer
>>>>>>>>>>>>>>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>>>>>>>>>>>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and
>>>>>>>>>>>>>>>>>> reduce task together, with a typical process(map > shuffle > reduce). In
>>>>>>>>>>>>>>>>>> Yarn, as I know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>>>>>>>>>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...),
>>>>>>>>>>>>>>>>>> but, MRv1 job has special execution process(map > shuffle > reduce) in
>>>>>>>>>>>>>>>>>> Hadoop 1.x, and how Yarn execute a MRv1 job? still include some special MR
>>>>>>>>>>>>>>>>>> steps in Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>>>>>>>>>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>>>>>>>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>>>>>>>>>>>>> - What's the general process for ApplicationMaster of
>>>>>>>>>>>>>>>>>> Yarn to execute a job?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by
>>>>>>>>>>>>>>>>>> setting 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>>>>>>>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>>>>>>>>>>>>> - For Yarn, the above two parameters do not work any more, as
>>>>>>>>>>>>>>>>>> yarn uses container instead, right?
>>>>>>>>>>>>>>>>>> - For Yarn, we can set the whole physical mem for a
>>>>>>>>>>>>>>>>>> NodeManager using 'yarn.nodemanager.resource.memory-mb'.
>>>>>>>>>>>>>>>>>> But how to set the default size of physical mem of a container?
>>>>>>>>>>>>>>>>>> - How to set the maximum size of physical mem of a
>>>>>>>>>>>>>>>>>> container? By the parameter of 'mapred.child.java.opts'?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks as always!
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi Sam,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> > - How to know the container number? Why you say it
>>>>>>>>>>>>>>>>>>> will be 22 containers due to a 22 GB memory?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The MR2's default configuration requests 1 GB resource
>>>>>>>>>>>>>>>>>>> each for Map
>>>>>>>>>>>>>>>>>>> and Reduce containers. It requests 1.5 GB for the AM
>>>>>>>>>>>>>>>>>>> container that
>>>>>>>>>>>>>>>>>>> runs the job, additionally. This is tunable using the
>>>>>>>>>>>>>>>>>>> properties
>>>>>>>>>>>>>>>>>>> Sandy's mentioned in his post.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> > - My machine has 32 GB memory, how many memory is
>>>>>>>>>>>>>>>>>>> proper to be assigned to containers?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> This is a general question. You may use the same process
>>>>>>>>>>>>>>>>>>> you took to
>>>>>>>>>>>>>>>>>>> decide optimal number of slots in MR1 to decide this
>>>>>>>>>>>>>>>>>>> here. Every
>>>>>>>>>>>>>>>>>>> container is a new JVM, and you're limited by the CPUs
>>>>>>>>>>>>>>>>>>> you have there
>>>>>>>>>>>>>>>>>>> (if not the memory). Either increase memory requests
>>>>>>>>>>>>>>>>>>> from jobs, to
>>>>>>>>>>>>>>>>>>> lower # of concurrent containers at a given time
>>>>>>>>>>>>>>>>>>> (runtime change), or
>>>>>>>>>>>>>>>>>>> lower NM's published memory resources to control the
>>>>>>>>>>>>>>>>>>> same (config
>>>>>>>>>>>>>>>>>>> change).
>>>>>>>>>>>>>>>>>>>
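
A minimal sketch of the two options described above, assuming the standard
Hadoop 2 property names (the jar name and values are illustrative only):

    # Runtime change: bigger per-task requests mean fewer concurrent containers
    hadoop jar hadoop-mapreduce-examples-2.2.0.jar terasort \
        -Dmapreduce.map.memory.mb=2048 \
        -Dmapreduce.reduce.memory.mb=4096 \
        /terasort/in /terasort/out

    # Config change: lower yarn.nodemanager.resource.memory-mb in yarn-site.xml
    # so each NodeManager advertises less memory to the scheduler.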
>>>>>>>>>>>>>>>>>>> > - In mapred-site.xml, if I set '
>>>>>>>>>>>>>>>>>>> mapreduce.framework.name' to be 'yarn', will other
>>>>>>>>>>>>>>>>>>> parameters for mapred-site.xml still work in yarn framework? Like
>>>>>>>>>>>>>>>>>>> 'mapreduce.task.io.sort.mb' and 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Yes, all of these properties will still work. Old
>>>>>>>>>>>>>>>>>>> properties specific
>>>>>>>>>>>>>>>>>>> to JobTracker or TaskTracker (usually found as a keyword
>>>>>>>>>>>>>>>>>>> in the config
>>>>>>>>>>>>>>>>>>> name) will not apply anymore.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <
>>>>>>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>> > Hi Harsh,
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>> > According to above suggestions, I removed the
>>>>>>>>>>>>>>>>>>> duplication of setting, and
>>>>>>>>>>>>>>>>>>> > reduce the value of
>>>>>>>>>>>>>>>>>>> 'yarn.nodemanager.resource.cpu-cores',
>>>>>>>>>>>>>>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>>>>>>>>>>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and
>>>>>>>>>>>>>>>>>>> 12000. And then, the
>>>>>>>>>>>>>>>>>>> > efficiency improved about 18%.  I have questions:
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>> > - How to know the container number? Why you say it
>>>>>>>>>>>>>>>>>>> will be 22 containers due
>>>>>>>>>>>>>>>>>>> > to a 22 GB memory?
>>>>>>>>>>>>>>>>>>> > - My machine has 32 GB memory, how many memory is
>>>>>>>>>>>>>>>>>>> proper to be assigned to
>>>>>>>>>>>>>>>>>>> > containers?
>>>>>>>>>>>>>>>>>>> > - In mapred-site.xml, if I set '
>>>>>>>>>>>>>>>>>>> mapreduce.framework.name' to be 'yarn', will
>>>>>>>>>>>>>>>>>>> > other parameters for mapred-site.xml still work in
>>>>>>>>>>>>>>>>>>> yarn framework? Like
>>>>>>>>>>>>>>>>>>> > 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>> > Thanks!
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>> >> Hey Sam,
>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>> >> Did you get a chance to retry with Sandy's
>>>>>>>>>>>>>>>>>>> suggestions? The config
>>>>>>>>>>>>>>>>>>> >> appears to be asking NMs to use roughly 22 total
>>>>>>>>>>>>>>>>>>> containers (as
>>>>>>>>>>>>>>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22
>>>>>>>>>>>>>>>>>>> GB memory
>>>>>>>>>>>>>>>>>>> >> resource. This could impact much, given the CPU is
>>>>>>>>>>>>>>>>>>> still the same for
>>>>>>>>>>>>>>>>>>> >> both test runs.
>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>>>>>>>>>>>>>>> sandy.ryza@cloudera.com>
>>>>>>>>>>>>>>>>>>> >> wrote:
>>>>>>>>>>>>>>>>>>> >> > Hey Sam,
>>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>>> >> > Thanks for sharing your results.  I'm definitely
>>>>>>>>>>>>>>>>>>> curious about what's
>>>>>>>>>>>>>>>>>>> >> > causing the difference.
>>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>>> >> > A couple observations:
>>>>>>>>>>>>>>>>>>> >> > It looks like you've got
>>>>>>>>>>>>>>>>>>> yarn.nodemanager.resource.memory-mb in there
>>>>>>>>>>>>>>>>>>> >> > twice
>>>>>>>>>>>>>>>>>>> >> > with two different values.
>>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?)
>>>>>>>>>>>>>>>>>>> close to the default
>>>>>>>>>>>>>>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any
>>>>>>>>>>>>>>>>>>> of your tasks getting
>>>>>>>>>>>>>>>>>>> >> > killed for running over resource limits?
>>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>>> >> > -Sandy
>>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <
>>>>>>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >> The terasort execution log shows that reduce spent
>>>>>>>>>>>>>>>>>>> about 5.5 mins from
>>>>>>>>>>>>>>>>>>> >> >> 33%
>>>>>>>>>>>>>>>>>>> >> >> to 35% as below.
>>>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>>>> reduce 31%
>>>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>>>> reduce 32%
>>>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>>>> reduce 33%
>>>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>>>> reduce 35%
>>>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>>>> reduce 40%
>>>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>>>> reduce 43%
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >> Any way, below are my configurations for your
>>>>>>>>>>>>>>>>>>> reference. Thanks!
>>>>>>>>>>>>>>>>>>> >> >> (A) core-site.xml
>>>>>>>>>>>>>>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >> (B) hdfs-site.xml
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>dfs.replication</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>1</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>10</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >> (C) mapred-site.xml
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>>>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>>>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>>>>>>>>>>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>>>>>>>>>>>>>>> >> >> </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>yarn</value>
>>>>>>>>>>>>>>>>>>> >> >>    </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>8</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>4</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>true</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >> (D) yarn-site.xml
>>>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>node1:18025</value>
>>>>>>>>>>>>>>>>>>> >> >>     <description>host is the hostname of the
>>>>>>>>>>>>>>>>>>> resource manager and
>>>>>>>>>>>>>>>>>>> >> >>     port is the port on which the NodeManagers
>>>>>>>>>>>>>>>>>>> contact the Resource
>>>>>>>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>     <description>The address of the RM web
>>>>>>>>>>>>>>>>>>> application.</description>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <name>yarn.resourcemanager.webapp.address</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>node1:18088</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <name>yarn.resourcemanager.scheduler.address</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>node1:18030</value>
>>>>>>>>>>>>>>>>>>> >> >>     <description>host is the hostname of the
>>>>>>>>>>>>>>>>>>> resourcemanager and port
>>>>>>>>>>>>>>>>>>> >> >> is
>>>>>>>>>>>>>>>>>>> >> >> the port
>>>>>>>>>>>>>>>>>>> >> >>     on which the Applications in the cluster talk
>>>>>>>>>>>>>>>>>>> to the Resource
>>>>>>>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>node1:18040</value>
>>>>>>>>>>>>>>>>>>> >> >>     <description>the host is the hostname of the
>>>>>>>>>>>>>>>>>>> ResourceManager and
>>>>>>>>>>>>>>>>>>> >> >> the
>>>>>>>>>>>>>>>>>>> >> >> port is the port on
>>>>>>>>>>>>>>>>>>> >> >>     which the clients can talk to the Resource
>>>>>>>>>>>>>>>>>>> Manager. </description>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>>>>>>>>>>>>>>> >> >>     <description>the local directories used by the
>>>>>>>>>>>>>>>>>>> >> >> nodemanager</description>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>>>>>>>>>>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>>>>>>>>>>>>>>> port</description>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>10240</value>
>>>>>>>>>>>>>>>>>>> >> >>     <description>the amount of memory on the
>>>>>>>>>>>>>>>>>>> NodeManager in
>>>>>>>>>>>>>>>>>>> >> >> MB</description>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>>>>>>>>>>>>>> >> >>     <description>directory on hdfs where the
>>>>>>>>>>>>>>>>>>> application logs are moved
>>>>>>>>>>>>>>>>>>> >> >> to
>>>>>>>>>>>>>>>>>>> >> >> </description>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>    <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>>>>>>>>>>>>>> >> >>     <description>the directories used by
>>>>>>>>>>>>>>>>>>> Nodemanagers as log
>>>>>>>>>>>>>>>>>>> >> >> directories</description>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>>>>>>>>>>>>>> >> >>     <description>shuffle service that needs to be
>>>>>>>>>>>>>>>>>>> set for Map Reduce to
>>>>>>>>>>>>>>>>>>> >> >> run </description>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>24</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>3</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>22000</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>>>>>>>>>>>>>> >> >>     <value>2.1</value>
>>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>>>> >> >>> Not tuning configurations at all is wrong. YARN
>>>>>>>>>>>>>>>>>>> uses memory resource
>>>>>>>>>>>>>>>>>>> >> >>> based scheduling and hence MR2 would be
>>>>>>>>>>>>>>>>>>> requesting 1 GB minimum by
>>>>>>>>>>>>>>>>>>> >> >>> default, causing, on base configs, to max out at
>>>>>>>>>>>>>>>>>>> 8 (due to 8 GB NM
>>>>>>>>>>>>>>>>>>> >> >>> memory resource config) total containers. Do
>>>>>>>>>>>>>>>>>>> share your configs as at
>>>>>>>>>>>>>>>>>>> >> >>> this point none of us can tell what it is.
>>>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower
>>>>>>>>>>>>>>>>>>> for users and to not
>>>>>>>>>>>>>>>>>>> >> >>> care about such things :)
>>>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>>>>>>>>>>>>>> samliuhadoop@gmail.com>
>>>>>>>>>>>>>>>>>>> >> >>> wrote:
>>>>>>>>>>>>>>>>>>> >> >>> > At the beginning, I just wanted to do a fast
>>>>>>>>>>>>>>>>>>> comparison of MRv1 and
>>>>>>>>>>>>>>>>>>> >> >>> > Yarn.
>>>>>>>>>>>>>>>>>>> >> >>> > But
>>>>>>>>>>>>>>>>>>> >> >>> > they have many differences, and to be fair for
>>>>>>>>>>>>>>>>>>> comparison I did not
>>>>>>>>>>>>>>>>>>> >> >>> > tune
>>>>>>>>>>>>>>>>>>> >> >>> > their configurations at all.  So I got above
>>>>>>>>>>>>>>>>>>> test results. After
>>>>>>>>>>>>>>>>>>> >> >>> > analyzing
>>>>>>>>>>>>>>>>>>> >> >>> > the test result, no doubt, I will configure
>>>>>>>>>>>>>>>>>>> them and do comparison
>>>>>>>>>>>>>>>>>>> >> >>> > again.
>>>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>>>> >> >>> > Do you have any idea on current test result? I
>>>>>>>>>>>>>>>>>>> think, to compare
>>>>>>>>>>>>>>>>>>> >> >>> > with
>>>>>>>>>>>>>>>>>>> >> >>> > MRv1,
>>>>>>>>>>>>>>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but
>>>>>>>>>>>>>>>>>>> worse on Reduce
>>>>>>>>>>>>>>>>>>> >> >>> > phase(terasort test).
>>>>>>>>>>>>>>>>>>> >> >>> > And any detailed suggestions/comments/materials
>>>>>>>>>>>>>>>>>>> on Yarn performance
>>>>>>>>>>>>>>>>>>> >> >>> > tuning?
>>>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>>>> >> >>> > Thanks!
>>>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>>>>>>>>>>>>>> marcosluis2186@gmail.com>
>>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>>> >> >>> >> Why not to tune the configurations?
>>>>>>>>>>>>>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>>>>>>>>>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size,
>>>>>>>>>>>>>>>>>>> etc
>>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>>> >> >>> >>> Hi Experts,
>>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>>> >> >>> >>> We are thinking about whether to use Yarn or
>>>>>>>>>>>>>>>>>>> not in the near
>>>>>>>>>>>>>>>>>>> >> >>> >>> future,
>>>>>>>>>>>>>>>>>>> >> >>> >>> and
>>>>>>>>>>>>>>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for
>>>>>>>>>>>>>>>>>>> comparison.
>>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>>> >> >>> >>> My env is three nodes cluster, and each node
>>>>>>>>>>>>>>>>>>> has similar hardware:
>>>>>>>>>>>>>>>>>>> >> >>> >>> 2
>>>>>>>>>>>>>>>>>>> >> >>> >>> cpu(4 core), 32 GB mem. Both Yarn and MRv1
>>>>>>>>>>>>>>>>>>> cluster are set on the
>>>>>>>>>>>>>>>>>>> >> >>> >>> same
>>>>>>>>>>>>>>>>>>> >> >>> >>> env. To
>>>>>>>>>>>>>>>>>>> >> >>> >>> be fair, I did not make any performance
>>>>>>>>>>>>>>>>>>> tuning on their
>>>>>>>>>>>>>>>>>>> >> >>> >>> configurations, but
>>>>>>>>>>>>>>>>>>> >> >>> >>> use the default configuration values.
>>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>>> >> >>> >>> Before testing, I think Yarn will be much
>>>>>>>>>>>>>>>>>>> better than MRv1, if
>>>>>>>>>>>>>>>>>>> >> >>> >>> they
>>>>>>>>>>>>>>>>>>> >> >>> >>> all
>>>>>>>>>>>>>>>>>>> >> >>> >>> use default configuration, because Yarn is a
>>>>>>>>>>>>>>>>>>> better framework than
>>>>>>>>>>>>>>>>>>> >> >>> >>> MRv1.
>>>>>>>>>>>>>>>>>>> >> >>> >>> However, the test result shows some
>>>>>>>>>>>>>>>>>>> differences:
>>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>>>>>>>>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>>>>>>>>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>>>>>>>>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>>>>>>>>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>>>>>>>>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>>>>>>>>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>>>>>>>>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>>> >> >>> >>> After a fast analysis, I think the direct
>>>>>>>>>>>>>>>>>>> cause might be that Yarn
>>>>>>>>>>>>>>>>>>> >> >>> >>> is
>>>>>>>>>>>>>>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much
>>>>>>>>>>>>>>>>>>> worse on Reduce
>>>>>>>>>>>>>>>>>>> >> >>> >>> phase.
>>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>>>>>>>>>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1
>>>>>>>>>>>>>>>>>>> for terasort?
>>>>>>>>>>>>>>>>>>> >> >>> >>> - What's the strategy for tuning Yarn
>>>>>>>>>>>>>>>>>>> performance? Are there any
>>>>>>>>>>>>>>>>>>> >> >>> >>> materials?
>>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>>> >> >>> >>> Thanks!
>>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>>> >> >>> >> --
>>>>>>>>>>>>>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>>>>>>>>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>>>>>>>>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>>>> >> >>> --
>>>>>>>>>>>>>>>>>>> >> >>> Harsh J
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>>> >> --
>>>>>>>>>>>>>>>>>>> >> Harsh J
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>> Harsh J
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Jian Fang <ji...@gmail.com>.
Really? Does that mean I cannot compare terasort in MR2 with MR1 directly?
If yes, this really should be documented.

Thanks.


On Wed, Oct 23, 2013 at 4:54 PM, Sandy Ryza <sa...@cloudera.com> wrote:

> I should have brought this up earlier - the Terasort benchmark
> requirements changed recently to make data less compressible.  The MR2
> version of Terasort has this change, but the MR1 version does not.    So
> Snappy should be working fine, but the data that MR2 is using is less
> compressible.
>
>
> On Wed, Oct 23, 2013 at 4:48 PM, Jian Fang <ji...@gmail.com>wrote:
>
>> Ok, I think I know where the problem is now. I used snappy to compress
>> map output.
>>
>> Calculate the compression ratio = "Map output materialized bytes"/"Map
>> output bytes"
>>
>> MR2
>> 440519309909/1020000000000 = 0.431881676
>>
>> MR1
>> 240948272514/1000000000000 = 0.240948273
>>
>>
>> Seems something is wrong with the snappy in my Hadoop 2 and as a result,
>> MR2 needs to fetch much more data during shuffle phase. Any way to check
>> snappy in MR2?
>>
>>
>>
>>
>> On Wed, Oct 23, 2013 at 4:26 PM, Jian Fang <jian.fang.subscribe@gmail.com
>> > wrote:
>>
>>> Looked at the results for MR1 and MR2. Reduce input groups for MR1 is
>>> 4,294,967,296, but 10,000,000,000 for MR2.
>>> That is to say the number of unique keys fed into the reducers is
>>> 10000000000 in MR2, which is really weird. Any problem in the terasort code?
>>>
>>>
>>> On Wed, Oct 23, 2013 at 2:16 PM, Jian Fang <
>>> jian.fang.subscribe@gmail.com> wrote:
>>>
>>>> Reducing the number of map containers to 8 slightly improved the total
>>>> time to 84 minutes. Here is the output. Also, from the log, there is no clear
>>>> reason why the containers are killed other than the message such as
>>>> "Container killed by the
>>>> ApplicationMaster../container_1382237301855_0001_01_000001/syslog:Container
>>>> killed on request. Exit code is 143"
>>>>
>>>>
>>>> 2013-10-23 21:09:34,325 INFO org.apache.hadoop.mapreduce.Job (main):
>>>> Counters: 46
>>>>         File System Counters
>>>>                 FILE: Number of bytes read=455484809066
>>>>                 FILE: Number of bytes written=896642344343
>>>>
>>>>                 FILE: Number of read operations=0
>>>>                 FILE: Number of large read operations=0
>>>>                 FILE: Number of write operations=0
>>>>                 HDFS: Number of bytes read=1000000841624
>>>>
>>>>                 HDFS: Number of bytes written=1000000000000
>>>>                 HDFS: Number of read operations=25531
>>>>
>>>>                 HDFS: Number of large read operations=0
>>>>                 HDFS: Number of write operations=150
>>>>
>>>>         Job Counters
>>>>                 Killed map tasks=1
>>>>                 Killed reduce tasks=11
>>>>                 Launched map tasks=7449
>>>>                 Launched reduce tasks=86
>>>>                 Data-local map tasks=7434
>>>>                 Rack-local map tasks=15
>>>>                 Total time spent by all maps in occupied slots
>>>> (ms)=1030941232
>>>>                 Total time spent by all reduces in occupied slots
>>>> (ms)=1574732272
>>>>
>>>>         Map-Reduce Framework
>>>>                 Map input records=10000000000
>>>>                 Map output records=10000000000
>>>>                 Map output bytes=1020000000000
>>>>                 Map output materialized bytes=440519309909
>>>>                 Input split bytes=841624
>>>>
>>>>                 Combine input records=0
>>>>                 Combine output records=0
>>>>                 Reduce input groups=10000000000
>>>>                 Reduce shuffle bytes=440519309909
>>>>
>>>>                 Reduce input records=10000000000
>>>>                 Reduce output records=10000000000
>>>>                 Spilled Records=20000000000
>>>>                 Shuffled Maps =558600
>>>>                 Failed Shuffles=0
>>>>                 Merged Map outputs=558600
>>>>                 GC time elapsed (ms)=2855534
>>>>                 CPU time spent (ms)=170381420
>>>>                 Physical memory (bytes) snapshot=3704938954752
>>>>                 Virtual memory (bytes) snapshot=11455208308736
>>>>                 Total committed heap usage (bytes)=4523248582656
>>>>         Shuffle Errors
>>>>                 BAD_ID=0
>>>>                 CONNECTION=0
>>>>                 IO_ERROR=0
>>>>
>>>>                 WRONG_LENGTH=0
>>>>                 WRONG_MAP=0
>>>>                 WRONG_REDUCE=0
>>>>         File Input Format Counters
>>>>                 Bytes Read=1000000000000
>>>>         File Output Format Counters
>>>>                 Bytes Written=1000000000000
>>>>
>>>>
>>>>
>>>> On Wed, Oct 23, 2013 at 1:46 PM, Jian Fang <
>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>
>>>>> Already started the tests with only 8 containers for map in MR2, still
>>>>> running.
>>>>>
>>>>> But shouldn't YARN make better use of the memory? If I map YARN
>>>>> containers in MR2 to exactly the 8 map and 3 reduce slots of MR1, what is the
>>>>> real advantage of YARN then? I remember one goal of YARN is to solve the
>>>>> issue that map slots cannot be used for reduce and reduce slots cannot be
>>>>> used for map. Shouldn't YARN be smart enough to handle the concurrent tasks?
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Oct 23, 2013 at 1:17 PM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>>>
>>>>>> Increasing the slowstart is not meant to increase performance, but
>>>>>> should make for a fairer comparison.  Have you tried making sure that in
>>>>>> MR2 only 8 map tasks are running concurrently, or boosting MR1 up to 16?
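>>>>>>
>>>>>> (For instance, with the numbers quoted in this thread -- 12288 MB per NodeManager
>>>>>> and 768 MB per map container -- either of these would cap MR2 at 8 concurrent map
>>>>>> tasks per node; the exact values are only a sketch to adapt:
>>>>>>
>>>>>>         mapreduce.map.memory.mb = "1536";              (12288 / 1536 = 8)
>>>>>>         or, alternatively, shrink the NodeManager's advertised memory:
>>>>>>         yarn.nodemanager.resource.memory-mb = "6144";  (6144 / 768 = 8)
>>>>>>
>>>>>> Boosting MR1 instead would mean mapred.tasktracker.map.tasks.maximum = "16".)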
>>>>>>
>>>>>> -Sandy
>>>>>>
>>>>>>
>>>>>> On Wed, Oct 23, 2013 at 12:55 PM, Jian Fang <
>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>
>>>>>>> Changing mapreduce.job.reduce.
>>>>>>> slowstart.completedmaps to 0.99 does not look good. The map phase
>>>>>>> alone took 48 minutes and total time seems to be even longer. Any way to
>>>>>>> let map phase run faster?
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Oct 23, 2013 at 10:05 AM, Jian Fang <
>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>
>>>>>>>> Thanks Sandy.
>>>>>>>>
>>>>>>>> io.sort.record.percent is the default value 0.05 for both MR1 and
>>>>>>>> MR2.  mapreduce.job.reduce.slowstart.completedmaps in MR2 and mapred.reduce.slowstart.completed.maps
>>>>>>>> in MR1 both use the default value 0.05.
>>>>>>>>
>>>>>>>> I tried to allocate 1536MB and 1024MB to map container some time
>>>>>>>> ago, but the changes did not give me a better result, thus, I changed it
>>>>>>>> back to 768MB.
>>>>>>>>
>>>>>>>> Will try mapred.reduce.slowstart.completed.maps=.99 to see what
>>>>>>>> happens. BTW, I should use
>>>>>>>> mapreduce.job.reduce.slowstart.completedmaps in MR2, right?
>>>>>>>>
>>>>>>>> Also, in MR1 I can specify tasktracker.http.threads, but I could
>>>>>>>> not find the counterpart for MR2. Which one should I tune for the http
>>>>>>>> threads?
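>>>>>>>>
>>>>>>>> (In MR2 the map outputs are served by the ShuffleHandler auxiliary service inside
>>>>>>>> the NodeManager rather than by a TaskTracker HTTP server, so the closest
>>>>>>>> counterpart appears to be mapreduce.shuffle.max.threads, set in the NodeManager's
>>>>>>>> configuration; its default of 0 is documented to mean twice the number of
>>>>>>>> available processors:
>>>>>>>>
>>>>>>>>         mapreduce.shuffle.max.threads = "0";   (0 = 2 * available cores)
>>>>>>>>
>>>>>>>> Treat this as a best-guess mapping rather than an exact equivalent.)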
>>>>>>>>
>>>>>>>> Thanks again.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Oct 23, 2013 at 9:40 AM, Sandy Ryza <
>>>>>>>> sandy.ryza@cloudera.com> wrote:
>>>>>>>>
>>>>>>>>> Based on SLOTS_MILLIS_MAPS, it looks like your map tasks are
>>>>>>>>> taking about three times as long in MR2 as they are in MR1.  This is
>>>>>>>>> probably because you allow twice as many map tasks to run at a time in MR2 (12288/768
>>>>>>>>> = 16).  Being able to use all the containers isn't necessarily a good thing
>>>>>>>>> if you are oversubscribing your node's resources.  Because of the different
>>>>>>>>> way that MR1 and MR2 view resources, I think it's better to test with
>>>>>>>>> mapred.reduce.slowstart.completed.maps=.99 so that the map and reduce
>>>>>>>>> phases will run separately.
>>>>>>>>>
>>>>>>>>> On the other side, it looks like your MR1 has more spilled records
>>>>>>>>> than MR2.  For a fairer comparison, you should set io.sort.record.percent
>>>>>>>>> in MR1 to .13, which should improve MR1 performance, but will provide a
>>>>>>>>> fairer comparison (MR2 automatically does this tuning for you).
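>>>>>>>>>
>>>>>>>>> (A sketch of how those two settings could be passed per job -- the jar names and
>>>>>>>>> input/output paths here are just placeholders:
>>>>>>>>>
>>>>>>>>>         # MR2 run: hold the reducers back until the maps finish
>>>>>>>>>         hadoop jar hadoop-mapreduce-examples.jar terasort \
>>>>>>>>>           -Dmapreduce.job.reduce.slowstart.completedmaps=0.99 /tera-in /tera-out
>>>>>>>>>
>>>>>>>>>         # MR1 run: same slowstart, plus more record metadata room in the sort buffer
>>>>>>>>>         hadoop jar hadoop-examples.jar terasort \
>>>>>>>>>           -Dmapred.reduce.slowstart.completed.maps=0.99 \
>>>>>>>>>           -Dio.sort.record.percent=0.13 /tera-in /tera-out
>>>>>>>>>
>>>>>>>>> The property names are the ones discussed above; everything else is a placeholder.)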
>>>>>>>>>
>>>>>>>>> -Sandy
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Oct 23, 2013 at 9:22 AM, Jian Fang <
>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> The number of map slots and reduce slots on each data node for
>>>>>>>>>> MR1 are 8 and 3, respectively. Since MR2 could use all containers for
>>>>>>>>>> either map or reduce, I would expect that MR2 is faster.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Oct 23, 2013 at 8:17 AM, Sandy Ryza <
>>>>>>>>>> sandy.ryza@cloudera.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> How many map and reduce slots are you using per tasktracker in
>>>>>>>>>>> MR1?  How do the average map times compare? (MR2 reports this directly on
>>>>>>>>>>> the web UI, but you can also get a sense in MR1 by scrolling through the
>>>>>>>>>>> map tasks page).  Can you share the counters for MR1?
>>>>>>>>>>>
>>>>>>>>>>> -Sandy
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Oct 23, 2013 at 12:23 AM, Jian Fang <
>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Unfortunately, turning off JVM reuse still got the same result,
>>>>>>>>>>>> i.e., about 90 minutes for MR2. I don't think the killed reduces could
>>>>>>>>>>>> account for a 2x slowdown. There should be something very wrong either
>>>>>>>>>>>> in configuration or code. Any hints?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <
>>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks Sandy. I will try to turn JVM reuse off and see what
>>>>>>>>>>>>> happens.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yes, I saw quite some exceptions in the task attempts. For
>>>>>>>>>>>>> instance.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2013-10-20 03:13:58,751 ERROR [main]
>>>>>>>>>>>>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>>>>>>>>>>>>> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
>>>>>>>>>>>>> 2013-10-20 03:13:58,752 ERROR [Thread-6]
>>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient: Failed to close file
>>>>>>>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
>>>>>>>>>>>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>>>>>>>>>>>>> No lease on
>>>>>>>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
>>>>>>>>>>>>> File does not exist. Holder
>>>>>>>>>>>>> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
>>>>>>>>>>>>> any open files.
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>>>>>>>>>>>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>>>>>>>>>>> --
>>>>>>>>>>>>>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>>>>>>>>>>>>> 2013-10-20 03:13:58,753 WARN [main]
>>>>>>>>>>>>> org.apache.hadoop.mapred.YarnChild: Exception running child :
>>>>>>>>>>>>> java.nio.channels.ClosedChannelException
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <
>>>>>>>>>>>>> sandy.ryza@cloudera.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> It looks like many of your reduce tasks were killed.  Do you
>>>>>>>>>>>>>> know why?  Also, MR2 doesn't have JVM reuse, so it might make sense to
>>>>>>>>>>>>>> compare it to MR1 with JVM reuse turned off.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -Sandy
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <
>>>>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The Terasort output for MR2 is as follows.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job
>>>>>>>>>>>>>>> (main): Counters: 46
>>>>>>>>>>>>>>>         File System Counters
>>>>>>>>>>>>>>>                 FILE: Number of bytes read=456102049355
>>>>>>>>>>>>>>>                 FILE: Number of bytes written=897246250517
>>>>>>>>>>>>>>>                 FILE: Number of read operations=0
>>>>>>>>>>>>>>>                 FILE: Number of large read operations=0
>>>>>>>>>>>>>>>                 FILE: Number of write operations=0
>>>>>>>>>>>>>>>                 HDFS: Number of bytes read=1000000851200
>>>>>>>>>>>>>>>                 HDFS: Number of bytes written=1000000000000
>>>>>>>>>>>>>>>                 HDFS: Number of read operations=32131
>>>>>>>>>>>>>>>                 HDFS: Number of large read operations=0
>>>>>>>>>>>>>>>                 HDFS: Number of write operations=224
>>>>>>>>>>>>>>>         Job Counters
>>>>>>>>>>>>>>>                 Killed map tasks=1
>>>>>>>>>>>>>>>                 Killed reduce tasks=20
>>>>>>>>>>>>>>>                 Launched map tasks=7601
>>>>>>>>>>>>>>>                 Launched reduce tasks=132
>>>>>>>>>>>>>>>                 Data-local map tasks=7591
>>>>>>>>>>>>>>>                 Rack-local map tasks=10
>>>>>>>>>>>>>>>                 Total time spent by all maps in occupied
>>>>>>>>>>>>>>> slots (ms)=1696141311
>>>>>>>>>>>>>>>                 Total time spent by all reduces in occupied
>>>>>>>>>>>>>>> slots (ms)=2664045096
>>>>>>>>>>>>>>>         Map-Reduce Framework
>>>>>>>>>>>>>>>                 Map input records=10000000000
>>>>>>>>>>>>>>>                 Map output records=10000000000
>>>>>>>>>>>>>>>                 Map output bytes=1020000000000
>>>>>>>>>>>>>>>                 Map output materialized bytes=440486356802
>>>>>>>>>>>>>>>                 Input split bytes=851200
>>>>>>>>>>>>>>>                 Combine input records=0
>>>>>>>>>>>>>>>                 Combine output records=0
>>>>>>>>>>>>>>>                 Reduce input groups=10000000000
>>>>>>>>>>>>>>>                 Reduce shuffle bytes=440486356802
>>>>>>>>>>>>>>>                 Reduce input records=10000000000
>>>>>>>>>>>>>>>                 Reduce output records=10000000000
>>>>>>>>>>>>>>>                 Spilled Records=20000000000
>>>>>>>>>>>>>>>                 Shuffled Maps =851200
>>>>>>>>>>>>>>>                 Failed Shuffles=61
>>>>>>>>>>>>>>>                 Merged Map outputs=851200
>>>>>>>>>>>>>>>                 GC time elapsed (ms)=4215666
>>>>>>>>>>>>>>>                 CPU time spent (ms)=192433000
>>>>>>>>>>>>>>>                 Physical memory (bytes)
>>>>>>>>>>>>>>> snapshot=3349356380160
>>>>>>>>>>>>>>>                 Virtual memory (bytes) snapshot=9665208745984
>>>>>>>>>>>>>>>                 Total committed heap usage
>>>>>>>>>>>>>>> (bytes)=3636854259712
>>>>>>>>>>>>>>>         Shuffle Errors
>>>>>>>>>>>>>>>                 BAD_ID=0
>>>>>>>>>>>>>>>                 CONNECTION=0
>>>>>>>>>>>>>>>                 IO_ERROR=4
>>>>>>>>>>>>>>>                 WRONG_LENGTH=0
>>>>>>>>>>>>>>>                 WRONG_MAP=0
>>>>>>>>>>>>>>>                 WRONG_REDUCE=0
>>>>>>>>>>>>>>>         File Input Format Counters
>>>>>>>>>>>>>>>                 Bytes Read=1000000000000
>>>>>>>>>>>>>>>         File Output Format Counters
>>>>>>>>>>>>>>>                 Bytes Written=1000000000000
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> John
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>>>>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I have the same problem. I compared Hadoop 2.2.0 with
>>>>>>>>>>>>>>>> Hadoop 1.0.3 and it turned out that the terasort for MR2 is 2 times slower
>>>>>>>>>>>>>>>> than that in MR1. I cannot really believe it.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop
>>>>>>>>>>>>>>>> 2.2.0 cluster configurations are as follows.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>>>>>>>>>>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>>>>>>>>>>>>>         mapreduce.map.memory.mb = "768";
>>>>>>>>>>>>>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>>>>>>>>>>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>>>>>>>>>>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>>>>>>>>>>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>>>>>>>>>>>>>         mapreduce.task.io.sort.factor = "48";
>>>>>>>>>>>>>>>>         mapreduce.task.io.sort.mb = "200";
>>>>>>>>>>>>>>>>         mapreduce.map.speculative = "true";
>>>>>>>>>>>>>>>>         mapreduce.reduce.speculative = "true";
>>>>>>>>>>>>>>>>         mapreduce.framework.name = "yarn";
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> yarn.app.mapreduce.am.job.task.listener.thread-count = "60";
>>>>>>>>>>>>>>>>         mapreduce.map.cpu.vcores = "1";
>>>>>>>>>>>>>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>>>>>>>>>>>>>         mapreduce.map.output.compress = "true";
>>>>>>>>>>>>>>>>         mapreduce.map.output.compress.codec =
>>>>>>>>>>>>>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>>>>>>>>>>>>>         yarn.resourcemanager.scheduler.client.thread-count
>>>>>>>>>>>>>>>> = "64";
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> yarn.resourcemanager.resource-tracker.client.thread-count = "64";
>>>>>>>>>>>>>>>>         yarn.resourcemanager.scheduler.class =
>>>>>>>>>>>>>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>>>>>>>>>>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>>>>>>>>>>>>>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>>>>>>>>>>>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>>>>>>>>>>>>>         yarn.nodemanager.container-executor.class =
>>>>>>>>>>>>>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>>>>>>>>>>>>>         yarn.nodemanager.container-manager.thread-count =
>>>>>>>>>>>>>>>> "64";
>>>>>>>>>>>>>>>>         yarn.nodemanager.localizer.client.thread-count =
>>>>>>>>>>>>>>>> "20";
>>>>>>>>>>>>>>>>         yarn.nodemanager.localizer.fetch.thread-count =
>>>>>>>>>>>>>>>> "20";
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost
>>>>>>>>>>>>>>>> the same other configurations. In MR1, the 1TB terasort took about 45
>>>>>>>>>>>>>>>> minutes, but it took around 90 minutes in MR2.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Does anyone know what is wrong here? Or do I need some
>>>>>>>>>>>>>>>> special configurations for terasort to work better in MR2?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> John
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>>>>>>>>>>>>>> michael_segel@hotmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Sam,
>>>>>>>>>>>>>>>>> I think your cluster is too small for any meaningful
>>>>>>>>>>>>>>>>> conclusions to be made.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Mike Segel
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <
>>>>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Harsh,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks for your detailed response! Now, the efficiency of
>>>>>>>>>>>>>>>>> my Yarn cluster improved a lot after increasing the reducer
>>>>>>>>>>>>>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>>>>>>>>>>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and
>>>>>>>>>>>>>>>>> reduce task together, with a typical process(map > shuffle > reduce). In
>>>>>>>>>>>>>>>>> Yarn, as I know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>>>>>>>>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...),
>>>>>>>>>>>>>>>>> but, MRv1 job has special execution process(map > shuffle > reduce) in
>>>>>>>>>>>>>>>>> Hadoop 1.x, and how Yarn execute a MRv1 job? still include some special MR
>>>>>>>>>>>>>>>>> steps in Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>>>>>>>>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>>>>>>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>>>>>>>>>>>> - What's the general process for ApplicationMaster of Yarn
>>>>>>>>>>>>>>>>> to execute a job?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by
>>>>>>>>>>>>>>>>> setting 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>>>>>>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>>>>>>>>>>>> - For Yarn, the above two parameters do not work any more, as
>>>>>>>>>>>>>>>>> Yarn uses containers instead, right?
>>>>>>>>>>>>>>>>> - For Yarn, we can set the whole physical mem for a
>>>>>>>>>>>>>>>>> NodeManager using 'yarn.nodemanager.resource.memory-mb'.
>>>>>>>>>>>>>>>>> But how to set the default size of physical mem of a container?
>>>>>>>>>>>>>>>>> - How to set the maximum size of physical mem of a
>>>>>>>>>>>>>>>>> container? By the parameter of 'mapred.child.java.opts'?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks as always!
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi Sam,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> > - How to know the container number? Why you say it will
>>>>>>>>>>>>>>>>>> be 22 containers due to a 22 GB memory?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The MR2's default configuration requests 1 GB resource
>>>>>>>>>>>>>>>>>> each for Map
>>>>>>>>>>>>>>>>>> and Reduce containers. It requests 1.5 GB for the AM
>>>>>>>>>>>>>>>>>> container that
>>>>>>>>>>>>>>>>>> runs the job, additionally. This is tunable using the
>>>>>>>>>>>>>>>>>> properties
>>>>>>>>>>>>>>>>>> Sandy's mentioned in his post.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> > - My machine has 32 GB memory, how much memory should
>>>>>>>>>>>>>>>>>> be assigned to containers?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> This is a general question. You may use the same process
>>>>>>>>>>>>>>>>>> you took to
>>>>>>>>>>>>>>>>>> decide optimal number of slots in MR1 to decide this
>>>>>>>>>>>>>>>>>> here. Every
>>>>>>>>>>>>>>>>>> container is a new JVM, and you're limited by the CPUs
>>>>>>>>>>>>>>>>>> you have there
>>>>>>>>>>>>>>>>>> (if not the memory). Either increase memory requests from
>>>>>>>>>>>>>>>>>> jobs, to
>>>>>>>>>>>>>>>>>> lower # of concurrent containers at a given time (runtime
>>>>>>>>>>>>>>>>>> change), or
>>>>>>>>>>>>>>>>>> lower NM's published memory resources to control the same
>>>>>>>>>>>>>>>>>> (config
>>>>>>>>>>>>>>>>>> change).
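>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> As a rough sketch (values purely illustrative), the first option is
>>>>>>>>>>>>>>>>>> the per-job request side and the second is the NM side:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>   mapreduce.map.memory.mb = "2048";    (fewer, larger map containers)
>>>>>>>>>>>>>>>>>>   mapreduce.reduce.memory.mb = "2048";
>>>>>>>>>>>>>>>>>>   yarn.app.mapreduce.am.resource.mb = "1536";
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>   yarn.nodemanager.resource.memory-mb = "12000";   (cap per-node total)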
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> > - In mapred-site.xml, if I set '
>>>>>>>>>>>>>>>>>> mapreduce.framework.name' to be 'yarn', will other
>>>>>>>>>>>>>>>>>> parameters for mapred-site.xml still work in yarn framework? Like
>>>>>>>>>>>>>>>>>> 'mapreduce.task.io.sort.mb' and 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Yes, all of these properties will still work. Old
>>>>>>>>>>>>>>>>>> properties specific
>>>>>>>>>>>>>>>>>> to JobTracker or TaskTracker (usually found as a keyword
>>>>>>>>>>>>>>>>>> in the config
>>>>>>>>>>>>>>>>>> name) will not apply anymore.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <
>>>>>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> > Hi Harsh,
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > According to above suggestions, I removed the
>>>>>>>>>>>>>>>>>> duplication of setting, and
>>>>>>>>>>>>>>>>>> > reduce the value of
>>>>>>>>>>>>>>>>>> 'yarn.nodemanager.resource.cpu-cores',
>>>>>>>>>>>>>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>>>>>>>>>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and
>>>>>>>>>>>>>>>>>> 12000. And then, the
>>>>>>>>>>>>>>>>>> > efficiency improved by about 18%.  I have some questions:
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > - How to know the container number? Why you say it will
>>>>>>>>>>>>>>>>>> be 22 containers due
>>>>>>>>>>>>>>>>>> > to a 22 GB memory?
>>>>>>>>>>>>>>>>>> > - My machine has 32 GB memory, how much memory should
>>>>>>>>>>>>>>>>>> be assigned to
>>>>>>>>>>>>>>>>>> > containers?
>>>>>>>>>>>>>>>>>> > - In mapred-site.xml, if I set '
>>>>>>>>>>>>>>>>>> mapreduce.framework.name' to be 'yarn', will
>>>>>>>>>>>>>>>>>> > other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>>>>>>>>> framework? Like
>>>>>>>>>>>>>>>>>> > 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > Thanks!
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>> >> Hey Sam,
>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>> >> Did you get a chance to retry with Sandy's
>>>>>>>>>>>>>>>>>> suggestions? The config
>>>>>>>>>>>>>>>>>> >> appears to be asking NMs to use roughly 22 total
>>>>>>>>>>>>>>>>>> containers (as
>>>>>>>>>>>>>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22
>>>>>>>>>>>>>>>>>> GB memory
>>>>>>>>>>>>>>>>>> >> resource. This could impact much, given the CPU is
>>>>>>>>>>>>>>>>>> still the same for
>>>>>>>>>>>>>>>>>> >> both test runs.
>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>>>>>>>>>>>>>> sandy.ryza@cloudera.com>
>>>>>>>>>>>>>>>>>> >> wrote:
>>>>>>>>>>>>>>>>>> >> > Hey Sam,
>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>> >> > Thanks for sharing your results.  I'm definitely
>>>>>>>>>>>>>>>>>> curious about what's
>>>>>>>>>>>>>>>>>> >> > causing the difference.
>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>> >> > A couple observations:
>>>>>>>>>>>>>>>>>> >> > It looks like you've got
>>>>>>>>>>>>>>>>>> yarn.nodemanager.resource.memory-mb in there
>>>>>>>>>>>>>>>>>> >> > twice
>>>>>>>>>>>>>>>>>> >> > with two different values.
>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?)
>>>>>>>>>>>>>>>>>> close to the default
>>>>>>>>>>>>>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any
>>>>>>>>>>>>>>>>>> of your tasks getting
>>>>>>>>>>>>>>>>>> >> > killed for running over resource limits?
>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>> >> > -Sandy
>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <
>>>>>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >> The terasort execution log shows that reduce spent
>>>>>>>>>>>>>>>>>> about 5.5 mins from
>>>>>>>>>>>>>>>>>> >> >> 33%
>>>>>>>>>>>>>>>>>> >> >> to 35% as below.
>>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>>> reduce 31%
>>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>>> reduce 32%
>>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>>> reduce 33%
>>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>>> reduce 35%
>>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>>> reduce 40%
>>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>>> reduce 43%
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >> Any way, below are my configurations for your
>>>>>>>>>>>>>>>>>> reference. Thanks!
>>>>>>>>>>>>>>>>>> >> >> (A) core-site.xml
>>>>>>>>>>>>>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >> (B) hdfs-site.xml
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>dfs.replication</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>1</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>10</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >> (C) mapred-site.xml
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>>>>>>>>>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>>>>>>>>>>>>>> >> >> </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>yarn</value>
>>>>>>>>>>>>>>>>>> >> >>    </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>8</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>4</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>true</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >> (D) yarn-site.xml
>>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>node1:18025</value>
>>>>>>>>>>>>>>>>>> >> >>     <description>host is the hostname of the
>>>>>>>>>>>>>>>>>> resource manager and
>>>>>>>>>>>>>>>>>> >> >>     port is the port on which the NodeManagers
>>>>>>>>>>>>>>>>>> contact the Resource
>>>>>>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <description>The address of the RM web
>>>>>>>>>>>>>>>>>> application.</description>
>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>node1:18088</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <name>yarn.resourcemanager.scheduler.address</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>node1:18030</value>
>>>>>>>>>>>>>>>>>> >> >>     <description>host is the hostname of the
>>>>>>>>>>>>>>>>>> resourcemanager and port
>>>>>>>>>>>>>>>>>> >> >> is
>>>>>>>>>>>>>>>>>> >> >> the port
>>>>>>>>>>>>>>>>>> >> >>     on which the Applications in the cluster talk
>>>>>>>>>>>>>>>>>> to the Resource
>>>>>>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>node1:18040</value>
>>>>>>>>>>>>>>>>>> >> >>     <description>the host is the hostname of the
>>>>>>>>>>>>>>>>>> ResourceManager and
>>>>>>>>>>>>>>>>>> >> >> the
>>>>>>>>>>>>>>>>>> >> >> port is the port on
>>>>>>>>>>>>>>>>>> >> >>     which the clients can talk to the Resource
>>>>>>>>>>>>>>>>>> Manager. </description>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>>>>>>>>>>>>>> >> >>     <description>the local directories used by the
>>>>>>>>>>>>>>>>>> >> >> nodemanager</description>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>>>>>>>>>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>>>>>>>>>>>>>> port</description>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>10240</value>
>>>>>>>>>>>>>>>>>> >> >>     <description>the amount of memory on the
>>>>>>>>>>>>>>>>>> NodeManager in
>>>>>>>>>>>>>>>>>> >> >> GB</description>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>>>>>>>>>>>>> >> >>     <description>directory on hdfs where the
>>>>>>>>>>>>>>>>>> application logs are moved
>>>>>>>>>>>>>>>>>> >> >> to
>>>>>>>>>>>>>>>>>> >> >> </description>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>    <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>>>>>>>>>>>>> >> >>     <description>the directories used by
>>>>>>>>>>>>>>>>>> Nodemanagers as log
>>>>>>>>>>>>>>>>>> >> >> directories</description>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>>>>>>>>>>>>> >> >>     <description>shuffle service that needs to be
>>>>>>>>>>>>>>>>>> set for Map Reduce to
>>>>>>>>>>>>>>>>>> >> >> run </description>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>24</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>3</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>22000</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>2.1</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>>> >> >>> Not tuning configurations at all is wrong. YARN
>>>>>>>>>>>>>>>>>> uses memory resource
>>>>>>>>>>>>>>>>>> >> >>> based scheduling and hence MR2 would be requesting
>>>>>>>>>>>>>>>>>> 1 GB minimum by
>>>>>>>>>>>>>>>>>> >> >>> default, causing, on base configs, to max out at 8
>>>>>>>>>>>>>>>>>> (due to 8 GB NM
>>>>>>>>>>>>>>>>>> >> >>> memory resource config) total containers. Do share
>>>>>>>>>>>>>>>>>> your configs as at
>>>>>>>>>>>>>>>>>> >> >>> this point none of us can tell what it is.
>>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower
>>>>>>>>>>>>>>>>>> for users and to not
>>>>>>>>>>>>>>>>>> >> >>> care about such things :)
>>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>>>>>>>>>>>>> samliuhadoop@gmail.com>
>>>>>>>>>>>>>>>>>> >> >>> wrote:
>>>>>>>>>>>>>>>>>> >> >>> > At the beginning, I just wanted to do a quick
>>>>>>>>>>>>>>>>>> comparison of MRv1 and
>>>>>>>>>>>>>>>>>> >> >>> > Yarn.
>>>>>>>>>>>>>>>>>> >> >>> > But
>>>>>>>>>>>>>>>>>> >> >>> > they have many differences, and to be fair for
>>>>>>>>>>>>>>>>>> comparison I did not
>>>>>>>>>>>>>>>>>> >> >>> > tune
>>>>>>>>>>>>>>>>>> >> >>> > their configurations at all.  So I got above
>>>>>>>>>>>>>>>>>> test results. After
>>>>>>>>>>>>>>>>>> >> >>> > analyzing
>>>>>>>>>>>>>>>>>> >> >>> > the test result, no doubt, I will configure them
>>>>>>>>>>>>>>>>>> and do comparison
>>>>>>>>>>>>>>>>>> >> >>> > again.
>>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>>> >> >>> > Do you have any idea on current test result? I
>>>>>>>>>>>>>>>>>> think, to compare
>>>>>>>>>>>>>>>>>> >> >>> > with
>>>>>>>>>>>>>>>>>> >> >>> > MRv1,
>>>>>>>>>>>>>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but
>>>>>>>>>>>>>>>>>> worse on Reduce
>>>>>>>>>>>>>>>>>> >> >>> > phase(terasort test).
>>>>>>>>>>>>>>>>>> >> >>> > And any detailed suggestions/comments/materials
>>>>>>>>>>>>>>>>>> on Yarn performance
>>>>>>>>>>>>>>>>>> >> >>> > tuning?
>>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>>> >> >>> > Thanks!
>>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>>>>>>>>>>>>> marcosluis2186@gmail.com>
>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>> >> >>> >> Why not tune the configurations?
>>>>>>>>>>>>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>>>>>>>>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size,
>>>>>>>>>>>>>>>>>> etc
>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>> >> >>> >>> Hi Experts,
>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>> >> >>> >>> We are thinking about whether to use Yarn or
>>>>>>>>>>>>>>>>>> not in the near
>>>>>>>>>>>>>>>>>> >> >>> >>> future,
>>>>>>>>>>>>>>>>>> >> >>> >>> and
>>>>>>>>>>>>>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for
>>>>>>>>>>>>>>>>>> comparison.
>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>> >> >>> >>> My env is a three-node cluster, and each node
>>>>>>>>>>>>>>>>>> has similar hardware:
>>>>>>>>>>>>>>>>>> >> >>> >>> 2
>>>>>>>>>>>>>>>>>> >> >>> >>> cpu(4 core), 32 mem. Both Yarn and MRv1
>>>>>>>>>>>>>>>>>> cluster are set on the
>>>>>>>>>>>>>>>>>> >> >>> >>> same
>>>>>>>>>>>>>>>>>> >> >>> >>> env. To
>>>>>>>>>>>>>>>>>> >> >>> >>> be fair, I did not make any performance tuning
>>>>>>>>>>>>>>>>>> on their
>>>>>>>>>>>>>>>>>> >> >>> >>> configurations, but
>>>>>>>>>>>>>>>>>> >> >>> >>> use the default configuration values.
>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>> >> >>> >>> Before testing, I think Yarn will be much
>>>>>>>>>>>>>>>>>> better than MRv1, if
>>>>>>>>>>>>>>>>>> >> >>> >>> they
>>>>>>>>>>>>>>>>>> >> >>> >>> all
>>>>>>>>>>>>>>>>>> >> >>> >>> use default configuration, because Yarn is a
>>>>>>>>>>>>>>>>>> better framework than
>>>>>>>>>>>>>>>>>> >> >>> >>> MRv1.
>>>>>>>>>>>>>>>>>> >> >>> >>> However, the test result shows some
>>>>>>>>>>>>>>>>>> differences:
>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>>>>>>>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>>>>>>>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>>>>>>>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>>>>>>>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>>>>>>>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>>>>>>>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>>>>>>>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>> >> >>> >>> After a fast analysis, I think the direct
>>>>>>>>>>>>>>>>>> cause might be that Yarn
>>>>>>>>>>>>>>>>>> >> >>> >>> is
>>>>>>>>>>>>>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much
>>>>>>>>>>>>>>>>>> worse on Reduce
>>>>>>>>>>>>>>>>>> >> >>> >>> phase.
>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>>>>>>>>>>>>> >> >>> >>> - Why do my tests show Yarn is worse than MRv1
>>>>>>>>>>>>>>>>>> >> >>> >>> for terasort?
>>>>>>>>>>>>>>>>>> >> >>> >>> - What's the strategy for tuning Yarn
>>>>>>>>>>>>>>>>>> >> >>> >>> performance? Are there any
>>>>>>>>>>>>>>>>>> >> >>> >>> materials?
>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>> >> >>> >>> Thanks!
>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>> >> >>> >> --
>>>>>>>>>>>>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>>>>>>>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>>>>>>>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>>> >> >>> --
>>>>>>>>>>>>>>>>>> >> >>> Harsh J
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>> >> --
>>>>>>>>>>>>>>>>>> >> Harsh J
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>> Harsh J
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Jian Fang <ji...@gmail.com>.
Really? Does that mean I cannot compare terasort in MR2 with MR1 directly?
If yes, this really should be documented.

Thanks.


On Wed, Oct 23, 2013 at 4:54 PM, Sandy Ryza <sa...@cloudera.com> wrote:

> I should have brought this up earlier - the Terasort benchmark
> requirements changed recently to make data less compressible.  The MR2
> version of Terasort has this change, but the MR1 version does not.    So
> Snappy should be working fine, but the data that MR2 is using is less
> compressible.
>
>
> On Wed, Oct 23, 2013 at 4:48 PM, Jian Fang <ji...@gmail.com>wrote:
>
>> Ok, I think I know where the problem is now. I used snappy to compress
>> map output.
>>
>> Calculate the compression ratio = "Map output materialized bytes"/"Map
>> output bytes"
>>
>> MR2
>> 440519309909/1020000000000 = 0.431881676
>>
>> MR1
>> 240948272514/1000000000000 = 0.240948273
>>
>>
>> It seems something is wrong with Snappy in my Hadoop 2 setup and, as a result,
>> MR2 needs to fetch much more data during the shuffle phase. Is there any way to
>> check Snappy in MR2?
>>
>>
>>
>>
>> On Wed, Oct 23, 2013 at 4:26 PM, Jian Fang <jian.fang.subscribe@gmail.com
>> > wrote:
>>
>>> Looked at the results for MR1 and MR2. Reduce input groups for MR1 is
>>> 4,294,967,296, but 10,000,000,000 for MR2.
>>> That is to say the number of unique keys fed into the reducers is
>>> 10000000000 in MR2, which is really weird. Is there any problem in the terasort code?
>>>
>>>
>>> On Wed, Oct 23, 2013 at 2:16 PM, Jian Fang <
>>> jian.fang.subscribe@gmail.com> wrote:
>>>
>>>> Reducing the number of map containers to 8 slightly improved the total
>>>> time to 84 minutes. Here is the output. Also, from the log, there is no clear
>>>> reason why the containers are killed other than the message such as
>>>> "Container killed by the
>>>> ApplicationMaster../container_1382237301855_0001_01_000001/syslog:Container
>>>> killed on request. Exit code is 143"
>>>>
>>>>
>>>> 2013-10-23 21:09:34,325 INFO org.apache.hadoop.mapreduce.Job (main):
>>>> Counters: 46
>>>>         File System Counters
>>>>                 FILE: Number of bytes read=455484809066
>>>>                 FILE: Number of bytes written=896642344343
>>>>
>>>>                 FILE: Number of read operations=0
>>>>                 FILE: Number of large read operations=0
>>>>                 FILE: Number of write operations=0
>>>>                 HDFS: Number of bytes read=1000000841624
>>>>
>>>>                 HDFS: Number of bytes written=1000000000000
>>>>                 HDFS: Number of read operations=25531
>>>>
>>>>                 HDFS: Number of large read operations=0
>>>>                 HDFS: Number of write operations=150
>>>>
>>>>         Job Counters
>>>>                 Killed map tasks=1
>>>>                 Killed reduce tasks=11
>>>>                 Launched map tasks=7449
>>>>                 Launched reduce tasks=86
>>>>                 Data-local map tasks=7434
>>>>                 Rack-local map tasks=15
>>>>                 Total time spent by all maps in occupied slots
>>>> (ms)=1030941232
>>>>                 Total time spent by all reduces in occupied slots
>>>> (ms)=1574732272
>>>>
>>>>         Map-Reduce Framework
>>>>                 Map input records=10000000000
>>>>                 Map output records=10000000000
>>>>                 Map output bytes=1020000000000
>>>>                 Map output materialized bytes=440519309909
>>>>                 Input split bytes=841624
>>>>
>>>>                 Combine input records=0
>>>>                 Combine output records=0
>>>>                 Reduce input groups=10000000000
>>>>                 Reduce shuffle bytes=440519309909
>>>>
>>>>                 Reduce input records=10000000000
>>>>                 Reduce output records=10000000000
>>>>                 Spilled Records=20000000000
>>>>                 Shuffled Maps =558600
>>>>                 Failed Shuffles=0
>>>>                 Merged Map outputs=558600
>>>>                 GC time elapsed (ms)=2855534
>>>>                 CPU time spent (ms)=170381420
>>>>                 Physical memory (bytes) snapshot=3704938954752
>>>>                 Virtual memory (bytes) snapshot=11455208308736
>>>>                 Total committed heap usage (bytes)=4523248582656
>>>>         Shuffle Errors
>>>>                 BAD_ID=0
>>>>                 CONNECTION=0
>>>>                 IO_ERROR=0
>>>>
>>>>                 WRONG_LENGTH=0
>>>>                 WRONG_MAP=0
>>>>                 WRONG_REDUCE=0
>>>>         File Input Format Counters
>>>>                 Bytes Read=1000000000000
>>>>         File Output Format Counters
>>>>                 Bytes Written=1000000000000
>>>>
>>>>
>>>>
>>>> On Wed, Oct 23, 2013 at 1:46 PM, Jian Fang <
>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>
>>>>> Already started the tests with only 8 containers for map in MR2, still
>>>>> running.
>>>>>
>>>>> But shouldn't YARN make better use of the memory? If I map YARN
>>>>> containers in MR2 to exactly the 8 map and 3 reduce slots of MR1, what is the
>>>>> real advantage of YARN then? I remember one goal of YARN is to solve the
>>>>> issue that map slots cannot be used for reduce and reduce slots cannot be
>>>>> used for map. Shouldn't YARN be smart enough to handle the concurrent tasks?
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Oct 23, 2013 at 1:17 PM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>>>
>>>>>> Increasing the slowstart is not meant to increase performance, but
>>>>>> should make for a fairer comparison.  Have you tried making sure that in
>>>>>> MR2 only 8 map tasks are running concurrently, or boosting MR1 up to 16?
>>>>>>
>>>>>> -Sandy
>>>>>>
>>>>>>
>>>>>> On Wed, Oct 23, 2013 at 12:55 PM, Jian Fang <
>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>
>>>>>>> Changing mapreduce.job.reduce.
>>>>>>> slowstart.completedmaps to 0.99 does not look good. The map phase
>>>>>>> alone took 48 minutes and total time seems to be even longer. Any way to
>>>>>>> let map phase run faster?
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Oct 23, 2013 at 10:05 AM, Jian Fang <
>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>
>>>>>>>> Thanks Sandy.
>>>>>>>>
>>>>>>>> io.sort.record.percent is the default value 0.05 for both MR1 and
>>>>>>>> MR2.  mapreduce.job.reduce.slowstart.completedmaps in MR2 and mapred.reduce.slowstart.completed.maps
>>>>>>>> in MR1 both use the default value 0.05.
>>>>>>>>
>>>>>>>> I tried to allocate 1536MB and 1024MB to map container some time
>>>>>>>> ago, but the changes did not give me a better result, thus, I changed it
>>>>>>>> back to 768MB.
>>>>>>>>
>>>>>>>> Will try mapred.reduce.slowstart.completed.maps=.99 to see what
>>>>>>>> happens. BTW, I should use
>>>>>>>> mapreduce.job.reduce.slowstart.completedmaps in MR2, right?
>>>>>>>>
>>>>>>>> Also, in MR1 I can specify tasktracker.http.threads, but I could
>>>>>>>> not find the counterpart for MR2. Which one should I tune for the http
>>>>>>>> threads?
>>>>>>>>
>>>>>>>> Thanks again.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Oct 23, 2013 at 9:40 AM, Sandy Ryza <
>>>>>>>> sandy.ryza@cloudera.com> wrote:
>>>>>>>>
>>>>>>>>> Based on SLOTS_MILLIS_MAPS, it looks like your map tasks are
>>>>>>>>> taking about three times as long in MR2 as they are in MR1.  This is
>>>>>>>>> probably because you allow twice as many map tasks to run at a time in MR2 (12288/768
>>>>>>>>> = 16).  Being able to use all the containers isn't necessarily a good thing
>>>>>>>>> if you are oversubscribing your node's resources.  Because of the different
>>>>>>>>> way that MR1 and MR2 view resources, I think it's better to test with
>>>>>>>>> mapred.reduce.slowstart.completed.maps=.99 so that the map and reduce
>>>>>>>>> phases will run separately.
>>>>>>>>>
>>>>>>>>> On the other side, it looks like your MR1 has more spilled records
>>>>>>>>> than MR2.  For a fairer comparison, you should set io.sort.record.percent
>>>>>>>>> in MR1 to .13, which should improve MR1 performance, but will provide a
>>>>>>>>> fairer comparison (MR2 automatically does this tuning for you).
>>>>>>>>>
>>>>>>>>> -Sandy
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Oct 23, 2013 at 9:22 AM, Jian Fang <
>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> The number of map slots and reduce slots on each data node for
>>>>>>>>>> MR1 are 8 and 3, respectively. Since MR2 could use all containers for
>>>>>>>>>> either map or reduce, I would expect that MR2 is faster.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Oct 23, 2013 at 8:17 AM, Sandy Ryza <
>>>>>>>>>> sandy.ryza@cloudera.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> How many map and reduce slots are you using per tasktracker in
>>>>>>>>>>> MR1?  How do the average map times compare? (MR2 reports this directly on
>>>>>>>>>>> the web UI, but you can also get a sense in MR1 by scrolling through the
>>>>>>>>>>> map tasks page).  Can you share the counters for MR1?
>>>>>>>>>>>
>>>>>>>>>>> -Sandy
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Oct 23, 2013 at 12:23 AM, Jian Fang <
>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Unfortunately, turning off JVM reuse still got the same result,
>>>>>>>>>>>> i.e., about 90 minutes for MR2. I don't think the killed reduces could
>>>>>>>>>>>> account for a 2x slowdown. There should be something very wrong either
>>>>>>>>>>>> in configuration or code. Any hints?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <
>>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks Sandy. I will try to turn JVM reuse off and see what
>>>>>>>>>>>>> happens.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yes, I saw quite some exceptions in the task attempts. For
>>>>>>>>>>>>> instance.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2013-10-20 03:13:58,751 ERROR [main]
>>>>>>>>>>>>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>>>>>>>>>>>>> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
>>>>>>>>>>>>> 2013-10-20 03:13:58,752 ERROR [Thread-6]
>>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient: Failed to close file
>>>>>>>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
>>>>>>>>>>>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>>>>>>>>>>>>> No lease on
>>>>>>>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
>>>>>>>>>>>>> File does not exist. Holder
>>>>>>>>>>>>> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
>>>>>>>>>>>>> any open files.
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>>>>>>>>>>>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>>>>>>>>>>> --
>>>>>>>>>>>>>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>>>>>>>>>>>>> 2013-10-20 03:13:58,753 WARN [main]
>>>>>>>>>>>>> org.apache.hadoop.mapred.YarnChild: Exception running child :
>>>>>>>>>>>>> java.nio.channels.ClosedChannelException
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <
>>>>>>>>>>>>> sandy.ryza@cloudera.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> It looks like many of your reduce tasks were killed.  Do you
>>>>>>>>>>>>>> know why?  Also, MR2 doesn't have JVM reuse, so it might make sense to
>>>>>>>>>>>>>> compare it to MR1 with JVM reuse turned off.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -Sandy
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <
>>>>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The Terasort output for MR2 is as follows.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job
>>>>>>>>>>>>>>> (main): Counters: 46
>>>>>>>>>>>>>>>         File System Counters
>>>>>>>>>>>>>>>                 FILE: Number of bytes read=456102049355
>>>>>>>>>>>>>>>                 FILE: Number of bytes written=897246250517
>>>>>>>>>>>>>>>                 FILE: Number of read operations=0
>>>>>>>>>>>>>>>                 FILE: Number of large read operations=0
>>>>>>>>>>>>>>>                 FILE: Number of write operations=0
>>>>>>>>>>>>>>>                 HDFS: Number of bytes read=1000000851200
>>>>>>>>>>>>>>>                 HDFS: Number of bytes written=1000000000000
>>>>>>>>>>>>>>>                 HDFS: Number of read operations=32131
>>>>>>>>>>>>>>>                 HDFS: Number of large read operations=0
>>>>>>>>>>>>>>>                 HDFS: Number of write operations=224
>>>>>>>>>>>>>>>         Job Counters
>>>>>>>>>>>>>>>                 Killed map tasks=1
>>>>>>>>>>>>>>>                 Killed reduce tasks=20
>>>>>>>>>>>>>>>                 Launched map tasks=7601
>>>>>>>>>>>>>>>                 Launched reduce tasks=132
>>>>>>>>>>>>>>>                 Data-local map tasks=7591
>>>>>>>>>>>>>>>                 Rack-local map tasks=10
>>>>>>>>>>>>>>>                 Total time spent by all maps in occupied
>>>>>>>>>>>>>>> slots (ms)=1696141311
>>>>>>>>>>>>>>>                 Total time spent by all reduces in occupied
>>>>>>>>>>>>>>> slots (ms)=2664045096
>>>>>>>>>>>>>>>         Map-Reduce Framework
>>>>>>>>>>>>>>>                 Map input records=10000000000
>>>>>>>>>>>>>>>                 Map output records=10000000000
>>>>>>>>>>>>>>>                 Map output bytes=1020000000000
>>>>>>>>>>>>>>>                 Map output materialized bytes=440486356802
>>>>>>>>>>>>>>>                 Input split bytes=851200
>>>>>>>>>>>>>>>                 Combine input records=0
>>>>>>>>>>>>>>>                 Combine output records=0
>>>>>>>>>>>>>>>                 Reduce input groups=10000000000
>>>>>>>>>>>>>>>                 Reduce shuffle bytes=440486356802
>>>>>>>>>>>>>>>                 Reduce input records=10000000000
>>>>>>>>>>>>>>>                 Reduce output records=10000000000
>>>>>>>>>>>>>>>                 Spilled Records=20000000000
>>>>>>>>>>>>>>>                 Shuffled Maps =851200
>>>>>>>>>>>>>>>                 Failed Shuffles=61
>>>>>>>>>>>>>>>                 Merged Map outputs=851200
>>>>>>>>>>>>>>>                 GC time elapsed (ms)=4215666
>>>>>>>>>>>>>>>                 CPU time spent (ms)=192433000
>>>>>>>>>>>>>>>                 Physical memory (bytes)
>>>>>>>>>>>>>>> snapshot=3349356380160
>>>>>>>>>>>>>>>                 Virtual memory (bytes) snapshot=9665208745984
>>>>>>>>>>>>>>>                 Total committed heap usage
>>>>>>>>>>>>>>> (bytes)=3636854259712
>>>>>>>>>>>>>>>         Shuffle Errors
>>>>>>>>>>>>>>>                 BAD_ID=0
>>>>>>>>>>>>>>>                 CONNECTION=0
>>>>>>>>>>>>>>>                 IO_ERROR=4
>>>>>>>>>>>>>>>                 WRONG_LENGTH=0
>>>>>>>>>>>>>>>                 WRONG_MAP=0
>>>>>>>>>>>>>>>                 WRONG_REDUCE=0
>>>>>>>>>>>>>>>         File Input Format Counters
>>>>>>>>>>>>>>>                 Bytes Read=1000000000000
>>>>>>>>>>>>>>>         File Output Format Counters
>>>>>>>>>>>>>>>                 Bytes Written=1000000000000
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> John
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>>>>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I have the same problem. I compared Hadoop 2.2.0 with
>>>>>>>>>>>>>>>> Hadoop 1.0.3 and it turned out that the terasort for MR2 is 2 times slower
>>>>>>>>>>>>>>>> than that in MR1. I cannot really believe it.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop
>>>>>>>>>>>>>>>> 2.2.0 cluster configurations are as follows.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>>>>>>>>>>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>>>>>>>>>>>>>         mapreduce.map.memory.mb = "768";
>>>>>>>>>>>>>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>>>>>>>>>>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>>>>>>>>>>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>>>>>>>>>>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>>>>>>>>>>>>>         mapreduce.task.io.sort.factor = "48";
>>>>>>>>>>>>>>>>         mapreduce.task.io.sort.mb = "200";
>>>>>>>>>>>>>>>>         mapreduce.map.speculative = "true";
>>>>>>>>>>>>>>>>         mapreduce.reduce.speculative = "true";
>>>>>>>>>>>>>>>>         mapreduce.framework.name = "yarn";
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> yarn.app.mapreduce.am.job.task.listener.thread-count = "60";
>>>>>>>>>>>>>>>>         mapreduce.map.cpu.vcores = "1";
>>>>>>>>>>>>>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>>>>>>>>>>>>>         mapreduce.map.output.compress = "true";
>>>>>>>>>>>>>>>>         mapreduce.map.output.compress.codec =
>>>>>>>>>>>>>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>>>>>>>>>>>>>         yarn.resourcemanager.scheduler.client.thread-count
>>>>>>>>>>>>>>>> = "64";
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> yarn.resourcemanager.resource-tracker.client.thread-count = "64";
>>>>>>>>>>>>>>>>         yarn.resourcemanager.scheduler.class =
>>>>>>>>>>>>>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>>>>>>>>>>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>>>>>>>>>>>>>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>>>>>>>>>>>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>>>>>>>>>>>>>         yarn.nodemanager.container-executor.class =
>>>>>>>>>>>>>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>>>>>>>>>>>>>         yarn.nodemanager.container-manager.thread-count =
>>>>>>>>>>>>>>>> "64";
>>>>>>>>>>>>>>>>         yarn.nodemanager.localizer.client.thread-count =
>>>>>>>>>>>>>>>> "20";
>>>>>>>>>>>>>>>>         yarn.nodemanager.localizer.fetch.thread-count =
>>>>>>>>>>>>>>>> "20";
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost
>>>>>>>>>>>>>>>> the same other configurations. In MR1, the 1TB terasort took about 45
>>>>>>>>>>>>>>>> minutes, but it took around 90 minutes in MR2.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Does anyone know what is wrong here? Or do I need some
>>>>>>>>>>>>>>>> special configurations for terasort to work better in MR2?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> John
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>>>>>>>>>>>>>> michael_segel@hotmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Sam,
>>>>>>>>>>>>>>>>> I think your cluster is too small for any meaningful
>>>>>>>>>>>>>>>>> conclusions to be made.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Mike Segel
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <
>>>>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Harsh,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks for your detailed response! Now, the efficiency of
>>>>>>>>>>>>>>>>> my Yarn cluster improved a lot after increasing the reducer
>>>>>>>>>>>>>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>>>>>>>>>>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and
>>>>>>>>>>>>>>>>> reduce task together, with a typical process(map > shuffle > reduce). In
>>>>>>>>>>>>>>>>> Yarn, as I know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>>>>>>>>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...),
>>>>>>>>>>>>>>>>> but, MRv1 job has special execution process(map > shuffle > reduce) in
>>>>>>>>>>>>>>>>> Hadoop 1.x, and how Yarn execute a MRv1 job? still include some special MR
>>>>>>>>>>>>>>>>> steps in Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>>>>>>>>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>>>>>>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>>>>>>>>>>>> - What's the general process for ApplicationMaster of Yarn
>>>>>>>>>>>>>>>>> to execute a job?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by
>>>>>>>>>>>>>>>>> setting 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>>>>>>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>>>>>>>>>>>> - For Yarn, the above two parameters do not work any more, as
>>>>>>>>>>>>>>>>> Yarn uses containers instead, right?
>>>>>>>>>>>>>>>>> - For Yarn, we can set the whole physical mem for a
>>>>>>>>>>>>>>>>> NodeManager using 'yarn.nodemanager.resource.memory-mb'.
>>>>>>>>>>>>>>>>> But how to set the default size of physical mem of a container?
>>>>>>>>>>>>>>>>> - How to set the maximum size of physical mem of a
>>>>>>>>>>>>>>>>> container? By the parameter of 'mapred.child.java.opts'?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks as always!
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi Sam,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> > - How to know the container number? Why you say it will
>>>>>>>>>>>>>>>>>> be 22 containers due to a 22 GB memory?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The MR2's default configuration requests 1 GB resource
>>>>>>>>>>>>>>>>>> each for Map
>>>>>>>>>>>>>>>>>> and Reduce containers. It requests 1.5 GB for the AM
>>>>>>>>>>>>>>>>>> container that
>>>>>>>>>>>>>>>>>> runs the job, additionally. This is tunable using the
>>>>>>>>>>>>>>>>>> properties
>>>>>>>>>>>>>>>>>> Sandy's mentioned in his post.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> > - My machine has 32 GB memory, how many memory is
>>>>>>>>>>>>>>>>>> proper to be assigned to containers?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> This is a general question. You may use the same process
>>>>>>>>>>>>>>>>>> you took to
>>>>>>>>>>>>>>>>>> decide optimal number of slots in MR1 to decide this
>>>>>>>>>>>>>>>>>> here. Every
>>>>>>>>>>>>>>>>>> container is a new JVM, and you're limited by the CPUs
>>>>>>>>>>>>>>>>>> you have there
>>>>>>>>>>>>>>>>>> (if not the memory). Either increase memory requests from
>>>>>>>>>>>>>>>>>> jobs, to
>>>>>>>>>>>>>>>>>> lower # of concurrent containers at a given time (runtime
>>>>>>>>>>>>>>>>>> change), or
>>>>>>>>>>>>>>>>>> lower NM's published memory resources to control the same
>>>>>>>>>>>>>>>>>> (config
>>>>>>>>>>>>>>>>>> change).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> > - In mapred-site.xml, if I set '
>>>>>>>>>>>>>>>>>> mapreduce.framework.name' to be 'yarn', will other
>>>>>>>>>>>>>>>>>> parameters for mapred-site.xml still work in yarn framework? Like
>>>>>>>>>>>>>>>>>> 'mapreduce.task.io.sort.mb' and 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Yes, all of these properties will still work. Old
>>>>>>>>>>>>>>>>>> properties specific
>>>>>>>>>>>>>>>>>> to JobTracker or TaskTracker (usually found as a keyword
>>>>>>>>>>>>>>>>>> in the config
>>>>>>>>>>>>>>>>>> name) will not apply anymore.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <
>>>>>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> > Hi Harsh,
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > According to above suggestions, I removed the
>>>>>>>>>>>>>>>>>> duplication of setting, and
>>>>>>>>>>>>>>>>>> > reduce the value of
>>>>>>>>>>>>>>>>>> 'yarn.nodemanager.resource.cpu-cores',
>>>>>>>>>>>>>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>>>>>>>>>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and
>>>>>>>>>>>>>>>>>> 12000. Ant then, the
>>>>>>>>>>>>>>>>>> > efficiency improved about 18%.  I have questions:
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > - How to know the container number? Why you say it will
>>>>>>>>>>>>>>>>>> be 22 containers due
>>>>>>>>>>>>>>>>>> > to a 22 GB memory?
>>>>>>>>>>>>>>>>>> > - My machine has 32 GB memory, how many memory is
>>>>>>>>>>>>>>>>>> proper to be assigned to
>>>>>>>>>>>>>>>>>> > containers?
>>>>>>>>>>>>>>>>>> > - In mapred-site.xml, if I set '
>>>>>>>>>>>>>>>>>> mapreduce.framework.name' to be 'yarn', will
>>>>>>>>>>>>>>>>>> > other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>>>>>>>>> framework? Like
>>>>>>>>>>>>>>>>>> > 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > Thanks!
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>> >> Hey Sam,
>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>> >> Did you get a chance to retry with Sandy's
>>>>>>>>>>>>>>>>>> suggestions? The config
>>>>>>>>>>>>>>>>>> >> appears to be asking NMs to use roughly 22 total
>>>>>>>>>>>>>>>>>> containers (as
>>>>>>>>>>>>>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22
>>>>>>>>>>>>>>>>>> GB memory
>>>>>>>>>>>>>>>>>> >> resource. This could impact much, given the CPU is
>>>>>>>>>>>>>>>>>> still the same for
>>>>>>>>>>>>>>>>>> >> both test runs.
>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>>>>>>>>>>>>>> sandy.ryza@cloudera.com>
>>>>>>>>>>>>>>>>>> >> wrote:
>>>>>>>>>>>>>>>>>> >> > Hey Sam,
>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>> >> > Thanks for sharing your results.  I'm definitely
>>>>>>>>>>>>>>>>>> curious about what's
>>>>>>>>>>>>>>>>>> >> > causing the difference.
>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>> >> > A couple observations:
>>>>>>>>>>>>>>>>>> >> > It looks like you've got
>>>>>>>>>>>>>>>>>> yarn.nodemanager.resource.memory-mb in there
>>>>>>>>>>>>>>>>>> >> > twice
>>>>>>>>>>>>>>>>>> >> > with two different values.
>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?)
>>>>>>>>>>>>>>>>>> close to the default
>>>>>>>>>>>>>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any
>>>>>>>>>>>>>>>>>> of your tasks getting
>>>>>>>>>>>>>>>>>> >> > killed for running over resource limits?
>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>> >> > -Sandy
>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <
>>>>>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >> The terasort execution log shows that reduce spent
>>>>>>>>>>>>>>>>>> about 5.5 mins from
>>>>>>>>>>>>>>>>>> >> >> 33%
>>>>>>>>>>>>>>>>>> >> >> to 35% as below.
>>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>>> reduce 31%
>>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>>> reduce 32%
>>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>>> reduce 33%
>>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>>> reduce 35%
>>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>>> reduce 40%
>>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>>> reduce 43%
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >> Any way, below are my configurations for your
>>>>>>>>>>>>>>>>>> reference. Thanks!
>>>>>>>>>>>>>>>>>> >> >> (A) core-site.xml
>>>>>>>>>>>>>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >> (B) hdfs-site.xml
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>dfs.replication</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>1</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>10</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >> (C) mapred-site.xml
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>>>>>>>>>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>>>>>>>>>>>>>> >> >> </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>yarn</value>
>>>>>>>>>>>>>>>>>> >> >>    </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>8</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>4</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>true</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >> (D) yarn-site.xml
>>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>node1:18025</value>
>>>>>>>>>>>>>>>>>> >> >>     <description>host is the hostname of the
>>>>>>>>>>>>>>>>>> resource manager and
>>>>>>>>>>>>>>>>>> >> >>     port is the port on which the NodeManagers
>>>>>>>>>>>>>>>>>> contact the Resource
>>>>>>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <description>The address of the RM web
>>>>>>>>>>>>>>>>>> application.</description>
>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>node1:18088</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <name>yarn.resourcemanager.scheduler.address</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>node1:18030</value>
>>>>>>>>>>>>>>>>>> >> >>     <description>host is the hostname of the
>>>>>>>>>>>>>>>>>> resourcemanager and port
>>>>>>>>>>>>>>>>>> >> >> is
>>>>>>>>>>>>>>>>>> >> >> the port
>>>>>>>>>>>>>>>>>> >> >>     on which the Applications in the cluster talk
>>>>>>>>>>>>>>>>>> to the Resource
>>>>>>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>node1:18040</value>
>>>>>>>>>>>>>>>>>> >> >>     <description>the host is the hostname of the
>>>>>>>>>>>>>>>>>> ResourceManager and
>>>>>>>>>>>>>>>>>> >> >> the
>>>>>>>>>>>>>>>>>> >> >> port is the port on
>>>>>>>>>>>>>>>>>> >> >>     which the clients can talk to the Resource
>>>>>>>>>>>>>>>>>> Manager. </description>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>>>>>>>>>>>>>> >> >>     <description>the local directories used by the
>>>>>>>>>>>>>>>>>> >> >> nodemanager</description>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>>>>>>>>>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>>>>>>>>>>>>>> port</description>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>10240</value>
>>>>>>>>>>>>>>>>>> >> >>     <description>the amount of memory on the
>>>>>>>>>>>>>>>>>> NodeManager in
>>>>>>>>>>>>>>>>>> >> >> GB</description>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>>>>>>>>>>>>> >> >>     <description>directory on hdfs where the
>>>>>>>>>>>>>>>>>> application logs are moved
>>>>>>>>>>>>>>>>>> >> >> to
>>>>>>>>>>>>>>>>>> >> >> </description>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>    <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>>>>>>>>>>>>> >> >>     <description>the directories used by
>>>>>>>>>>>>>>>>>> Nodemanagers as log
>>>>>>>>>>>>>>>>>> >> >> directories</description>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>>>>>>>>>>>>> >> >>     <description>shuffle service that needs to be
>>>>>>>>>>>>>>>>>> set for Map Reduce to
>>>>>>>>>>>>>>>>>> >> >> run </description>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>24</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>3</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>22000</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>2.1</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>>> >> >>> Not tuning configurations at all is wrong. YARN
>>>>>>>>>>>>>>>>>> uses memory resource
>>>>>>>>>>>>>>>>>> >> >>> based scheduling and hence MR2 would be requesting
>>>>>>>>>>>>>>>>>> 1 GB minimum by
>>>>>>>>>>>>>>>>>> >> >>> default, causing, on base configs, to max out at 8
>>>>>>>>>>>>>>>>>> (due to 8 GB NM
>>>>>>>>>>>>>>>>>> >> >>> memory resource config) total containers. Do share
>>>>>>>>>>>>>>>>>> your configs as at
>>>>>>>>>>>>>>>>>> >> >>> this point none of us can tell what it is.
>>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower
>>>>>>>>>>>>>>>>>> for users and to not
>>>>>>>>>>>>>>>>>> >> >>> care about such things :)
>>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>>>>>>>>>>>>> samliuhadoop@gmail.com>
>>>>>>>>>>>>>>>>>> >> >>> wrote:
>>>>>>>>>>>>>>>>>> >> >>> > At the beginning, I just wanted to do a fast
>>>>>>>>>>>>>>>>>> comparison of MRv1 and
>>>>>>>>>>>>>>>>>> >> >>> > Yarn.
>>>>>>>>>>>>>>>>>> >> >>> > But
>>>>>>>>>>>>>>>>>> >> >>> > they have many differences, and to be fair for
>>>>>>>>>>>>>>>>>> comparison I did not
>>>>>>>>>>>>>>>>>> >> >>> > tune
>>>>>>>>>>>>>>>>>> >> >>> > their configurations at all.  So I got above
>>>>>>>>>>>>>>>>>> test results. After
>>>>>>>>>>>>>>>>>> >> >>> > analyzing
>>>>>>>>>>>>>>>>>> >> >>> > the test result, no doubt, I will configure them
>>>>>>>>>>>>>>>>>> and do comparison
>>>>>>>>>>>>>>>>>> >> >>> > again.
>>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>>> >> >>> > Do you have any idea on current test result? I
>>>>>>>>>>>>>>>>>> think, to compare
>>>>>>>>>>>>>>>>>> >> >>> > with
>>>>>>>>>>>>>>>>>> >> >>> > MRv1,
>>>>>>>>>>>>>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but
>>>>>>>>>>>>>>>>>> worse on Reduce
>>>>>>>>>>>>>>>>>> >> >>> > phase(terasort test).
>>>>>>>>>>>>>>>>>> >> >>> > And any detailed suggestions/comments/materials
>>>>>>>>>>>>>>>>>> on Yarn performance
>>>>>>>>>>>>>>>>>> >> >>> > tuning?
>>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>>> >> >>> > Thanks!
>>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>>>>>>>>>>>>> marcosluis2186@gmail.com>
>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>> >> >>> >> Why not tune the configurations?
>>>>>>>>>>>>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>>>>>>>>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size,
>>>>>>>>>>>>>>>>>> etc
>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>> >> >>> >>> Hi Experts,
>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>> >> >>> >>> We are thinking about whether to use Yarn or
>>>>>>>>>>>>>>>>>> not in the near
>>>>>>>>>>>>>>>>>> >> >>> >>> future,
>>>>>>>>>>>>>>>>>> >> >>> >>> and
>>>>>>>>>>>>>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for
>>>>>>>>>>>>>>>>>> comparison.
>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>> >> >>> >>> My env is three nodes cluster, and each node
>>>>>>>>>>>>>>>>>> has similar hardware:
>>>>>>>>>>>>>>>>>> >> >>> >>> 2
>>>>>>>>>>>>>>>>>> >> >>> >>> cpu(4 core), 32 mem. Both Yarn and MRv1
>>>>>>>>>>>>>>>>>> cluster are set on the
>>>>>>>>>>>>>>>>>> >> >>> >>> same
>>>>>>>>>>>>>>>>>> >> >>> >>> env. To
>>>>>>>>>>>>>>>>>> >> >>> >>> be fair, I did not make any performance tuning
>>>>>>>>>>>>>>>>>> on their
>>>>>>>>>>>>>>>>>> >> >>> >>> configurations, but
>>>>>>>>>>>>>>>>>> >> >>> >>> use the default configuration values.
>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>> >> >>> >>> Before testing, I think Yarn will be much
>>>>>>>>>>>>>>>>>> better than MRv1, if
>>>>>>>>>>>>>>>>>> >> >>> >>> they
>>>>>>>>>>>>>>>>>> >> >>> >>> all
>>>>>>>>>>>>>>>>>> >> >>> >>> use default configuration, because Yarn is a
>>>>>>>>>>>>>>>>>> better framework than
>>>>>>>>>>>>>>>>>> >> >>> >>> MRv1.
>>>>>>>>>>>>>>>>>> >> >>> >>> However, the test result shows some
>>>>>>>>>>>>>>>>>> differences:
>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>>>>>>>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>>>>>>>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>>>>>>>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>>>>>>>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>>>>>>>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>>>>>>>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>>>>>>>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>> >> >>> >>> After a fast analysis, I think the direct
>>>>>>>>>>>>>>>>>> cause might be that Yarn
>>>>>>>>>>>>>>>>>> >> >>> >>> is
>>>>>>>>>>>>>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much
>>>>>>>>>>>>>>>>>> worse on Reduce
>>>>>>>>>>>>>>>>>> >> >>> >>> phase.
>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>>>>>>>>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1
>>>>>>>>>>>>>>>>>> for terasort?
>>>>>>>>>>>>>>>>>> >> >>> >>> - What's the strategy for tuning Yarn
>>>>>>>>>>>>>>>>>> performance? Are there any
>>>>>>>>>>>>>>>>>> >> >>> >>> materials?
>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>> >> >>> >>> Thanks!
>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>> >> >>> >> --
>>>>>>>>>>>>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>>>>>>>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>>>>>>>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>>> >> >>> --
>>>>>>>>>>>>>>>>>> >> >>> Harsh J
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>> >> --
>>>>>>>>>>>>>>>>>> >> Harsh J
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>> Harsh J
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Jian Fang <ji...@gmail.com>.
Really? Does that mean I cannot compare terasort in MR2 with MR1 directly?
If yes, this really should be documented.

Thanks.


On Wed, Oct 23, 2013 at 4:54 PM, Sandy Ryza <sa...@cloudera.com> wrote:

> I should have brought this up earlier - the Terasort benchmark
> requirements changed recently to make data less compressible.  The MR2
> version of Terasort has this change, but the MR1 version does not.    So
> Snappy should be working fine, but the data that MR2 is using is less
> compressible.
>
>
> On Wed, Oct 23, 2013 at 4:48 PM, Jian Fang <ji...@gmail.com>wrote:
>
>> Ok, I think I know where the problem is now. I used snappy to compress
>> map output.
>>
>> Calculate the compression ratio = "Map output materialized bytes"/"Map
>> output bytes"
>>
>> MR2
>> 440519309909/1020000000000 = 0.431881676
>>
>> MR1
>> 240948272514/1000000000000 = 0.240948273
>>
>>
>> It seems something is wrong with Snappy in my Hadoop 2 setup and, as a result,
>> MR2 needs to fetch much more data during the shuffle phase. Is there any way to
>> check Snappy in MR2?
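
One quick way to sanity-check native Snappy on the Hadoop 2 side, assuming the
checknative subcommand is available in this build (it is in recent 2.x
releases), is to run it on a slave node and look for a "snappy: true" line
pointing at a real libsnappy; the exact output format and library path depend
on the install:

    # run on a core/slave node and check the snappy line in the output
    hadoop checknative -a
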
>>
>>
>>
>>
>> On Wed, Oct 23, 2013 at 4:26 PM, Jian Fang <jian.fang.subscribe@gmail.com
>> > wrote:
>>
>>> I looked at the results for MR1 and MR2. Reduce input groups for MR1 is
>>> 4,294,967,296, but 10,000,000,000 for MR2.
>>> That is to say, the number of unique keys fed into the reducers is
>>> 10,000,000,000 in MR2, which is really weird. Is there any problem in the terasort code?
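
One detail worth noting about the MR1 figure, offered as a hedged observation
rather than a confirmed diagnosis: it is exactly a power of two,

    4,294,967,296 = 2^32

so the MR1 "Reduce input groups" value looks more like a 32-bit counter
artifact than a real count of unique keys, which would make the MR2 figure of
10,000,000,000 the plausible one for a 10,000,000,000-row terasort input.
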
>>>
>>>
>>> On Wed, Oct 23, 2013 at 2:16 PM, Jian Fang <
>>> jian.fang.subscribe@gmail.com> wrote:
>>>
>>>> Reducing the number of map containers to 8 slightly improved the total
>>>> time to 84 minutes. Here is the output. Also, from the log, there is no clear
>>>> reason why the containers were killed other than messages such as
>>>> "Container killed by the
>>>> ApplicationMaster../container_1382237301855_0001_01_000001/syslog:Container
>>>> killed on request. Exit code is 143"
>>>>
>>>>
>>>> 2013-10-23 21:09:34,325 INFO org.apache.hadoop.mapreduce.Job (main):
>>>> Counters: 46
>>>>         File System Counters
>>>>                 FILE: Number of bytes read=455484809066
>>>>                 FILE: Number of bytes written=896642344343
>>>>
>>>>                 FILE: Number of read operations=0
>>>>                 FILE: Number of large read operations=0
>>>>                 FILE: Number of write operations=0
>>>>                 HDFS: Number of bytes read=1000000841624
>>>>
>>>>                 HDFS: Number of bytes written=1000000000000
>>>>                 HDFS: Number of read operations=25531
>>>>
>>>>                 HDFS: Number of large read operations=0
>>>>                 HDFS: Number of write operations=150
>>>>
>>>>         Job Counters
>>>>                 Killed map tasks=1
>>>>                 Killed reduce tasks=11
>>>>                 Launched map tasks=7449
>>>>                 Launched reduce tasks=86
>>>>                 Data-local map tasks=7434
>>>>                 Rack-local map tasks=15
>>>>                 Total time spent by all maps in occupied slots
>>>> (ms)=1030941232
>>>>                 Total time spent by all reduces in occupied slots
>>>> (ms)=1574732272
>>>>
>>>>         Map-Reduce Framework
>>>>                 Map input records=10000000000
>>>>                 Map output records=10000000000
>>>>                 Map output bytes=1020000000000
>>>>                 Map output materialized bytes=440519309909
>>>>                 Input split bytes=841624
>>>>
>>>>                 Combine input records=0
>>>>                 Combine output records=0
>>>>                 Reduce input groups=10000000000
>>>>                 Reduce shuffle bytes=440519309909
>>>>
>>>>                 Reduce input records=10000000000
>>>>                 Reduce output records=10000000000
>>>>                 Spilled Records=20000000000
>>>>                 Shuffled Maps =558600
>>>>                 Failed Shuffles=0
>>>>                 Merged Map outputs=558600
>>>>                 GC time elapsed (ms)=2855534
>>>>                 CPU time spent (ms)=170381420
>>>>                 Physical memory (bytes) snapshot=3704938954752
>>>>                 Virtual memory (bytes) snapshot=11455208308736
>>>>                 Total committed heap usage (bytes)=4523248582656
>>>>         Shuffle Errors
>>>>                 BAD_ID=0
>>>>                 CONNECTION=0
>>>>                 IO_ERROR=0
>>>>
>>>>                 WRONG_LENGTH=0
>>>>                 WRONG_MAP=0
>>>>                 WRONG_REDUCE=0
>>>>         File Input Format Counters
>>>>                 Bytes Read=1000000000000
>>>>         File Output Format Counters
>>>>                 Bytes Written=1000000000000
>>>>
>>>>
>>>>
>>>> On Wed, Oct 23, 2013 at 1:46 PM, Jian Fang <
>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>
>>>>> I already started the tests with only 8 map containers in MR2; they are
>>>>> still running.
>>>>>
>>>>> But shouldn't YARN make better use of the memory? If I map YARN
>>>>> containers in MR2 to exactly the 8 map and 3 reduce slots in MR1, what is the
>>>>> real advantage of YARN then? I remember one goal of YARN is to solve the
>>>>> issue that map slots cannot be used for reduce and reduce slots cannot be
>>>>> used for map. Shouldn't YARN be smart enough to handle the concurrent tasks?
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Oct 23, 2013 at 1:17 PM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>>>
>>>>>> Increasing the slowstart is not meant to increase performance, but
>>>>>> should make for a fairer comparison.  Have you tried making sure that in
>>>>>> MR2 only 8 map tasks are running concurrently, or boosting MR1 up to 16?
>>>>>>
>>>>>> -Sandy
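
A minimal sketch of one way to cap MR2 at roughly 8 concurrent map tasks per
node, assuming memory is the limiting resource and using the
yarn.nodemanager.resource.memory-mb = 12288 figure quoted in this thread (the
heap in mapreduce.map.java.opts would likely need to grow along with the
container size):

    <!-- MR2 mapred-site.xml: 12288 / 1536 = 8 map containers per node -->
    <property>
      <name>mapreduce.map.memory.mb</name>
      <value>1536</value>
    </property>
    <property>
      <name>mapreduce.map.java.opts</name>
      <value>-Xmx1024m</value>
    </property>
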
>>>>>>
>>>>>>
>>>>>> On Wed, Oct 23, 2013 at 12:55 PM, Jian Fang <
>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>
>>>>>>> Changing mapreduce.job.reduce.slowstart.completedmaps to 0.99 does not
>>>>>>> look good. The map phase alone took 48 minutes and the total time seems to
>>>>>>> be even longer. Is there any way to make the map phase run faster?
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Oct 23, 2013 at 10:05 AM, Jian Fang <
>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>
>>>>>>>> Thanks Sandy.
>>>>>>>>
>>>>>>>> io.sort.record.percent is the default value 0.05 for both MR1 and
>>>>>>>> MR2.  mapreduce.job.reduce.slowstart.completedmaps in MR2 and mapred.reduce.slowstart.completed.maps
>>>>>>>> in MR1 both use the default value 0.05.
>>>>>>>>
>>>>>>>> I tried to allocate 1536MB and 1024MB to map container some time
>>>>>>>> ago, but the changes did not give me a better result, thus, I changed it
>>>>>>>> back to 768MB.
>>>>>>>>
>>>>>>>> Will try mapred.reduce.slowstart.completed.maps=.99 to see what
>>>>>>>> happens. BTW, I should use
>>>>>>>> mapreduce.job.reduce.slowstart.completedmaps in MR2, right?
>>>>>>>>
>>>>>>>> Also, in MR1 I can specify tasktracker.http.threads, but I could
>>>>>>>> not find the counterpart in MR2. Which property should I tune for the HTTP
>>>>>>>> threads?
>>>>>>>>
>>>>>>>> Thanks again.
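
On the HTTP-thread question: in MR2 the server side of the shuffle is the
ShuffleHandler auxiliary service running inside each NodeManager, and the
closest counterpart to tasktracker.http.threads appears to be
mapreduce.shuffle.max.threads (0 lets it size the pool from the number of
cores). The property name and default are worth verifying against the 2.2.0
documentation rather than taken as given. A sketch for the NodeManager-side
mapred-site.xml:

    <property>
      <name>mapreduce.shuffle.max.threads</name>
      <value>0</value>
    </property>
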
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Oct 23, 2013 at 9:40 AM, Sandy Ryza <
>>>>>>>> sandy.ryza@cloudera.com> wrote:
>>>>>>>>
>>>>>>>>> Based on SLOTS_MILLIS_MAPS, it looks like your map tasks are
>>>>>>>>> taking about three times as long in MR2 as they are in MR1.  This is
>>>>>>>>> probably because you allow twice as many map tasks to run at a time in MR2 (12288/768
>>>>>>>>> = 16).  Being able to use all the containers isn't necessarily a good thing
>>>>>>>>> if you are oversubscribing your node's resources.  Because of the different
>>>>>>>>> way that MR1 and MR2 view resources, I think it's better to test with
>>>>>>>>> mapred.reduce.slowstart.completed.maps=.99 so that the map and reduce
>>>>>>>>> phases will run separately.
>>>>>>>>>
>>>>>>>>> On the other side, it looks like your MR1 has more spilled records
>>>>>>>>> than MR2.  For a fairer comparison, you should set io.sort.record.percent
>>>>>>>>> in MR1 to .13, which should improve MR1 performance, but will provide a
>>>>>>>>> fairer comparison (MR2 automatically does this tuning for you).
>>>>>>>>>
>>>>>>>>> -Sandy
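
Put as configuration, the two changes suggested above would look roughly like
this (property names as used elsewhere in this thread; the io.sort.record.percent
change applies to MR1 only, since MR2 tunes that automatically):

    <!-- MR1 mapred-site.xml -->
    <property>
      <name>mapred.reduce.slowstart.completed.maps</name>
      <value>0.99</value>
    </property>
    <property>
      <name>io.sort.record.percent</name>
      <value>0.13</value>
    </property>

    <!-- MR2 mapred-site.xml equivalent of the slowstart setting -->
    <property>
      <name>mapreduce.job.reduce.slowstart.completedmaps</name>
      <value>0.99</value>
    </property>
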
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Oct 23, 2013 at 9:22 AM, Jian Fang <
>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> The number of map slots and reduce slots on each data node for
>>>>>>>>>> MR1 are 8 and 3, respectively. Since MR2 could use all containers for
>>>>>>>>>> either map or reduce, I would expect that MR2 is faster.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Oct 23, 2013 at 8:17 AM, Sandy Ryza <
>>>>>>>>>> sandy.ryza@cloudera.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> How many map and reduce slots are you using per tasktracker in
>>>>>>>>>>> MR1?  How do the average map times compare? (MR2 reports this directly on
>>>>>>>>>>> the web UI, but you can also get a sense in MR1 by scrolling through the
>>>>>>>>>>> map tasks page).  Can you share the counters for MR1?
>>>>>>>>>>>
>>>>>>>>>>> -Sandy
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Oct 23, 2013 at 12:23 AM, Jian Fang <
>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Unfortunately, turning off JVM reuse still gave the same result,
>>>>>>>>>>>> i.e., about 90 minutes for MR2. I don't think the killed reduces could
>>>>>>>>>>>> account for a 2x slowdown. There must be something very wrong either
>>>>>>>>>>>> in the configuration or the code. Any hints?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <
>>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks Sandy. I will try to turn JVM reuse off and see what
>>>>>>>>>>>>> happens.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yes, I saw quite some exceptions in the task attempts. For
>>>>>>>>>>>>> instance.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2013-10-20 03:13:58,751 ERROR [main]
>>>>>>>>>>>>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>>>>>>>>>>>>> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
>>>>>>>>>>>>> 2013-10-20 03:13:58,752 ERROR [Thread-6]
>>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient: Failed to close file
>>>>>>>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
>>>>>>>>>>>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>>>>>>>>>>>>> No lease on
>>>>>>>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
>>>>>>>>>>>>> File does not exist. Holder
>>>>>>>>>>>>> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
>>>>>>>>>>>>> any open files.
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>>>>>>>>>>>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>>>>>>>>>>> --
>>>>>>>>>>>>>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>>>>>>>>>>>>> 2013-10-20 03:13:58,753 WARN [main]
>>>>>>>>>>>>> org.apache.hadoop.mapred.YarnChild: Exception running child :
>>>>>>>>>>>>> java.nio.channels.ClosedChannelException
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <
>>>>>>>>>>>>> sandy.ryza@cloudera.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> It looks like many of your reduce tasks were killed.  Do you
>>>>>>>>>>>>>> know why?  Also, MR2 doesn't have JVM reuse, so it might make sense to
>>>>>>>>>>>>>> compare it to MR1 with JVM reuse turned off.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -Sandy
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <
>>>>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The Terasort output for MR2 is as follows.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job
>>>>>>>>>>>>>>> (main): Counters: 46
>>>>>>>>>>>>>>>         File System Counters
>>>>>>>>>>>>>>>                 FILE: Number of bytes read=456102049355
>>>>>>>>>>>>>>>                 FILE: Number of bytes written=897246250517
>>>>>>>>>>>>>>>                 FILE: Number of read operations=0
>>>>>>>>>>>>>>>                 FILE: Number of large read operations=0
>>>>>>>>>>>>>>>                 FILE: Number of write operations=0
>>>>>>>>>>>>>>>                 HDFS: Number of bytes read=1000000851200
>>>>>>>>>>>>>>>                 HDFS: Number of bytes written=1000000000000
>>>>>>>>>>>>>>>                 HDFS: Number of read operations=32131
>>>>>>>>>>>>>>>                 HDFS: Number of large read operations=0
>>>>>>>>>>>>>>>                 HDFS: Number of write operations=224
>>>>>>>>>>>>>>>         Job Counters
>>>>>>>>>>>>>>>                 Killed map tasks=1
>>>>>>>>>>>>>>>                 Killed reduce tasks=20
>>>>>>>>>>>>>>>                 Launched map tasks=7601
>>>>>>>>>>>>>>>                 Launched reduce tasks=132
>>>>>>>>>>>>>>>                 Data-local map tasks=7591
>>>>>>>>>>>>>>>                 Rack-local map tasks=10
>>>>>>>>>>>>>>>                 Total time spent by all maps in occupied
>>>>>>>>>>>>>>> slots (ms)=1696141311
>>>>>>>>>>>>>>>                 Total time spent by all reduces in occupied
>>>>>>>>>>>>>>> slots (ms)=2664045096
>>>>>>>>>>>>>>>         Map-Reduce Framework
>>>>>>>>>>>>>>>                 Map input records=10000000000
>>>>>>>>>>>>>>>                 Map output records=10000000000
>>>>>>>>>>>>>>>                 Map output bytes=1020000000000
>>>>>>>>>>>>>>>                 Map output materialized bytes=440486356802
>>>>>>>>>>>>>>>                 Input split bytes=851200
>>>>>>>>>>>>>>>                 Combine input records=0
>>>>>>>>>>>>>>>                 Combine output records=0
>>>>>>>>>>>>>>>                 Reduce input groups=10000000000
>>>>>>>>>>>>>>>                 Reduce shuffle bytes=440486356802
>>>>>>>>>>>>>>>                 Reduce input records=10000000000
>>>>>>>>>>>>>>>                 Reduce output records=10000000000
>>>>>>>>>>>>>>>                 Spilled Records=20000000000
>>>>>>>>>>>>>>>                 Shuffled Maps =851200
>>>>>>>>>>>>>>>                 Failed Shuffles=61
>>>>>>>>>>>>>>>                 Merged Map outputs=851200
>>>>>>>>>>>>>>>                 GC time elapsed (ms)=4215666
>>>>>>>>>>>>>>>                 CPU time spent (ms)=192433000
>>>>>>>>>>>>>>>                 Physical memory (bytes)
>>>>>>>>>>>>>>> snapshot=3349356380160
>>>>>>>>>>>>>>>                 Virtual memory (bytes) snapshot=9665208745984
>>>>>>>>>>>>>>>                 Total committed heap usage
>>>>>>>>>>>>>>> (bytes)=3636854259712
>>>>>>>>>>>>>>>         Shuffle Errors
>>>>>>>>>>>>>>>                 BAD_ID=0
>>>>>>>>>>>>>>>                 CONNECTION=0
>>>>>>>>>>>>>>>                 IO_ERROR=4
>>>>>>>>>>>>>>>                 WRONG_LENGTH=0
>>>>>>>>>>>>>>>                 WRONG_MAP=0
>>>>>>>>>>>>>>>                 WRONG_REDUCE=0
>>>>>>>>>>>>>>>         File Input Format Counters
>>>>>>>>>>>>>>>                 Bytes Read=1000000000000
>>>>>>>>>>>>>>>         File Output Format Counters
>>>>>>>>>>>>>>>                 Bytes Written=1000000000000
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> John
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>>>>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I have the same problem. I compared Hadoop 2.2.0 with
>>>>>>>>>>>>>>>> Hadoop 1.0.3 and it turned out that the terasort for MR2 is 2 times slower
>>>>>>>>>>>>>>>> than that in MR1. I cannot really believe it.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop
>>>>>>>>>>>>>>>> 2.2.0 cluster configurations are as follows.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>>>>>>>>>>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>>>>>>>>>>>>>         mapreduce.map.memory.mb = "768";
>>>>>>>>>>>>>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>>>>>>>>>>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>>>>>>>>>>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>>>>>>>>>>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>>>>>>>>>>>>>         mapreduce.task.io.sort.factor = "48";
>>>>>>>>>>>>>>>>         mapreduce.task.io.sort.mb = "200";
>>>>>>>>>>>>>>>>         mapreduce.map.speculative = "true";
>>>>>>>>>>>>>>>>         mapreduce.reduce.speculative = "true";
>>>>>>>>>>>>>>>>         mapreduce.framework.name = "yarn";
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> yarn.app.mapreduce.am.job.task.listener.thread-count = "60";
>>>>>>>>>>>>>>>>         mapreduce.map.cpu.vcores = "1";
>>>>>>>>>>>>>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>>>>>>>>>>>>>         mapreduce.map.output.compress = "true";
>>>>>>>>>>>>>>>>         mapreduce.map.output.compress.codec =
>>>>>>>>>>>>>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>>>>>>>>>>>>>         yarn.resourcemanager.scheduler.client.thread-count
>>>>>>>>>>>>>>>> = "64";
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> yarn.resourcemanager.resource-tracker.client.thread-count = "64";
>>>>>>>>>>>>>>>>         yarn.resourcemanager.scheduler.class =
>>>>>>>>>>>>>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>>>>>>>>>>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>>>>>>>>>>>>>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>>>>>>>>>>>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>>>>>>>>>>>>>         yarn.nodemanager.container-executor.class =
>>>>>>>>>>>>>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>>>>>>>>>>>>>         yarn.nodemanager.container-manager.thread-count =
>>>>>>>>>>>>>>>> "64";
>>>>>>>>>>>>>>>>         yarn.nodemanager.localizer.client.thread-count =
>>>>>>>>>>>>>>>> "20";
>>>>>>>>>>>>>>>>         yarn.nodemanager.localizer.fetch.thread-count =
>>>>>>>>>>>>>>>> "20";
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost
>>>>>>>>>>>>>>>> the same other configurations. In MR1, the 1TB terasort took about 45
>>>>>>>>>>>>>>>> minutes, but it took around 90 minutes in MR2.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Does anyone know what is wrong here? Or do I need some
>>>>>>>>>>>>>>>> special configurations for terasort to work better in MR2?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> John
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>>>>>>>>>>>>>> michael_segel@hotmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Sam,
>>>>>>>>>>>>>>>>> I think your cluster is too small for any meaningful
>>>>>>>>>>>>>>>>> conclusions to be made.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Mike Segel
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <
>>>>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Harsh,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks for your detailed response! Now, the efficiency of
>>>>>>>>>>>>>>>>> my Yarn cluster improved a lot after increasing the reducer
>>>>>>>>>>>>>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>>>>>>>>>>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and
>>>>>>>>>>>>>>>>> reduce task together, with a typical process(map > shuffle > reduce). In
>>>>>>>>>>>>>>>>> Yarn, as I know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>>>>>>>>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...),
>>>>>>>>>>>>>>>>> but, MRv1 job has special execution process(map > shuffle > reduce) in
>>>>>>>>>>>>>>>>> Hadoop 1.x, and how Yarn execute a MRv1 job? still include some special MR
>>>>>>>>>>>>>>>>> steps in Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>>>>>>>>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>>>>>>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>>>>>>>>>>>> - What's the general process for ApplicationMaster of Yarn
>>>>>>>>>>>>>>>>> to execute a job?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by
>>>>>>>>>>>>>>>>> setting 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>>>>>>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>>>>>>>>>>>> - For Yarn, above tow parameter do not work any more, as
>>>>>>>>>>>>>>>>> yarn uses container instead, right?
>>>>>>>>>>>>>>>>> - For Yarn, we can set the whole physical mem for a
>>>>>>>>>>>>>>>>> NodeManager using 'yarn.nodemanager.resource.memory-mb'.
>>>>>>>>>>>>>>>>> But how to set the default size of physical mem of a container?
>>>>>>>>>>>>>>>>> - How to set the maximum size of physical mem of a
>>>>>>>>>>>>>>>>> container? By the parameter of 'mapred.child.java.opts'?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks as always!
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi Sam,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> > - How to know the container number? Why you say it will
>>>>>>>>>>>>>>>>>> be 22 containers due to a 22 GB memory?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The MR2's default configuration requests 1 GB resource
>>>>>>>>>>>>>>>>>> each for Map
>>>>>>>>>>>>>>>>>> and Reduce containers. It requests 1.5 GB for the AM
>>>>>>>>>>>>>>>>>> container that
>>>>>>>>>>>>>>>>>> runs the job, additionally. This is tunable using the
>>>>>>>>>>>>>>>>>> properties
>>>>>>>>>>>>>>>>>> Sandy's mentioned in his post.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> > - My machine has 32 GB memory, how many memory is
>>>>>>>>>>>>>>>>>> proper to be assigned to containers?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> This is a general question. You may use the same process
>>>>>>>>>>>>>>>>>> you took to
>>>>>>>>>>>>>>>>>> decide optimal number of slots in MR1 to decide this
>>>>>>>>>>>>>>>>>> here. Every
>>>>>>>>>>>>>>>>>> container is a new JVM, and you're limited by the CPUs
>>>>>>>>>>>>>>>>>> you have there
>>>>>>>>>>>>>>>>>> (if not the memory). Either increase memory requests from
>>>>>>>>>>>>>>>>>> jobs, to
>>>>>>>>>>>>>>>>>> lower # of concurrent containers at a given time (runtime
>>>>>>>>>>>>>>>>>> change), or
>>>>>>>>>>>>>>>>>> lower NM's published memory resources to control the same
>>>>>>>>>>>>>>>>>> (config
>>>>>>>>>>>>>>>>>> change).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> > - In mapred-site.xml, if I set '
>>>>>>>>>>>>>>>>>> mapreduce.framework.name' to be 'yarn', will other
>>>>>>>>>>>>>>>>>> parameters for mapred-site.xml still work in yarn framework? Like
>>>>>>>>>>>>>>>>>> 'mapreduce.task.io.sort.mb' and 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Yes, all of these properties will still work. Old
>>>>>>>>>>>>>>>>>> properties specific
>>>>>>>>>>>>>>>>>> to JobTracker or TaskTracker (usually found as a keyword
>>>>>>>>>>>>>>>>>> in the config
>>>>>>>>>>>>>>>>>> name) will not apply anymore.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <
>>>>>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> > Hi Harsh,
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > According to above suggestions, I removed the
>>>>>>>>>>>>>>>>>> duplication of setting, and
>>>>>>>>>>>>>>>>>> > reduce the value of
>>>>>>>>>>>>>>>>>> 'yarn.nodemanager.resource.cpu-cores',
>>>>>>>>>>>>>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>>>>>>>>>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and
>>>>>>>>>>>>>>>>>> 12000. Ant then, the
>>>>>>>>>>>>>>>>>> > efficiency improved about 18%.  I have questions:
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > - How to know the container number? Why you say it will
>>>>>>>>>>>>>>>>>> be 22 containers due
>>>>>>>>>>>>>>>>>> > to a 22 GB memory?
>>>>>>>>>>>>>>>>>> > - My machine has 32 GB memory, how many memory is
>>>>>>>>>>>>>>>>>> proper to be assigned to
>>>>>>>>>>>>>>>>>> > containers?
>>>>>>>>>>>>>>>>>> > - In mapred-site.xml, if I set '
>>>>>>>>>>>>>>>>>> mapreduce.framework.name' to be 'yarn', will
>>>>>>>>>>>>>>>>>> > other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>>>>>>>>> framework? Like
>>>>>>>>>>>>>>>>>> > 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > Thanks!
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>> >> Hey Sam,
>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>> >> Did you get a chance to retry with Sandy's
>>>>>>>>>>>>>>>>>> suggestions? The config
>>>>>>>>>>>>>>>>>> >> appears to be asking NMs to use roughly 22 total
>>>>>>>>>>>>>>>>>> containers (as
>>>>>>>>>>>>>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22
>>>>>>>>>>>>>>>>>> GB memory
>>>>>>>>>>>>>>>>>> >> resource. This could impact much, given the CPU is
>>>>>>>>>>>>>>>>>> still the same for
>>>>>>>>>>>>>>>>>> >> both test runs.
>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>>>>>>>>>>>>>> sandy.ryza@cloudera.com>
>>>>>>>>>>>>>>>>>> >> wrote:
>>>>>>>>>>>>>>>>>> >> > Hey Sam,
>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>> >> > Thanks for sharing your results.  I'm definitely
>>>>>>>>>>>>>>>>>> curious about what's
>>>>>>>>>>>>>>>>>> >> > causing the difference.
>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>> >> > A couple observations:
>>>>>>>>>>>>>>>>>> >> > It looks like you've got
>>>>>>>>>>>>>>>>>> yarn.nodemanager.resource.memory-mb in there
>>>>>>>>>>>>>>>>>> >> > twice
>>>>>>>>>>>>>>>>>> >> > with two different values.
>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?)
>>>>>>>>>>>>>>>>>> close to the default
>>>>>>>>>>>>>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any
>>>>>>>>>>>>>>>>>> of your tasks getting
>>>>>>>>>>>>>>>>>> >> > killed for running over resource limits?
>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>> >> > -Sandy
>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <
>>>>>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >> The terasort execution log shows that reduce spent
>>>>>>>>>>>>>>>>>> about 5.5 mins from
>>>>>>>>>>>>>>>>>> >> >> 33%
>>>>>>>>>>>>>>>>>> >> >> to 35% as below.
>>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>>> reduce 31%
>>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>>> reduce 32%
>>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>>> reduce 33%
>>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>>> reduce 35%
>>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>>> reduce 40%
>>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>>> reduce 43%
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >> Any way, below are my configurations for your
>>>>>>>>>>>>>>>>>> reference. Thanks!
>>>>>>>>>>>>>>>>>> >> >> (A) core-site.xml
>>>>>>>>>>>>>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >> (B) hdfs-site.xml
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>dfs.replication</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>1</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>10</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >> (C) mapred-site.xml
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>>>>>>>>>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>>>>>>>>>>>>>> >> >> </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>yarn</value>
>>>>>>>>>>>>>>>>>> >> >>    </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>8</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>4</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>true</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >> (D) yarn-site.xml
>>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>node1:18025</value>
>>>>>>>>>>>>>>>>>> >> >>     <description>host is the hostname of the
>>>>>>>>>>>>>>>>>> resource manager and
>>>>>>>>>>>>>>>>>> >> >>     port is the port on which the NodeManagers
>>>>>>>>>>>>>>>>>> contact the Resource
>>>>>>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <description>The address of the RM web
>>>>>>>>>>>>>>>>>> application.</description>
>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>node1:18088</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <name>yarn.resourcemanager.scheduler.address</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>node1:18030</value>
>>>>>>>>>>>>>>>>>> >> >>     <description>host is the hostname of the
>>>>>>>>>>>>>>>>>> resourcemanager and port
>>>>>>>>>>>>>>>>>> >> >> is
>>>>>>>>>>>>>>>>>> >> >> the port
>>>>>>>>>>>>>>>>>> >> >>     on which the Applications in the cluster talk
>>>>>>>>>>>>>>>>>> to the Resource
>>>>>>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>node1:18040</value>
>>>>>>>>>>>>>>>>>> >> >>     <description>the host is the hostname of the
>>>>>>>>>>>>>>>>>> ResourceManager and
>>>>>>>>>>>>>>>>>> >> >> the
>>>>>>>>>>>>>>>>>> >> >> port is the port on
>>>>>>>>>>>>>>>>>> >> >>     which the clients can talk to the Resource
>>>>>>>>>>>>>>>>>> Manager. </description>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>>>>>>>>>>>>>> >> >>     <description>the local directories used by the
>>>>>>>>>>>>>>>>>> >> >> nodemanager</description>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>>>>>>>>>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>>>>>>>>>>>>>> port</description>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>10240</value>
>>>>>>>>>>>>>>>>>> >> >>     <description>the amount of memory on the
>>>>>>>>>>>>>>>>>> NodeManager in
>>>>>>>>>>>>>>>>>> >> >> GB</description>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>>>>>>>>>>>>> >> >>     <description>directory on hdfs where the
>>>>>>>>>>>>>>>>>> application logs are moved
>>>>>>>>>>>>>>>>>> >> >> to
>>>>>>>>>>>>>>>>>> >> >> </description>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>    <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>>>>>>>>>>>>> >> >>     <description>the directories used by
>>>>>>>>>>>>>>>>>> Nodemanagers as log
>>>>>>>>>>>>>>>>>> >> >> directories</description>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>>>>>>>>>>>>> >> >>     <description>shuffle service that needs to be
>>>>>>>>>>>>>>>>>> set for Map Reduce to
>>>>>>>>>>>>>>>>>> >> >> run </description>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>24</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>3</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>22000</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>2.1</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>>> >> >>> Not tuning configurations at all is wrong. YARN
>>>>>>>>>>>>>>>>>> uses memory resource
>>>>>>>>>>>>>>>>>> >> >>> based scheduling and hence MR2 would be requesting
>>>>>>>>>>>>>>>>>> 1 GB minimum by
>>>>>>>>>>>>>>>>>> >> >>> default, causing, on base configs, to max out at 8
>>>>>>>>>>>>>>>>>> (due to 8 GB NM
>>>>>>>>>>>>>>>>>> >> >>> memory resource config) total containers. Do share
>>>>>>>>>>>>>>>>>> your configs as at
>>>>>>>>>>>>>>>>>> >> >>> this point none of us can tell what it is.
>>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower
>>>>>>>>>>>>>>>>>> for users and to not
>>>>>>>>>>>>>>>>>> >> >>> care about such things :)
>>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>>>>>>>>>>>>> samliuhadoop@gmail.com>
>>>>>>>>>>>>>>>>>> >> >>> wrote:
>>>>>>>>>>>>>>>>>> >> >>> > At the begining, I just want to do a fast
>>>>>>>>>>>>>>>>>> comparision of MRv1 and
>>>>>>>>>>>>>>>>>> >> >>> > Yarn.
>>>>>>>>>>>>>>>>>> >> >>> > But
>>>>>>>>>>>>>>>>>> >> >>> > they have many differences, and to be fair for
>>>>>>>>>>>>>>>>>> comparison I did not
>>>>>>>>>>>>>>>>>> >> >>> > tune
>>>>>>>>>>>>>>>>>> >> >>> > their configurations at all.  So I got above
>>>>>>>>>>>>>>>>>> test results. After
>>>>>>>>>>>>>>>>>> >> >>> > analyzing
>>>>>>>>>>>>>>>>>> >> >>> > the test result, no doubt, I will configure them
>>>>>>>>>>>>>>>>>> and do comparison
>>>>>>>>>>>>>>>>>> >> >>> > again.
>>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>>> >> >>> > Do you have any idea on current test result? I
>>>>>>>>>>>>>>>>>> think, to compare
>>>>>>>>>>>>>>>>>> >> >>> > with
>>>>>>>>>>>>>>>>>> >> >>> > MRv1,
>>>>>>>>>>>>>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but
>>>>>>>>>>>>>>>>>> worse on Reduce
>>>>>>>>>>>>>>>>>> >> >>> > phase(terasort test).
>>>>>>>>>>>>>>>>>> >> >>> > And any detailed suggestions/comments/materials
>>>>>>>>>>>>>>>>>> on Yarn performance
>>>>>>>>>>>>>>>>>> >> >>> > tunning?
>>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>>> >> >>> > Thanks!
>>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>>>>>>>>>>>>> marcosluis2186@gmail.com>
>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>> >> >>> >> Why not to tune the configurations?
>>>>>>>>>>>>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>>>>>>>>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size,
>>>>>>>>>>>>>>>>>> etc
>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>> >> >>> >>> Hi Experts,
>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>> >> >>> >>> We are thinking about whether to use Yarn or
>>>>>>>>>>>>>>>>>> not in the near
>>>>>>>>>>>>>>>>>> >> >>> >>> future,
>>>>>>>>>>>>>>>>>> >> >>> >>> and
>>>>>>>>>>>>>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for
>>>>>>>>>>>>>>>>>> comprison.
>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>> >> >>> >>> My env is three nodes cluster, and each node
>>>>>>>>>>>>>>>>>> has similar hardware:
>>>>>>>>>>>>>>>>>> >> >>> >>> 2
>>>>>>>>>>>>>>>>>> >> >>> >>> cpu(4 core), 32 mem. Both Yarn and MRv1
>>>>>>>>>>>>>>>>>> cluster are set on the
>>>>>>>>>>>>>>>>>> >> >>> >>> same
>>>>>>>>>>>>>>>>>> >> >>> >>> env. To
>>>>>>>>>>>>>>>>>> >> >>> >>> be fair, I did not make any performance tuning
>>>>>>>>>>>>>>>>>> on their
>>>>>>>>>>>>>>>>>> >> >>> >>> configurations, but
>>>>>>>>>>>>>>>>>> >> >>> >>> use the default configuration values.
>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>> >> >>> >>> Before testing, I think Yarn will be much
>>>>>>>>>>>>>>>>>> better than MRv1, if
>>>>>>>>>>>>>>>>>> >> >>> >>> they
>>>>>>>>>>>>>>>>>> >> >>> >>> all
>>>>>>>>>>>>>>>>>> >> >>> >>> use default configuration, because Yarn is a
>>>>>>>>>>>>>>>>>> better framework than
>>>>>>>>>>>>>>>>>> >> >>> >>> MRv1.
>>>>>>>>>>>>>>>>>> >> >>> >>> However, the test result shows some
>>>>>>>>>>>>>>>>>> differences:
>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>>>>>>>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>>>>>>>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>>>>>>>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>>>>>>>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>>>>>>>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>>>>>>>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>>>>>>>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>> >> >>> >>> After a fast analysis, I think the direct
>>>>>>>>>>>>>>>>>> cause might be that Yarn
>>>>>>>>>>>>>>>>>> >> >>> >>> is
>>>>>>>>>>>>>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much
>>>>>>>>>>>>>>>>>> worse on Reduce
>>>>>>>>>>>>>>>>>> >> >>> >>> phase.
>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>>>>>>>>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1
>>>>>>>>>>>>>>>>>> for terasort?
>>>>>>>>>>>>>>>>>> >> >>> >>> - What's the stratage for tuning Yarn
>>>>>>>>>>>>>>>>>> performance? Is any
>>>>>>>>>>>>>>>>>> >> >>> >>> materials?
>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>> >> >>> >>> Thanks!
>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>> >> >>> >> --
>>>>>>>>>>>>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>>>>>>>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>>>>>>>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>>> >> >>> --
>>>>>>>>>>>>>>>>>> >> >>> Harsh J
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>> >> --
>>>>>>>>>>>>>>>>>> >> Harsh J
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>> Harsh J
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Jian Fang <ji...@gmail.com>.
Really? Does that mean I cannot compare terasort in MR2 with MR1 directly?
If so, this really should be documented.

Thanks.


On Wed, Oct 23, 2013 at 4:54 PM, Sandy Ryza <sa...@cloudera.com> wrote:

> I should have brought this up earlier - the Terasort benchmark
> requirements changed recently to make data less compressible.  The MR2
> version of Terasort has this change, but the MR1 version does not.    So
> Snappy should be working fine, but the data that MR2 is using is less
> compressible.
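>
> A rough way to confirm this on the data itself (just a sketch -- gzip stands in
> for Snappy here, and the path is only an example) is to compress a sample of
> each teragen output and compare the sizes:
>
>     hadoop fs -cat /1-tb-data/part-m-00000 | head -c 100000000 | gzip -c | wc -c
>
> Run against both the MR1-generated and the MR2-generated input, the MR2 sample
> should come out noticeably larger after compression.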
>
>
> On Wed, Oct 23, 2013 at 4:48 PM, Jian Fang <ji...@gmail.com>wrote:
>
>> Ok, I think I know where the problem is now. I used snappy to compress
>> map output.
>>
>> Calculate the compression ratio = "Map output materialized bytes"/"Map
>> output bytes"
>>
>> MR2
>> 440519309909/1020000000000 = 0.431881676
>>
>> MR1
>> 240948272514/1000000000000 = 0.240948273
>>
>>
>> It seems something is wrong with Snappy in my Hadoop 2 setup, and as a result
>> MR2 needs to fetch much more data during the shuffle phase. Is there any way
>> to check Snappy in MR2?
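>>
>> One quick sanity check -- assuming the native-library checker that ships with
>> Hadoop 2.2 is available on the nodes -- would be to run:
>>
>>     hadoop checknative -a
>>
>> which should report whether libhadoop and the Snappy native library were
>> actually found and loaded.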
>>
>>
>>
>>
>> On Wed, Oct 23, 2013 at 4:26 PM, Jian Fang <jian.fang.subscribe@gmail.com
>> > wrote:
>>
>>> I looked at the results for MR1 and MR2. Reduce input groups for MR1 is
>>> 4,294,967,296, but 10,000,000,000 for MR2.
>>> That is to say, the number of unique keys fed into the reducers is
>>> 10,000,000,000 in MR2, which is really weird. Is there any problem in the
>>> terasort code?
>>>
>>>
>>> On Wed, Oct 23, 2013 at 2:16 PM, Jian Fang <
>>> jian.fang.subscribe@gmail.com> wrote:
>>>
>>>> Reducing the number of map containers to 8 slightly improved the total
>>>> time, to 84 minutes. Here is the output. Also, from the log, there is no
>>>> clear reason why the containers were killed, other than messages such as
>>>> "Container killed by the
>>>> ApplicationMaster../container_1382237301855_0001_01_000001/syslog:Container
>>>> killed on request. Exit code is 143"
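>>>>
>>>> For reference, one way to cap the concurrent maps at 8 per node -- a sketch
>>>> that assumes yarn.nodemanager.resource.memory-mb stays at 12288 and memory
>>>> is the limiting resource -- is simply to request a bigger map container:
>>>>
>>>>         mapreduce.map.memory.mb = "1536";    (12288 / 1536 = 8 maps per node)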
>>>>
>>>>
>>>> 2013-10-23 21:09:34,325 INFO org.apache.hadoop.mapreduce.Job (main):
>>>> Counters: 46
>>>>         File System Counters
>>>>                 FILE: Number of bytes read=455484809066
>>>>                 FILE: Number of bytes written=896642344343
>>>>
>>>>                 FILE: Number of read operations=0
>>>>                 FILE: Number of large read operations=0
>>>>                 FILE: Number of write operations=0
>>>>                 HDFS: Number of bytes read=1000000841624
>>>>
>>>>                 HDFS: Number of bytes written=1000000000000
>>>>                 HDFS: Number of read operations=25531
>>>>
>>>>                 HDFS: Number of large read operations=0
>>>>                 HDFS: Number of write operations=150
>>>>
>>>>         Job Counters
>>>>                 Killed map tasks=1
>>>>                 Killed reduce tasks=11
>>>>                 Launched map tasks=7449
>>>>                 Launched reduce tasks=86
>>>>                 Data-local map tasks=7434
>>>>                 Rack-local map tasks=15
>>>>                 Total time spent by all maps in occupied slots
>>>> (ms)=1030941232
>>>>                 Total time spent by all reduces in occupied slots
>>>> (ms)=1574732272
>>>>
>>>>         Map-Reduce Framework
>>>>                 Map input records=10000000000
>>>>                 Map output records=10000000000
>>>>                 Map output bytes=1020000000000
>>>>                 Map output materialized bytes=440519309909
>>>>                 Input split bytes=841624
>>>>
>>>>                 Combine input records=0
>>>>                 Combine output records=0
>>>>                 Reduce input groups=10000000000
>>>>                 Reduce shuffle bytes=440519309909
>>>>
>>>>                 Reduce input records=10000000000
>>>>                 Reduce output records=10000000000
>>>>                 Spilled Records=20000000000
>>>>                 Shuffled Maps =558600
>>>>                 Failed Shuffles=0
>>>>                 Merged Map outputs=558600
>>>>                 GC time elapsed (ms)=2855534
>>>>                 CPU time spent (ms)=170381420
>>>>                 Physical memory (bytes) snapshot=3704938954752
>>>>                 Virtual memory (bytes) snapshot=11455208308736
>>>>                 Total committed heap usage (bytes)=4523248582656
>>>>         Shuffle Errors
>>>>                 BAD_ID=0
>>>>                 CONNECTION=0
>>>>                 IO_ERROR=0
>>>>
>>>>                 WRONG_LENGTH=0
>>>>                 WRONG_MAP=0
>>>>                 WRONG_REDUCE=0
>>>>         File Input Format Counters
>>>>                 Bytes Read=1000000000000
>>>>         File Output Format Counters
>>>>                 Bytes Written=1000000000000
>>>>
>>>>
>>>>
>>>> On Wed, Oct 23, 2013 at 1:46 PM, Jian Fang <
>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>
>>>>> I have already started the tests with only 8 map containers in MR2; they
>>>>> are still running.
>>>>>
>>>>> But shouldn't YARN make better use of the memory? If I map YARN
>>>>> containers in MR2 to exactly 8 map and 3 reduce slots as in MR1, what is the
>>>>> real advantage of YARN then? I remember one goal of YARN was to solve the
>>>>> issue that map slots cannot be used for reduce and reduce slots cannot be
>>>>> used for map. Shouldn't YARN be smart enough to handle the concurrent tasks?
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Oct 23, 2013 at 1:17 PM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>>>
>>>>>> Increasing the slowstart is not meant to increase performance, but
>>>>>> should make for a fairer comparison.  Have you tried making sure that in
>>>>>> MR2 only 8 map tasks are running concurrently, or boosting MR1 up to 16?
>>>>>>
>>>>>> -Sandy
>>>>>>
>>>>>>
>>>>>> On Wed, Oct 23, 2013 at 12:55 PM, Jian Fang <
>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>
>>>>>>> Changing mapreduce.job.reduce.slowstart.completedmaps to 0.99 does not
>>>>>>> look good. The map phase alone took 48 minutes, and the total time seems
>>>>>>> to be even longer. Is there any way to make the map phase run faster?
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Oct 23, 2013 at 10:05 AM, Jian Fang <
>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>
>>>>>>>> Thanks Sandy.
>>>>>>>>
>>>>>>>> io.sort.record.percent is the default value 0.05 for both MR1 and
>>>>>>>> MR2.  mapreduce.job.reduce.slowstart.completedmaps in MR2 and mapred.reduce.slowstart.completed.maps
>>>>>>>> in MR1 both use the default value 0.05.
>>>>>>>>
>>>>>>>> I tried allocating 1536MB and then 1024MB to the map containers some
>>>>>>>> time ago, but the changes did not give me a better result, so I changed
>>>>>>>> it back to 768MB.
>>>>>>>>
>>>>>>>> I will try mapred.reduce.slowstart.completed.maps=.99 to see what
>>>>>>>> happens. BTW, I should use
>>>>>>>> mapreduce.job.reduce.slowstart.completedmaps in MR2, right?
>>>>>>>>
>>>>>>>> Also, in MR1 I can specify tasktracker.http.threads, but I could not
>>>>>>>> find the counterpart in MR2. Which property should I tune for the HTTP
>>>>>>>> shuffle threads?
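>>>>>>>>
>>>>>>>> If I read the MR2 defaults right, the shuffle is now served by the
>>>>>>>> ShuffleHandler aux service rather than TaskTracker HTTP threads, so the
>>>>>>>> rough equivalents would be (just a sketch, property names to be verified):
>>>>>>>>
>>>>>>>>         mapreduce.job.reduce.slowstart.completedmaps = "0.99";
>>>>>>>>         mapreduce.shuffle.max.threads = "0";
>>>>>>>>         (0 should let the ShuffleHandler default to 2 x the number of cores)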
>>>>>>>>
>>>>>>>> Thanks again.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Oct 23, 2013 at 9:40 AM, Sandy Ryza <
>>>>>>>> sandy.ryza@cloudera.com> wrote:
>>>>>>>>
>>>>>>>>> Based on SLOTS_MILLIS_MAPS, it looks like your map tasks are
>>>>>>>>> taking about three times as long in MR2 as they are in MR1.  This is
>>>>>>>>> probably because you allow twice as many map tasks to run at a time in MR2 (12288/768
>>>>>>>>> = 16).  Being able to use all the containers isn't necessarily a good thing
>>>>>>>>> if you are oversubscribing your node's resources.  Because of the different
>>>>>>>>> way that MR1 and MR2 view resources, I think it's better to test with
>>>>>>>>> mapred.reduce.slowstart.completed.maps=.99 so that the map and reduce
>>>>>>>>> phases will run separately.
>>>>>>>>>
>>>>>>>>> On the other side, it looks like your MR1 run has more spilled records
>>>>>>>>> than MR2.  You should set io.sort.record.percent in MR1 to .13, which
>>>>>>>>> should improve MR1 performance and provide a fairer comparison (MR2
>>>>>>>>> automatically does this tuning for you).
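>>>>>>>>>
>>>>>>>>> Concretely, on the MR1 side that would be something like the following
>>>>>>>>> in mapred-site.xml (a sketch of the two settings I mean):
>>>>>>>>>
>>>>>>>>>         io.sort.record.percent = "0.13";
>>>>>>>>>         mapred.reduce.slowstart.completed.maps = "0.99";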
>>>>>>>>>
>>>>>>>>> -Sandy
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Oct 23, 2013 at 9:22 AM, Jian Fang <
>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> The number of map slots and reduce slots on each data node for
>>>>>>>>>> MR1 are 8 and 3, respectively. Since MR2 could use all containers for
>>>>>>>>>> either map or reduce, I would expect that MR2 is faster.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Oct 23, 2013 at 8:17 AM, Sandy Ryza <
>>>>>>>>>> sandy.ryza@cloudera.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> How many map and reduce slots are you using per tasktracker in
>>>>>>>>>>> MR1?  How do the average map times compare? (MR2 reports this directly on
>>>>>>>>>>> the web UI, but you can also get a sense in MR1 by scrolling through the
>>>>>>>>>>> map tasks page).  Can you share the counters for MR1?
>>>>>>>>>>>
>>>>>>>>>>> -Sandy
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Oct 23, 2013 at 12:23 AM, Jian Fang <
>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Unfortunately, turning off JVM reuse still gave the same result,
>>>>>>>>>>>> i.e., about 90 minutes for MR2. I don't think the killed reduces could
>>>>>>>>>>>> account for a 2x slowdown. There must be something very wrong either
>>>>>>>>>>>> in the configuration or in the code. Any hints?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <
>>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks Sandy. I will try to turn JVM reuse off and see what
>>>>>>>>>>>>> happens.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yes, I saw quite a few exceptions in the task attempts. For
>>>>>>>>>>>>> instance:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2013-10-20 03:13:58,751 ERROR [main]
>>>>>>>>>>>>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>>>>>>>>>>>>> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
>>>>>>>>>>>>> 2013-10-20 03:13:58,752 ERROR [Thread-6]
>>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient: Failed to close file
>>>>>>>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
>>>>>>>>>>>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>>>>>>>>>>>>> No lease on
>>>>>>>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
>>>>>>>>>>>>> File does not exist. Holder
>>>>>>>>>>>>> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
>>>>>>>>>>>>> any open files.
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>>>>>>>>>>>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>>>>>>>>>>> --
>>>>>>>>>>>>>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>>>>>>>>>>>>> 2013-10-20 03:13:58,753 WARN [main]
>>>>>>>>>>>>> org.apache.hadoop.mapred.YarnChild: Exception running child :
>>>>>>>>>>>>> java.nio.channels.ClosedChannelException
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <
>>>>>>>>>>>>> sandy.ryza@cloudera.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> It looks like many of your reduce tasks were killed.  Do you
>>>>>>>>>>>>>> know why?  Also, MR2 doesn't have JVM reuse, so it might make sense to
>>>>>>>>>>>>>> compare it to MR1 with JVM reuse turned off.
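>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If it helps, a minimal sketch of the MR1 side of that comparison --
>>>>>>>>>>>>>> assuming the usual MR1 property name -- would be:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>         mapred.job.reuse.jvm.num.tasks = "1";    (1 disables reuse)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The mapreduce.job.jvm.numtasks setting in your MR2 config is ignored
>>>>>>>>>>>>>> by YARN anyway.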
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -Sandy
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <
>>>>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The Terasort output for MR2 is as follows.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job
>>>>>>>>>>>>>>> (main): Counters: 46
>>>>>>>>>>>>>>>         File System Counters
>>>>>>>>>>>>>>>                 FILE: Number of bytes read=456102049355
>>>>>>>>>>>>>>>                 FILE: Number of bytes written=897246250517
>>>>>>>>>>>>>>>                 FILE: Number of read operations=0
>>>>>>>>>>>>>>>                 FILE: Number of large read operations=0
>>>>>>>>>>>>>>>                 FILE: Number of write operations=0
>>>>>>>>>>>>>>>                 HDFS: Number of bytes read=1000000851200
>>>>>>>>>>>>>>>                 HDFS: Number of bytes written=1000000000000
>>>>>>>>>>>>>>>                 HDFS: Number of read operations=32131
>>>>>>>>>>>>>>>                 HDFS: Number of large read operations=0
>>>>>>>>>>>>>>>                 HDFS: Number of write operations=224
>>>>>>>>>>>>>>>         Job Counters
>>>>>>>>>>>>>>>                 Killed map tasks=1
>>>>>>>>>>>>>>>                 Killed reduce tasks=20
>>>>>>>>>>>>>>>                 Launched map tasks=7601
>>>>>>>>>>>>>>>                 Launched reduce tasks=132
>>>>>>>>>>>>>>>                 Data-local map tasks=7591
>>>>>>>>>>>>>>>                 Rack-local map tasks=10
>>>>>>>>>>>>>>>                 Total time spent by all maps in occupied
>>>>>>>>>>>>>>> slots (ms)=1696141311
>>>>>>>>>>>>>>>                 Total time spent by all reduces in occupied
>>>>>>>>>>>>>>> slots (ms)=2664045096
>>>>>>>>>>>>>>>         Map-Reduce Framework
>>>>>>>>>>>>>>>                 Map input records=10000000000
>>>>>>>>>>>>>>>                 Map output records=10000000000
>>>>>>>>>>>>>>>                 Map output bytes=1020000000000
>>>>>>>>>>>>>>>                 Map output materialized bytes=440486356802
>>>>>>>>>>>>>>>                 Input split bytes=851200
>>>>>>>>>>>>>>>                 Combine input records=0
>>>>>>>>>>>>>>>                 Combine output records=0
>>>>>>>>>>>>>>>                 Reduce input groups=10000000000
>>>>>>>>>>>>>>>                 Reduce shuffle bytes=440486356802
>>>>>>>>>>>>>>>                 Reduce input records=10000000000
>>>>>>>>>>>>>>>                 Reduce output records=10000000000
>>>>>>>>>>>>>>>                 Spilled Records=20000000000
>>>>>>>>>>>>>>>                 Shuffled Maps =851200
>>>>>>>>>>>>>>>                 Failed Shuffles=61
>>>>>>>>>>>>>>>                 Merged Map outputs=851200
>>>>>>>>>>>>>>>                 GC time elapsed (ms)=4215666
>>>>>>>>>>>>>>>                 CPU time spent (ms)=192433000
>>>>>>>>>>>>>>>                 Physical memory (bytes)
>>>>>>>>>>>>>>> snapshot=3349356380160
>>>>>>>>>>>>>>>                 Virtual memory (bytes) snapshot=9665208745984
>>>>>>>>>>>>>>>                 Total committed heap usage
>>>>>>>>>>>>>>> (bytes)=3636854259712
>>>>>>>>>>>>>>>         Shuffle Errors
>>>>>>>>>>>>>>>                 BAD_ID=0
>>>>>>>>>>>>>>>                 CONNECTION=0
>>>>>>>>>>>>>>>                 IO_ERROR=4
>>>>>>>>>>>>>>>                 WRONG_LENGTH=0
>>>>>>>>>>>>>>>                 WRONG_MAP=0
>>>>>>>>>>>>>>>                 WRONG_REDUCE=0
>>>>>>>>>>>>>>>         File Input Format Counters
>>>>>>>>>>>>>>>                 Bytes Read=1000000000000
>>>>>>>>>>>>>>>         File Output Format Counters
>>>>>>>>>>>>>>>                 Bytes Written=1000000000000
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> John
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>>>>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I have the same problem. I compared Hadoop 2.2.0 with
>>>>>>>>>>>>>>>> Hadoop 1.0.3 and it turned out that the terasort for MR2 is 2 times slower
>>>>>>>>>>>>>>>> than that in MR1. I cannot really believe it.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop
>>>>>>>>>>>>>>>> 2.2.0 cluster configurations are as follows.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>>>>>>>>>>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>>>>>>>>>>>>>         mapreduce.map.memory.mb = "768";
>>>>>>>>>>>>>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>>>>>>>>>>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>>>>>>>>>>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>>>>>>>>>>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>>>>>>>>>>>>>         mapreduce.task.io.sort.factor = "48";
>>>>>>>>>>>>>>>>         mapreduce.task.io.sort.mb = "200";
>>>>>>>>>>>>>>>>         mapreduce.map.speculative = "true";
>>>>>>>>>>>>>>>>         mapreduce.reduce.speculative = "true";
>>>>>>>>>>>>>>>>         mapreduce.framework.name = "yarn";
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> yarn.app.mapreduce.am.job.task.listener.thread-count = "60";
>>>>>>>>>>>>>>>>         mapreduce.map.cpu.vcores = "1";
>>>>>>>>>>>>>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>>>>>>>>>>>>>         mapreduce.map.output.compress = "true";
>>>>>>>>>>>>>>>>         mapreduce.map.output.compress.codec =
>>>>>>>>>>>>>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>>>>>>>>>>>>>         yarn.resourcemanager.scheduler.client.thread-count
>>>>>>>>>>>>>>>> = "64";
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> yarn.resourcemanager.resource-tracker.client.thread-count = "64";
>>>>>>>>>>>>>>>>         yarn.resourcemanager.scheduler.class =
>>>>>>>>>>>>>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>>>>>>>>>>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>>>>>>>>>>>>>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>>>>>>>>>>>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>>>>>>>>>>>>>         yarn.nodemanager.container-executor.class =
>>>>>>>>>>>>>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>>>>>>>>>>>>>         yarn.nodemanager.container-manager.thread-count =
>>>>>>>>>>>>>>>> "64";
>>>>>>>>>>>>>>>>         yarn.nodemanager.localizer.client.thread-count =
>>>>>>>>>>>>>>>> "20";
>>>>>>>>>>>>>>>>         yarn.nodemanager.localizer.fetch.thread-count =
>>>>>>>>>>>>>>>> "20";
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost
>>>>>>>>>>>>>>>> the same other configurations. In MR1, the 1TB terasort took about 45
>>>>>>>>>>>>>>>> minutes, but it took around 90 minutes in MR2.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Does anyone know what is wrong here? Or do I need some
>>>>>>>>>>>>>>>> special configurations for terasort to work better in MR2?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> John
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>>>>>>>>>>>>>> michael_segel@hotmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Sam,
>>>>>>>>>>>>>>>>> I think your cluster is too small for any meaningful
>>>>>>>>>>>>>>>>> conclusions to be made.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Mike Segel
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <
>>>>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Harsh,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks for your detailed response! Now, the efficiency of
>>>>>>>>>>>>>>>>> my Yarn cluster improved a lot after increasing the reducer
>>>>>>>>>>>>>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>>>>>>>>>>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and
>>>>>>>>>>>>>>>>> reduce task together, with a typical process(map > shuffle > reduce). In
>>>>>>>>>>>>>>>>> Yarn, as I know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>>>>>>>>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...),
>>>>>>>>>>>>>>>>> but, MRv1 job has special execution process(map > shuffle > reduce) in
>>>>>>>>>>>>>>>>> Hadoop 1.x, and how Yarn execute a MRv1 job? still include some special MR
>>>>>>>>>>>>>>>>> steps in Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>>>>>>>>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>>>>>>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>>>>>>>>>>>> - What's the general process for ApplicationMaster of Yarn
>>>>>>>>>>>>>>>>> to execute a job?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by
>>>>>>>>>>>>>>>>> setting 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>>>>>>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>>>>>>>>>>>> - For Yarn, above tow parameter do not work any more, as
>>>>>>>>>>>>>>>>> yarn uses container instead, right?
>>>>>>>>>>>>>>>>> - For Yarn, we can set the whole physical mem for a
>>>>>>>>>>>>>>>>> NodeManager using 'yarn.nodemanager.resource.memory-mb'.
>>>>>>>>>>>>>>>>> But how to set the default size of physical mem of a container?
>>>>>>>>>>>>>>>>> - How to set the maximum size of physical mem of a
>>>>>>>>>>>>>>>>> container? By the parameter of 'mapred.child.java.opts'?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks as always!
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi Sam,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> > - How to know the container number? Why you say it will
>>>>>>>>>>>>>>>>>> be 22 containers due to a 22 GB memory?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The MR2's default configuration requests 1 GB resource
>>>>>>>>>>>>>>>>>> each for Map
>>>>>>>>>>>>>>>>>> and Reduce containers. It requests 1.5 GB for the AM
>>>>>>>>>>>>>>>>>> container that
>>>>>>>>>>>>>>>>>> runs the job, additionally. This is tunable using the
>>>>>>>>>>>>>>>>>> properties
>>>>>>>>>>>>>>>>>> Sandy's mentioned in his post.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> > - My machine has 32 GB memory, how many memory is
>>>>>>>>>>>>>>>>>> proper to be assigned to containers?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> This is a general question. You may use the same process
>>>>>>>>>>>>>>>>>> you took to
>>>>>>>>>>>>>>>>>> decide optimal number of slots in MR1 to decide this
>>>>>>>>>>>>>>>>>> here. Every
>>>>>>>>>>>>>>>>>> container is a new JVM, and you're limited by the CPUs
>>>>>>>>>>>>>>>>>> you have there
>>>>>>>>>>>>>>>>>> (if not the memory). Either increase memory requests from
>>>>>>>>>>>>>>>>>> jobs, to
>>>>>>>>>>>>>>>>>> lower # of concurrent containers at a given time (runtime
>>>>>>>>>>>>>>>>>> change), or
>>>>>>>>>>>>>>>>>> lower NM's published memory resources to control the same
>>>>>>>>>>>>>>>>>> (config
>>>>>>>>>>>>>>>>>> change).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> > - In mapred-site.xml, if I set '
>>>>>>>>>>>>>>>>>> mapreduce.framework.name' to be 'yarn', will other
>>>>>>>>>>>>>>>>>> parameters for mapred-site.xml still work in yarn framework? Like
>>>>>>>>>>>>>>>>>> 'mapreduce.task.io.sort.mb' and 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Yes, all of these properties will still work. Old
>>>>>>>>>>>>>>>>>> properties specific
>>>>>>>>>>>>>>>>>> to JobTracker or TaskTracker (usually found as a keyword
>>>>>>>>>>>>>>>>>> in the config
>>>>>>>>>>>>>>>>>> name) will not apply anymore.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <
>>>>>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> > Hi Harsh,
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > According to above suggestions, I removed the
>>>>>>>>>>>>>>>>>> duplication of setting, and
>>>>>>>>>>>>>>>>>> > reduce the value of
>>>>>>>>>>>>>>>>>> 'yarn.nodemanager.resource.cpu-cores',
>>>>>>>>>>>>>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>>>>>>>>>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and
>>>>>>>>>>>>>>>>>> 12000. Ant then, the
>>>>>>>>>>>>>>>>>> > efficiency improved about 18%.  I have questions:
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > - How to know the container number? Why you say it will
>>>>>>>>>>>>>>>>>> be 22 containers due
>>>>>>>>>>>>>>>>>> > to a 22 GB memory?
>>>>>>>>>>>>>>>>>> > - My machine has 32 GB memory, how many memory is
>>>>>>>>>>>>>>>>>> proper to be assigned to
>>>>>>>>>>>>>>>>>> > containers?
>>>>>>>>>>>>>>>>>> > - In mapred-site.xml, if I set '
>>>>>>>>>>>>>>>>>> mapreduce.framework.name' to be 'yarn', will
>>>>>>>>>>>>>>>>>> > other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>>>>>>>>> framework? Like
>>>>>>>>>>>>>>>>>> > 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > Thanks!
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>> >> Hey Sam,
>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>> >> Did you get a chance to retry with Sandy's
>>>>>>>>>>>>>>>>>> suggestions? The config
>>>>>>>>>>>>>>>>>> >> appears to be asking NMs to use roughly 22 total
>>>>>>>>>>>>>>>>>> containers (as
>>>>>>>>>>>>>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22
>>>>>>>>>>>>>>>>>> GB memory
>>>>>>>>>>>>>>>>>> >> resource. This could impact much, given the CPU is
>>>>>>>>>>>>>>>>>> still the same for
>>>>>>>>>>>>>>>>>> >> both test runs.
>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>>>>>>>>>>>>>> sandy.ryza@cloudera.com>
>>>>>>>>>>>>>>>>>> >> wrote:
>>>>>>>>>>>>>>>>>> >> > Hey Sam,
>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>> >> > Thanks for sharing your results.  I'm definitely
>>>>>>>>>>>>>>>>>> curious about what's
>>>>>>>>>>>>>>>>>> >> > causing the difference.
>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>> >> > A couple observations:
>>>>>>>>>>>>>>>>>> >> > It looks like you've got
>>>>>>>>>>>>>>>>>> yarn.nodemanager.resource.memory-mb in there
>>>>>>>>>>>>>>>>>> >> > twice
>>>>>>>>>>>>>>>>>> >> > with two different values.
>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?)
>>>>>>>>>>>>>>>>>> close to the default
>>>>>>>>>>>>>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any
>>>>>>>>>>>>>>>>>> of your tasks getting
>>>>>>>>>>>>>>>>>> >> > killed for running over resource limits?
>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>> >> > -Sandy
>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <
>>>>>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >> The terasort execution log shows that reduce spent
>>>>>>>>>>>>>>>>>> about 5.5 mins from
>>>>>>>>>>>>>>>>>> >> >> 33%
>>>>>>>>>>>>>>>>>> >> >> to 35% as below.
>>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>>> reduce 31%
>>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>>> reduce 32%
>>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>>> reduce 33%
>>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>>> reduce 35%
>>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>>> reduce 40%
>>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>>> reduce 43%
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >> Any way, below are my configurations for your
>>>>>>>>>>>>>>>>>> reference. Thanks!
>>>>>>>>>>>>>>>>>> >> >> (A) core-site.xml
>>>>>>>>>>>>>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >> (B) hdfs-site.xml
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>dfs.replication</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>1</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>10</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >> (C) mapred-site.xml
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>>>>>>>>>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>>>>>>>>>>>>>> >> >> </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>yarn</value>
>>>>>>>>>>>>>>>>>> >> >>    </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>8</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>4</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>true</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >> (D) yarn-site.xml
>>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>node1:18025</value>
>>>>>>>>>>>>>>>>>> >> >>     <description>host is the hostname of the
>>>>>>>>>>>>>>>>>> resource manager and
>>>>>>>>>>>>>>>>>> >> >>     port is the port on which the NodeManagers
>>>>>>>>>>>>>>>>>> contact the Resource
>>>>>>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <description>The address of the RM web
>>>>>>>>>>>>>>>>>> application.</description>
>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>node1:18088</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <name>yarn.resourcemanager.scheduler.address</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>node1:18030</value>
>>>>>>>>>>>>>>>>>> >> >>     <description>host is the hostname of the
>>>>>>>>>>>>>>>>>> resourcemanager and port
>>>>>>>>>>>>>>>>>> >> >> is
>>>>>>>>>>>>>>>>>> >> >> the port
>>>>>>>>>>>>>>>>>> >> >>     on which the Applications in the cluster talk
>>>>>>>>>>>>>>>>>> to the Resource
>>>>>>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>node1:18040</value>
>>>>>>>>>>>>>>>>>> >> >>     <description>the host is the hostname of the
>>>>>>>>>>>>>>>>>> ResourceManager and
>>>>>>>>>>>>>>>>>> >> >> the
>>>>>>>>>>>>>>>>>> >> >> port is the port on
>>>>>>>>>>>>>>>>>> >> >>     which the clients can talk to the Resource
>>>>>>>>>>>>>>>>>> Manager. </description>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>>>>>>>>>>>>>> >> >>     <description>the local directories used by the
>>>>>>>>>>>>>>>>>> >> >> nodemanager</description>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>>>>>>>>>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>>>>>>>>>>>>>> port</description>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>10240</value>
>>>>>>>>>>>>>>>>>> >> >>     <description>the amount of memory on the
>>>>>>>>>>>>>>>>>> NodeManager in
>>>>>>>>>>>>>>>>>> >> >> GB</description>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>>>>>>>>>>>>> >> >>     <description>directory on hdfs where the
>>>>>>>>>>>>>>>>>> application logs are moved
>>>>>>>>>>>>>>>>>> >> >> to
>>>>>>>>>>>>>>>>>> >> >> </description>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>    <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>>>>>>>>>>>>> >> >>     <description>the directories used by
>>>>>>>>>>>>>>>>>> Nodemanagers as log
>>>>>>>>>>>>>>>>>> >> >> directories</description>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>>>>>>>>>>>>> >> >>     <description>shuffle service that needs to be
>>>>>>>>>>>>>>>>>> set for Map Reduce to
>>>>>>>>>>>>>>>>>> >> >> run </description>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>24</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>3</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>22000</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>>>>>>>>>>>>> >> >>     <value>2.1</value>
>>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>>> >> >>> Not tuning configurations at all is wrong. YARN
>>>>>>>>>>>>>>>>>> uses memory resource
>>>>>>>>>>>>>>>>>> >> >>> based scheduling and hence MR2 would be requesting
>>>>>>>>>>>>>>>>>> 1 GB minimum by
>>>>>>>>>>>>>>>>>> >> >>> default, causing, on base configs, to max out at 8
>>>>>>>>>>>>>>>>>> (due to 8 GB NM
>>>>>>>>>>>>>>>>>> >> >>> memory resource config) total containers. Do share
>>>>>>>>>>>>>>>>>> your configs as at
>>>>>>>>>>>>>>>>>> >> >>> this point none of us can tell what it is.
>>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower
>>>>>>>>>>>>>>>>>> for users and to not
>>>>>>>>>>>>>>>>>> >> >>> care about such things :)
>>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>>>>>>>>>>>>> samliuhadoop@gmail.com>
>>>>>>>>>>>>>>>>>> >> >>> wrote:
>>>>>>>>>>>>>>>>>> >> >>> > At the beginning, I just wanted to do a fast
>>>>>>>>>>>>>>>>>> comparison of MRv1 and
>>>>>>>>>>>>>>>>>> >> >>> > Yarn.
>>>>>>>>>>>>>>>>>> >> >>> > But
>>>>>>>>>>>>>>>>>> >> >>> > they have many differences, and to be fair for
>>>>>>>>>>>>>>>>>> comparison I did not
>>>>>>>>>>>>>>>>>> >> >>> > tune
>>>>>>>>>>>>>>>>>> >> >>> > their configurations at all.  So I got above
>>>>>>>>>>>>>>>>>> test results. After
>>>>>>>>>>>>>>>>>> >> >>> > analyzing
>>>>>>>>>>>>>>>>>> >> >>> > the test result, no doubt, I will configure them
>>>>>>>>>>>>>>>>>> and do comparison
>>>>>>>>>>>>>>>>>> >> >>> > again.
>>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>>> >> >>> > Do you have any idea on current test result? I
>>>>>>>>>>>>>>>>>> think, to compare
>>>>>>>>>>>>>>>>>> >> >>> > with
>>>>>>>>>>>>>>>>>> >> >>> > MRv1,
>>>>>>>>>>>>>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but
>>>>>>>>>>>>>>>>>> worse on Reduce
>>>>>>>>>>>>>>>>>> >> >>> > phase(terasort test).
>>>>>>>>>>>>>>>>>> >> >>> > And any detailed suggestions/comments/materials
>>>>>>>>>>>>>>>>>> on Yarn performance
>>>>>>>>>>>>>>>>>> >> >>> > tuning?
>>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>>> >> >>> > Thanks!
>>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>>>>>>>>>>>>> marcosluis2186@gmail.com>
>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>> >> >>> >> Why not to tune the configurations?
>>>>>>>>>>>>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>>>>>>>>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size,
>>>>>>>>>>>>>>>>>> etc
>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>> >> >>> >>> Hi Experts,
>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>> >> >>> >>> We are thinking about whether to use Yarn or
>>>>>>>>>>>>>>>>>> not in the near
>>>>>>>>>>>>>>>>>> >> >>> >>> future,
>>>>>>>>>>>>>>>>>> >> >>> >>> and
>>>>>>>>>>>>>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for
>>>>>>>>>>>>>>>>>> comparison.
>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>> >> >>> >>> My env is a three-node cluster, and each node
>>>>>>>>>>>>>>>>>> has similar hardware:
>>>>>>>>>>>>>>>>>> >> >>> >>> 2
>>>>>>>>>>>>>>>>>> >> >>> >>> cpu (4 cores), 32 GB mem. Both Yarn and MRv1
>>>>>>>>>>>>>>>>>> cluster are set on the
>>>>>>>>>>>>>>>>>> >> >>> >>> same
>>>>>>>>>>>>>>>>>> >> >>> >>> env. To
>>>>>>>>>>>>>>>>>> >> >>> >>> be fair, I did not make any performance tuning
>>>>>>>>>>>>>>>>>> on their
>>>>>>>>>>>>>>>>>> >> >>> >>> configurations, but
>>>>>>>>>>>>>>>>>> >> >>> >>> use the default configuration values.
>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>> >> >>> >>> Before testing, I think Yarn will be much
>>>>>>>>>>>>>>>>>> better than MRv1, if
>>>>>>>>>>>>>>>>>> >> >>> >>> they
>>>>>>>>>>>>>>>>>> >> >>> >>> all
>>>>>>>>>>>>>>>>>> >> >>> >>> use default configuration, because Yarn is a
>>>>>>>>>>>>>>>>>> better framework than
>>>>>>>>>>>>>>>>>> >> >>> >>> MRv1.
>>>>>>>>>>>>>>>>>> >> >>> >>> However, the test result shows some
>>>>>>>>>>>>>>>>>> differences:
>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>>>>>>>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>>>>>>>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>>>>>>>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>>>>>>>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>>>>>>>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>>>>>>>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>>>>>>>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>> >> >>> >>> After a fast analysis, I think the direct
>>>>>>>>>>>>>>>>>> cause might be that Yarn
>>>>>>>>>>>>>>>>>> >> >>> >>> is
>>>>>>>>>>>>>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much
>>>>>>>>>>>>>>>>>> worse on Reduce
>>>>>>>>>>>>>>>>>> >> >>> >>> phase.
>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>>>>>>>>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1
>>>>>>>>>>>>>>>>>> for terasort?
>>>>>>>>>>>>>>>>>> >> >>> >>> - What's the strategy for tuning Yarn
>>>>>>>>>>>>>>>>>> performance? Are there any
>>>>>>>>>>>>>>>>>> >> >>> >>> materials?
>>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>>> >> >>> >>> Thanks!
>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>> >> >>> >> --
>>>>>>>>>>>>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>>>>>>>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>>>>>>>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>>> >> >>> --
>>>>>>>>>>>>>>>>>> >> >>> Harsh J
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>>> >> --
>>>>>>>>>>>>>>>>>> >> Harsh J
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>> Harsh J
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Sandy Ryza <sa...@cloudera.com>.
I should have brought this up earlier - the Terasort benchmark requirements
changed recently to make data less compressible.  The MR2 version of
Terasort has this change, but the MR1 version does not.    So Snappy should
be working fine, but the data that MR2 is using is less compressible.


On Wed, Oct 23, 2013 at 4:48 PM, Jian Fang <ji...@gmail.com>wrote:

> Ok, I think I know where the problem is now. I used snappy to compress map
> output.
>
> Calculating the compression ratio as "Map output materialized bytes" / "Map
> output bytes":
>
> MR2
> 440519309909/1020000000000 = 0.431881676
>
> MR1
> 240948272514/1000000000000 = 0.240948273
>
>
> It seems something is wrong with Snappy in my Hadoop 2 and, as a result,
> MR2 needs to fetch much more data during the shuffle phase. Is there any way
> to check Snappy in MR2?
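
For reference, the ratio above is plain counter arithmetic; a minimal, self-contained sketch (class and variable names are illustrative, and the values are simply copied from the counters quoted in this thread):

    public class MapOutputCompressionRatio {
        public static void main(String[] args) {
            // Counter values copied from the two terasort runs in this thread.
            long mr2Materialized = 440519309909L;   // MR2 "Map output materialized bytes"
            long mr2OutputBytes  = 1020000000000L;  // MR2 "Map output bytes"
            long mr1Materialized = 240948272514L;   // MR1 "Map output materialized bytes"
            long mr1OutputBytes  = 1000000000000L;  // MR1 "Map output bytes"

            // MR2 shuffles roughly 0.43 of its map output versus roughly 0.24 for MR1,
            // i.e. nearly twice as many bytes cross the network during the shuffle.
            System.out.printf("MR2 ratio: %.3f%n", (double) mr2Materialized / mr2OutputBytes);
            System.out.printf("MR1 ratio: %.3f%n", (double) mr1Materialized / mr1OutputBytes);
        }
    }

Whether the gap comes from Snappy itself or, as explained at the top of this message, from less-compressible input data, this ratio is what drives the extra shuffle traffic.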
>
>
>
>
> On Wed, Oct 23, 2013 at 4:26 PM, Jian Fang <ji...@gmail.com>wrote:
>
>> I looked at the results for MR1 and MR2. Reduce input groups for MR1 is
>> 4,294,967,296, but 10,000,000,000 for MR2.
>> That is to say, the number of unique keys fed into the reducers is
>> 10,000,000,000 in MR2, which is really weird. Is there any problem in the terasort code?
>>
>>
>> On Wed, Oct 23, 2013 at 2:16 PM, Jian Fang <jian.fang.subscribe@gmail.com
>> > wrote:
>>
>>> Reducing the number of map containers to 8 slightly improved the total
>>> time, to 84 minutes. Here is the output. Also, from the log, there is no clear
>>> reason why the containers were killed other than messages such as
>>> "Container killed by the
>>> ApplicationMaster../container_1382237301855_0001_01_000001/syslog:Container
>>> killed on request. Exit code is 143"
>>>
>>>
>>> 2013-10-23 21:09:34,325 INFO org.apache.hadoop.mapreduce.Job (main):
>>> Counters: 46
>>>         File System Counters
>>>                 FILE: Number of bytes read=455484809066
>>>                 FILE: Number of bytes written=896642344343
>>>
>>>                 FILE: Number of read operations=0
>>>                 FILE: Number of large read operations=0
>>>                 FILE: Number of write operations=0
>>>                 HDFS: Number of bytes read=1000000841624
>>>
>>>                 HDFS: Number of bytes written=1000000000000
>>>                 HDFS: Number of read operations=25531
>>>
>>>                 HDFS: Number of large read operations=0
>>>                 HDFS: Number of write operations=150
>>>
>>>         Job Counters
>>>                 Killed map tasks=1
>>>                 Killed reduce tasks=11
>>>                 Launched map tasks=7449
>>>                 Launched reduce tasks=86
>>>                 Data-local map tasks=7434
>>>                 Rack-local map tasks=15
>>>                 Total time spent by all maps in occupied slots
>>> (ms)=1030941232
>>>                 Total time spent by all reduces in occupied slots
>>> (ms)=1574732272
>>>
>>>         Map-Reduce Framework
>>>                 Map input records=10000000000
>>>                 Map output records=10000000000
>>>                 Map output bytes=1020000000000
>>>                 Map output materialized bytes=440519309909
>>>                 Input split bytes=841624
>>>
>>>                 Combine input records=0
>>>                 Combine output records=0
>>>                 Reduce input groups=10000000000
>>>                 Reduce shuffle bytes=440519309909
>>>
>>>                 Reduce input records=10000000000
>>>                 Reduce output records=10000000000
>>>                 Spilled Records=20000000000
>>>                 Shuffled Maps =558600
>>>                 Failed Shuffles=0
>>>                 Merged Map outputs=558600
>>>                 GC time elapsed (ms)=2855534
>>>                 CPU time spent (ms)=170381420
>>>                 Physical memory (bytes) snapshot=3704938954752
>>>                 Virtual memory (bytes) snapshot=11455208308736
>>>                 Total committed heap usage (bytes)=4523248582656
>>>         Shuffle Errors
>>>                 BAD_ID=0
>>>                 CONNECTION=0
>>>                 IO_ERROR=0
>>>
>>>                 WRONG_LENGTH=0
>>>                 WRONG_MAP=0
>>>                 WRONG_REDUCE=0
>>>         File Input Format Counters
>>>                 Bytes Read=1000000000000
>>>         File Output Format Counters
>>>                 Bytes Written=1000000000000
>>>
>>>
>>>
>>> On Wed, Oct 23, 2013 at 1:46 PM, Jian Fang <
>>> jian.fang.subscribe@gmail.com> wrote:
>>>
>>>> I already started the tests with only 8 map containers in MR2; they are
>>>> still running.
>>>>
>>>> But shouldn't YARN make better use of the memory? If I map YARN
>>>> containers in MR2 to exactly the 8 map and 3 reduce slots in MR1, what is the
>>>> real advantage of YARN then? I remember one goal of YARN is to solve the
>>>> issue that map slots cannot be used for reduce and reduce slots cannot be
>>>> used for map. Shouldn't YARN be smart enough to handle the concurrent tasks?
>>>>
>>>>
>>>>
>>>> On Wed, Oct 23, 2013 at 1:17 PM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>>
>>>>> Increasing the slowstart is not meant to increase performance, but
>>>>> should make for a fairer comparison.  Have you tried making sure that in
>>>>> MR2 only 8 map tasks are running concurrently, or boosting MR1 up to 16?
>>>>>
>>>>> -Sandy
>>>>>
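
One way to try the first of those suggestions, for what it's worth: size the map container so that only 8 fit into the 12288 MB the NodeManager advertises. A sketch under that assumption (the -Xmx value is an illustrative choice, kept below the container size):

    import org.apache.hadoop.conf.Configuration;

    public class EightConcurrentMaps {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // 12288 MB per NodeManager / 1536 MB per map container = 8 maps at a time,
            // matching the 8 map slots per TaskTracker in the MR1 cluster.
            conf.setInt("mapreduce.map.memory.mb", 1536);
            // Keep the task heap comfortably below the container size.
            conf.set("mapreduce.map.java.opts", "-Xmx1280m");
        }
    }

The same two values can of course go into mapred-site.xml instead of a driver's Configuration.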
>>>>>
>>>>> On Wed, Oct 23, 2013 at 12:55 PM, Jian Fang <
>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>
>>>>>> Changing mapreduce.job.reduce.slowstart.completedmaps to 0.99 does not
>>>>>> look good. The map phase alone took 48 minutes and the total time seems
>>>>>> to be even longer. Is there any way to make the map phase run faster?
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>>
>>>>>> On Wed, Oct 23, 2013 at 10:05 AM, Jian Fang <
>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks Sandy.
>>>>>>>
>>>>>>> io.sort.record.percent is at the default value of 0.05 for both MR1 and
>>>>>>> MR2.  mapreduce.job.reduce.slowstart.completedmaps in MR2 and mapred.reduce.slowstart.completed.maps
>>>>>>> in MR1 both use the default value of 0.05.
>>>>>>>
>>>>>>> I tried allocating 1536MB and 1024MB to the map container some time
>>>>>>> ago, but the changes did not give me a better result, so I changed it
>>>>>>> back to 768MB.
>>>>>>>
>>>>>>> Will try mapred.reduce.slowstart.completed.maps=.99 to see what
>>>>>>> happens. BTW, I should use
>>>>>>> mapreduce.job.reduce.slowstart.completedmaps in MR2, right?
>>>>>>>
>>>>>>> Also, in MR1 I can specify tasktracker.http.threads, but I could not
>>>>>>> find the counterpart for MR2. Which one should I tune for the HTTP threads?
>>>>>>>
>>>>>>> Thanks again.
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Oct 23, 2013 at 9:40 AM, Sandy Ryza <sandy.ryza@cloudera.com
>>>>>>> > wrote:
>>>>>>>
>>>>>>>> Based on SLOTS_MILLIS_MAPS, it looks like your map tasks are taking
>>>>>>>> about three times as long in MR2 as they are in MR1.  This is probably
>>>>>>>> because you allow twice as many map tasks to run at a time in MR2 (12288/768
>>>>>>>> = 16).  Being able to use all the containers isn't necessarily a good thing
>>>>>>>> if you are oversubscribing your node's resources.  Because of the different
>>>>>>>> way that MR1 and MR2 view resources, I think it's better to test with
>>>>>>>> mapred.reduce.slowstart.completed.maps=.99 so that the map and reduce
>>>>>>>> phases will run separately.
>>>>>>>>
>>>>>>>> On the other side, it looks like your MR1 has more spilled records
>>>>>>>> than MR2.  You should set io.sort.record.percent in MR1 to .13, which
>>>>>>>> should improve MR1 performance and provide a fairer comparison (MR2
>>>>>>>> automatically does this tuning for you).
>>>>>>>>
>>>>>>>> -Sandy
>>>>>>>>
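
For reference, a minimal sketch of the settings suggested above, using only property names already quoted in this thread (shown here on a driver's Configuration; the same values can go into mapred-site.xml, and the class name is illustrative):

    import org.apache.hadoop.conf.Configuration;

    public class FairComparisonSettings {
        public static void main(String[] args) {
            Configuration conf = new Configuration();

            // MR2: hold reducers back until the maps are essentially done,
            // so the map and reduce phases are measured separately.
            conf.set("mapreduce.job.reduce.slowstart.completedmaps", "0.99");

            // MR1 name for the same knob.
            conf.set("mapred.reduce.slowstart.completed.maps", "0.99");

            // MR1 only: give record metadata a larger share of io.sort.mb to
            // cut down on spills (MR2 sizes this automatically).
            conf.set("io.sort.record.percent", "0.13");

            // 12288 MB per NodeManager / 768 MB per map container = 16
            // concurrent maps per node in MR2, versus 8 map slots in MR1.
            System.out.println(12288 / 768);
        }
    }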
>>>>>>>>
>>>>>>>> On Wed, Oct 23, 2013 at 9:22 AM, Jian Fang <
>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> The number of map slots and reduce slots on each data node for MR1
>>>>>>>>> are 8 and 3, respectively. Since MR2 could use all containers for either
>>>>>>>>> map or reduce, I would expect that MR2 is faster.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Oct 23, 2013 at 8:17 AM, Sandy Ryza <
>>>>>>>>> sandy.ryza@cloudera.com> wrote:
>>>>>>>>>
>>>>>>>>>> How many map and reduce slots are you using per tasktracker in
>>>>>>>>>> MR1?  How do the average map times compare? (MR2 reports this directly on
>>>>>>>>>> the web UI, but you can also get a sense in MR1 by scrolling through the
>>>>>>>>>> map tasks page).  Can you share the counters for MR1?
>>>>>>>>>>
>>>>>>>>>> -Sandy
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Oct 23, 2013 at 12:23 AM, Jian Fang <
>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Unfortunately, turning off JVM reuse still gave the same result,
>>>>>>>>>>> i.e., about 90 minutes for MR2. I don't think the killed reduces could
>>>>>>>>>>> account for a 2x slowdown. There must be something very wrong either
>>>>>>>>>>> in the configuration or the code. Any hints?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <
>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks Sandy. I will try turning JVM reuse off and see what
>>>>>>>>>>>> happens.
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, I saw quite a few exceptions in the task attempts. For
>>>>>>>>>>>> instance:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> 2013-10-20 03:13:58,751 ERROR [main]
>>>>>>>>>>>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>>>>>>>>>>>> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
>>>>>>>>>>>> 2013-10-20 03:13:58,752 ERROR [Thread-6]
>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient: Failed to close file
>>>>>>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
>>>>>>>>>>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>>>>>>>>>>>> No lease on
>>>>>>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
>>>>>>>>>>>> File does not exist. Holder
>>>>>>>>>>>> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
>>>>>>>>>>>> any open files.
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>>>>>>>>>>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>>>>>>>>>> --
>>>>>>>>>>>>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>>>>>>>>>>>> 2013-10-20 03:13:58,753 WARN [main]
>>>>>>>>>>>> org.apache.hadoop.mapred.YarnChild: Exception running child :
>>>>>>>>>>>> java.nio.channels.ClosedChannelException
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>>>>>>>>>>>>         at
>>>>>>>>>>>> java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <
>>>>>>>>>>>> sandy.ryza@cloudera.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> It looks like many of your reduce tasks were killed.  Do you
>>>>>>>>>>>>> know why?  Also, MR2 doesn't have JVM reuse, so it might make sense to
>>>>>>>>>>>>> compare it to MR1 with JVM reuse turned off.
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Sandy
>>>>>>>>>>>>>
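
For reference, JVM reuse on the MR1 side is a single property; a sketch of disabling it for the comparison (mapred.job.reuse.jvm.num.tasks is the classic 1.x name; the mapreduce.* spelling is the one used in the configuration quoted later in this thread, and MR2, which has no JVM reuse, simply ignores it):

    import org.apache.hadoop.conf.Configuration;

    public class DisableJvmReuse {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // One task per JVM disables reuse in MR1 (the config in this thread uses 20).
            conf.setInt("mapred.job.reuse.jvm.num.tasks", 1);
            conf.setInt("mapreduce.job.jvm.numtasks", 1);
        }
    }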
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <
>>>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> The Terasort output for MR2 is as follows.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job
>>>>>>>>>>>>>> (main): Counters: 46
>>>>>>>>>>>>>>         File System Counters
>>>>>>>>>>>>>>                 FILE: Number of bytes read=456102049355
>>>>>>>>>>>>>>                 FILE: Number of bytes written=897246250517
>>>>>>>>>>>>>>                 FILE: Number of read operations=0
>>>>>>>>>>>>>>                 FILE: Number of large read operations=0
>>>>>>>>>>>>>>                 FILE: Number of write operations=0
>>>>>>>>>>>>>>                 HDFS: Number of bytes read=1000000851200
>>>>>>>>>>>>>>                 HDFS: Number of bytes written=1000000000000
>>>>>>>>>>>>>>                 HDFS: Number of read operations=32131
>>>>>>>>>>>>>>                 HDFS: Number of large read operations=0
>>>>>>>>>>>>>>                 HDFS: Number of write operations=224
>>>>>>>>>>>>>>         Job Counters
>>>>>>>>>>>>>>                 Killed map tasks=1
>>>>>>>>>>>>>>                 Killed reduce tasks=20
>>>>>>>>>>>>>>                 Launched map tasks=7601
>>>>>>>>>>>>>>                 Launched reduce tasks=132
>>>>>>>>>>>>>>                 Data-local map tasks=7591
>>>>>>>>>>>>>>                 Rack-local map tasks=10
>>>>>>>>>>>>>>                 Total time spent by all maps in occupied
>>>>>>>>>>>>>> slots (ms)=1696141311
>>>>>>>>>>>>>>                 Total time spent by all reduces in occupied
>>>>>>>>>>>>>> slots (ms)=2664045096
>>>>>>>>>>>>>>         Map-Reduce Framework
>>>>>>>>>>>>>>                 Map input records=10000000000
>>>>>>>>>>>>>>                 Map output records=10000000000
>>>>>>>>>>>>>>                 Map output bytes=1020000000000
>>>>>>>>>>>>>>                 Map output materialized bytes=440486356802
>>>>>>>>>>>>>>                 Input split bytes=851200
>>>>>>>>>>>>>>                 Combine input records=0
>>>>>>>>>>>>>>                 Combine output records=0
>>>>>>>>>>>>>>                 Reduce input groups=10000000000
>>>>>>>>>>>>>>                 Reduce shuffle bytes=440486356802
>>>>>>>>>>>>>>                 Reduce input records=10000000000
>>>>>>>>>>>>>>                 Reduce output records=10000000000
>>>>>>>>>>>>>>                 Spilled Records=20000000000
>>>>>>>>>>>>>>                 Shuffled Maps =851200
>>>>>>>>>>>>>>                 Failed Shuffles=61
>>>>>>>>>>>>>>                 Merged Map outputs=851200
>>>>>>>>>>>>>>                 GC time elapsed (ms)=4215666
>>>>>>>>>>>>>>                 CPU time spent (ms)=192433000
>>>>>>>>>>>>>>                 Physical memory (bytes) snapshot=3349356380160
>>>>>>>>>>>>>>                 Virtual memory (bytes) snapshot=9665208745984
>>>>>>>>>>>>>>                 Total committed heap usage
>>>>>>>>>>>>>> (bytes)=3636854259712
>>>>>>>>>>>>>>         Shuffle Errors
>>>>>>>>>>>>>>                 BAD_ID=0
>>>>>>>>>>>>>>                 CONNECTION=0
>>>>>>>>>>>>>>                 IO_ERROR=4
>>>>>>>>>>>>>>                 WRONG_LENGTH=0
>>>>>>>>>>>>>>                 WRONG_MAP=0
>>>>>>>>>>>>>>                 WRONG_REDUCE=0
>>>>>>>>>>>>>>         File Input Format Counters
>>>>>>>>>>>>>>                 Bytes Read=1000000000000
>>>>>>>>>>>>>>         File Output Format Counters
>>>>>>>>>>>>>>                 Bytes Written=1000000000000
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> John
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>>>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop
>>>>>>>>>>>>>>> 1.0.3 and it turned out that the terasort for MR2 is 2 times slower than
>>>>>>>>>>>>>>> that in MR1. I cannot really believe it.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop
>>>>>>>>>>>>>>> 2.2.0 cluster configurations are as follows.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>>>>>>>>>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>>>>>>>>>>>>         mapreduce.map.memory.mb = "768";
>>>>>>>>>>>>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>>>>>>>>>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>>>>>>>>>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>>>>>>>>>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>>>>>>>>>>>>         mapreduce.task.io.sort.factor = "48";
>>>>>>>>>>>>>>>         mapreduce.task.io.sort.mb = "200";
>>>>>>>>>>>>>>>         mapreduce.map.speculative = "true";
>>>>>>>>>>>>>>>         mapreduce.reduce.speculative = "true";
>>>>>>>>>>>>>>>         mapreduce.framework.name = "yarn";
>>>>>>>>>>>>>>>         yarn.app.mapreduce.am.job.task.listener.thread-count
>>>>>>>>>>>>>>> = "60";
>>>>>>>>>>>>>>>         mapreduce.map.cpu.vcores = "1";
>>>>>>>>>>>>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>>>>>>>>>>>>         mapreduce.map.output.compress = "true";
>>>>>>>>>>>>>>>         mapreduce.map.output.compress.codec =
>>>>>>>>>>>>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>>>>>>>>>>>>         yarn.resourcemanager.scheduler.client.thread-count =
>>>>>>>>>>>>>>> "64";
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> yarn.resourcemanager.resource-tracker.client.thread-count = "64";
>>>>>>>>>>>>>>>         yarn.resourcemanager.scheduler.class =
>>>>>>>>>>>>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>>>>>>>>>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>>>>>>>>>>>>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>>>>>>>>>>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>>>>>>>>>>>>         yarn.nodemanager.container-executor.class =
>>>>>>>>>>>>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>>>>>>>>>>>>         yarn.nodemanager.container-manager.thread-count =
>>>>>>>>>>>>>>> "64";
>>>>>>>>>>>>>>>         yarn.nodemanager.localizer.client.thread-count =
>>>>>>>>>>>>>>> "20";
>>>>>>>>>>>>>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost
>>>>>>>>>>>>>>> the same other configurations. In MR1, the 1TB terasort took about 45
>>>>>>>>>>>>>>> minutes, but it took around 90 minutes in MR2.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Does anyone know what is wrong here? Or do I need some
>>>>>>>>>>>>>>> special configurations for terasort to work better in MR2?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> John
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>>>>>>>>>>>>> michael_segel@hotmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Sam,
>>>>>>>>>>>>>>>> I think your cluster is too small for any meaningful
>>>>>>>>>>>>>>>> conclusions to be made.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Mike Segel
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <
>>>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Harsh,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks for your detailed response! Now, the efficiency of
>>>>>>>>>>>>>>>> my Yarn cluster improved a lot after increasing the reducer
>>>>>>>>>>>>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>>>>>>>>>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and
>>>>>>>>>>>>>>>> reduce task together, with a typical process(map > shuffle > reduce). In
>>>>>>>>>>>>>>>> Yarn, as I know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>>>>>>>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but,
>>>>>>>>>>>>>>>> MRv1 job has special execution process(map > shuffle > reduce) in Hadoop
>>>>>>>>>>>>>>>> 1.x, and how Yarn execute a MRv1 job? still include some special MR steps
>>>>>>>>>>>>>>>> in Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>>>>>>>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>>>>>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>>>>>>>>>>> - What's the general process for ApplicationMaster of Yarn
>>>>>>>>>>>>>>>> to execute a job?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by
>>>>>>>>>>>>>>>> setting 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>>>>>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>>>>>>>>>>> - For Yarn, above tow parameter do not work any more, as
>>>>>>>>>>>>>>>> yarn uses container instead, right?
>>>>>>>>>>>>>>>> - For Yarn, we can set the whole physical mem for a
>>>>>>>>>>>>>>>> NodeManager using 'yarn.nodemanager.resource.memory-mb'.
>>>>>>>>>>>>>>>> But how to set the default size of physical mem of a container?
>>>>>>>>>>>>>>>> - How to set the maximum size of physical mem of a
>>>>>>>>>>>>>>>> container? By the parameter of 'mapred.child.java.opts'?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks as always!
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Sam,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> > - How to know the container number? Why you say it will
>>>>>>>>>>>>>>>>> be 22 containers due to a 22 GB memory?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The MR2's default configuration requests 1 GB resource
>>>>>>>>>>>>>>>>> each for Map
>>>>>>>>>>>>>>>>> and Reduce containers. It requests 1.5 GB for the AM
>>>>>>>>>>>>>>>>> container that
>>>>>>>>>>>>>>>>> runs the job, additionally. This is tunable using the
>>>>>>>>>>>>>>>>> properties
>>>>>>>>>>>>>>>>> Sandy's mentioned in his post.
>>>>>>>>>>>>>>>>>
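
For reference, the defaults described above map onto these properties; a sketch (1024 MB and 1536 MB are the stock values mentioned in this message, and the class name is illustrative):

    import org.apache.hadoop.conf.Configuration;

    public class DefaultContainerRequests {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // 1 GB requested per map and per reduce container by default.
            conf.setInt("mapreduce.map.memory.mb", 1024);
            conf.setInt("mapreduce.reduce.memory.mb", 1024);
            // 1.5 GB requested for the MapReduce ApplicationMaster container.
            conf.setInt("yarn.app.mapreduce.am.resource.mb", 1536);
            // With a 22000 MB NodeManager, roughly 22000 / 1024 = 21 task
            // containers fit per node, plus the AM, which is where the
            // 22-container figure asked about above comes from.
            System.out.println(22000 / 1024);
        }
    }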
>>>>>>>>>>>>>>>>> > - My machine has 32 GB memory, how many memory is proper
>>>>>>>>>>>>>>>>> to be assigned to containers?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This is a general question. You may use the same process
>>>>>>>>>>>>>>>>> you took to
>>>>>>>>>>>>>>>>> decide optimal number of slots in MR1 to decide this here.
>>>>>>>>>>>>>>>>> Every
>>>>>>>>>>>>>>>>> container is a new JVM, and you're limited by the CPUs you
>>>>>>>>>>>>>>>>> have there
>>>>>>>>>>>>>>>>> (if not the memory). Either increase memory requests from
>>>>>>>>>>>>>>>>> jobs, to
>>>>>>>>>>>>>>>>> lower # of concurrent containers at a given time (runtime
>>>>>>>>>>>>>>>>> change), or
>>>>>>>>>>>>>>>>> lower NM's published memory resources to control the same
>>>>>>>>>>>>>>>>> (config
>>>>>>>>>>>>>>>>> change).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name'
>>>>>>>>>>>>>>>>> to be 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>>>>>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Yes, all of these properties will still work. Old
>>>>>>>>>>>>>>>>> properties specific
>>>>>>>>>>>>>>>>> to JobTracker or TaskTracker (usually found as a keyword
>>>>>>>>>>>>>>>>> in the config
>>>>>>>>>>>>>>>>> name) will not apply anymore.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <
>>>>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>>>>> > Hi Harsh,
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > According to above suggestions, I removed the
>>>>>>>>>>>>>>>>> duplication of setting, and
>>>>>>>>>>>>>>>>> > reduce the value of
>>>>>>>>>>>>>>>>> 'yarn.nodemanager.resource.cpu-cores',
>>>>>>>>>>>>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>>>>>>>>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and
>>>>>>>>>>>>>>>>> 12000. And then, the
>>>>>>>>>>>>>>>>> > efficiency improved about 18%.  I have questions:
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > - How to know the container number? Why you say it will
>>>>>>>>>>>>>>>>> be 22 containers due
>>>>>>>>>>>>>>>>> > to a 22 GB memory?
>>>>>>>>>>>>>>>>> > - My machine has 32 GB memory, how many memory is proper
>>>>>>>>>>>>>>>>> to be assigned to
>>>>>>>>>>>>>>>>> > containers?
>>>>>>>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name'
>>>>>>>>>>>>>>>>> to be 'yarn', will
>>>>>>>>>>>>>>>>> > other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>>>>>>>> framework? Like
>>>>>>>>>>>>>>>>> > 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > Thanks!
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >> Hey Sam,
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >> Did you get a chance to retry with Sandy's suggestions?
>>>>>>>>>>>>>>>>> The config
>>>>>>>>>>>>>>>>> >> appears to be asking NMs to use roughly 22 total
>>>>>>>>>>>>>>>>> containers (as
>>>>>>>>>>>>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB
>>>>>>>>>>>>>>>>> memory
>>>>>>>>>>>>>>>>> >> resource. This could impact much, given the CPU is
>>>>>>>>>>>>>>>>> still the same for
>>>>>>>>>>>>>>>>> >> both test runs.
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>>>>>>>>>>>>> sandy.ryza@cloudera.com>
>>>>>>>>>>>>>>>>> >> wrote:
>>>>>>>>>>>>>>>>> >> > Hey Sam,
>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>> >> > Thanks for sharing your results.  I'm definitely
>>>>>>>>>>>>>>>>> curious about what's
>>>>>>>>>>>>>>>>> >> > causing the difference.
>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>> >> > A couple observations:
>>>>>>>>>>>>>>>>> >> > It looks like you've got
>>>>>>>>>>>>>>>>> yarn.nodemanager.resource.memory-mb in there
>>>>>>>>>>>>>>>>> >> > twice
>>>>>>>>>>>>>>>>> >> > with two different values.
>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?)
>>>>>>>>>>>>>>>>> close to the default
>>>>>>>>>>>>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of
>>>>>>>>>>>>>>>>> your tasks getting
>>>>>>>>>>>>>>>>> >> > killed for running over resource limits?
>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>> >> > -Sandy
>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <
>>>>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >> The terasort execution log shows that reduce spent
>>>>>>>>>>>>>>>>> about 5.5 mins from
>>>>>>>>>>>>>>>>> >> >> 33%
>>>>>>>>>>>>>>>>> >> >> to 35% as below.
>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>> reduce 31%
>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>> reduce 32%
>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>> reduce 33%
>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>> reduce 35%
>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>> reduce 40%
>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>> reduce 43%
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >> Any way, below are my configurations for your
>>>>>>>>>>>>>>>>> reference. Thanks!
>>>>>>>>>>>>>>>>> >> >> (A) core-site.xml
>>>>>>>>>>>>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >> (B) hdfs-site.xml
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>dfs.replication</name>
>>>>>>>>>>>>>>>>> >> >>     <value>1</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>>>>>>>>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>>>>>>>>>>>>> >> >>     <value>10</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >> (C) mapred-site.xml
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>>>>>>>>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>>>>>>>>>>>>> >> >> </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>>>>>>>>>>>>> >> >>     <value>yarn</value>
>>>>>>>>>>>>>>>>> >> >>    </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>>>>>>>>>>>>> >> >>     <value>8</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>>>>>>>>>>>>> >> >>     <value>4</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>>>>>>>>>>>>> >> >>     <value>true</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >> (D) yarn-site.xml
>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>>>>>>>>>>>>> >> >>     <value>node1:18025</value>
>>>>>>>>>>>>>>>>> >> >>     <description>host is the hostname of the
>>>>>>>>>>>>>>>>> resource manager and
>>>>>>>>>>>>>>>>> >> >>     port is the port on which the NodeManagers
>>>>>>>>>>>>>>>>> contact the Resource
>>>>>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <description>The address of the RM web
>>>>>>>>>>>>>>>>> application.</description>
>>>>>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>>>>>>>>>>>>> >> >>     <value>node1:18088</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> <name>yarn.resourcemanager.scheduler.address</name>
>>>>>>>>>>>>>>>>> >> >>     <value>node1:18030</value>
>>>>>>>>>>>>>>>>> >> >>     <description>host is the hostname of the
>>>>>>>>>>>>>>>>> resourcemanager and port
>>>>>>>>>>>>>>>>> >> >> is
>>>>>>>>>>>>>>>>> >> >> the port
>>>>>>>>>>>>>>>>> >> >>     on which the Applications in the cluster talk to
>>>>>>>>>>>>>>>>> the Resource
>>>>>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>>>>>>>>>>>>> >> >>     <value>node1:18040</value>
>>>>>>>>>>>>>>>>> >> >>     <description>the host is the hostname of the
>>>>>>>>>>>>>>>>> ResourceManager and
>>>>>>>>>>>>>>>>> >> >> the
>>>>>>>>>>>>>>>>> >> >> port is the port on
>>>>>>>>>>>>>>>>> >> >>     which the clients can talk to the Resource
>>>>>>>>>>>>>>>>> Manager. </description>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>>>>>>>>>>>>> >> >>     <description>the local directories used by the
>>>>>>>>>>>>>>>>> >> >> nodemanager</description>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>>>>>>>>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>>>>>>>>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>>>>>>>>>>>>> port</description>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>>>>>> >> >>     <value>10240</value>
>>>>>>>>>>>>>>>>> >> >>     <description>the amount of memory on the
>>>>>>>>>>>>>>>>> NodeManager in
>>>>>>>>>>>>>>>>> >> >> GB</description>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>>>>>>>>>>>> >> >>     <description>directory on hdfs where the
>>>>>>>>>>>>>>>>> application logs are moved
>>>>>>>>>>>>>>>>> >> >> to
>>>>>>>>>>>>>>>>> >> >> </description>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>    <property>
>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>>>>>>>>>>>> >> >>     <description>the directories used by
>>>>>>>>>>>>>>>>> Nodemanagers as log
>>>>>>>>>>>>>>>>> >> >> directories</description>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>>>>>>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>>>>>>>>>>>> >> >>     <description>shuffle service that needs to be
>>>>>>>>>>>>>>>>> set for Map Reduce to
>>>>>>>>>>>>>>>>> >> >> run </description>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>>>>>>>>>>>> >> >>     <value>24</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>>>>>>>>>>>> >> >>     <value>3</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>>>>>> >> >>     <value>22000</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>>>>>>>>>>>> >> >>     <value>2.1</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>> >> >>> Not tuning configurations at all is wrong. YARN
>>>>>>>>>>>>>>>>> uses memory resource
>>>>>>>>>>>>>>>>> >> >>> based scheduling and hence MR2 would be requesting
>>>>>>>>>>>>>>>>> 1 GB minimum by
>>>>>>>>>>>>>>>>> >> >>> default, causing, on base configs, to max out at 8
>>>>>>>>>>>>>>>>> (due to 8 GB NM
>>>>>>>>>>>>>>>>> >> >>> memory resource config) total containers. Do share
>>>>>>>>>>>>>>>>> your configs as at
>>>>>>>>>>>>>>>>> >> >>> this point none of us can tell what it is.
>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for
>>>>>>>>>>>>>>>>> users and to not
>>>>>>>>>>>>>>>>> >> >>> care about such things :)
>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>>>>>>>>>>>> samliuhadoop@gmail.com>
>>>>>>>>>>>>>>>>> >> >>> wrote:
>>>>>>>>>>>>>>>>> >> >>> > At the begining, I just want to do a fast
>>>>>>>>>>>>>>>>> comparision of MRv1 and
>>>>>>>>>>>>>>>>> >> >>> > Yarn.
>>>>>>>>>>>>>>>>> >> >>> > But
>>>>>>>>>>>>>>>>> >> >>> > they have many differences, and to be fair for
>>>>>>>>>>>>>>>>> comparison I did not
>>>>>>>>>>>>>>>>> >> >>> > tune
>>>>>>>>>>>>>>>>> >> >>> > their configurations at all.  So I got above test
>>>>>>>>>>>>>>>>> results. After
>>>>>>>>>>>>>>>>> >> >>> > analyzing
>>>>>>>>>>>>>>>>> >> >>> > the test result, no doubt, I will configure them
>>>>>>>>>>>>>>>>> and do comparison
>>>>>>>>>>>>>>>>> >> >>> > again.
>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>> >> >>> > Do you have any idea on current test result? I
>>>>>>>>>>>>>>>>> think, to compare
>>>>>>>>>>>>>>>>> >> >>> > with
>>>>>>>>>>>>>>>>> >> >>> > MRv1,
>>>>>>>>>>>>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but
>>>>>>>>>>>>>>>>> worse on Reduce
>>>>>>>>>>>>>>>>> >> >>> > phase(terasort test).
>>>>>>>>>>>>>>>>> >> >>> > And any detailed suggestions/comments/materials
>>>>>>>>>>>>>>>>> on Yarn performance
>>>>>>>>>>>>>>>>> >> >>> > tuning?
>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>> >> >>> > Thanks!
>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>>>>>>>>>>>> marcosluis2186@gmail.com>
>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>> >> >>> >> Why not to tune the configurations?
>>>>>>>>>>>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>>>>>>>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size,
>>>>>>>>>>>>>>>>> etc
>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>> >> >>> >>> Hi Experts,
>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>> >> >>> >>> We are thinking about whether to use Yarn or
>>>>>>>>>>>>>>>>> not in the near
>>>>>>>>>>>>>>>>> >> >>> >>> future,
>>>>>>>>>>>>>>>>> >> >>> >>> and
>>>>>>>>>>>>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for
>>>>>>>>>>>>>>>>> comparison.
>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>> >> >>> >>> My env is three nodes cluster, and each node
>>>>>>>>>>>>>>>>> has similar hardware:
>>>>>>>>>>>>>>>>> >> >>> >>> 2
>>>>>>>>>>>>>>>>> >> >>> >>> cpu(4 core), 32 mem. Both Yarn and MRv1 cluster
>>>>>>>>>>>>>>>>> are set on the
>>>>>>>>>>>>>>>>> >> >>> >>> same
>>>>>>>>>>>>>>>>> >> >>> >>> env. To
>>>>>>>>>>>>>>>>> >> >>> >>> be fair, I did not make any performance tuning
>>>>>>>>>>>>>>>>> on their
>>>>>>>>>>>>>>>>> >> >>> >>> configurations, but
>>>>>>>>>>>>>>>>> >> >>> >>> use the default configuration values.
>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>> >> >>> >>> Before testing, I think Yarn will be much
>>>>>>>>>>>>>>>>> better than MRv1, if
>>>>>>>>>>>>>>>>> >> >>> >>> they
>>>>>>>>>>>>>>>>> >> >>> >>> all
>>>>>>>>>>>>>>>>> >> >>> >>> use default configuration, because Yarn is a
>>>>>>>>>>>>>>>>> better framework than
>>>>>>>>>>>>>>>>> >> >>> >>> MRv1.
>>>>>>>>>>>>>>>>> >> >>> >>> However, the test result shows some differences:
>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>>>>>>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>>>>>>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>>>>>>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>>>>>>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>>>>>>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>>>>>>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>>>>>>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>> >> >>> >>> After a fast analysis, I think the direct cause
>>>>>>>>>>>>>>>>> might be that Yarn
>>>>>>>>>>>>>>>>> >> >>> >>> is
>>>>>>>>>>>>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much
>>>>>>>>>>>>>>>>> worse on Reduce
>>>>>>>>>>>>>>>>> >> >>> >>> phase.
>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>>>>>>>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1
>>>>>>>>>>>>>>>>> for terasort?
>>>>>>>>>>>>>>>>> >> >>> >>> - What's the strategy for tuning Yarn
>>>>>>>>>>>>>>>>> performance? Are there any
>>>>>>>>>>>>>>>>> >> >>> >>> materials?
>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>> >> >>> >>> Thanks!
>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>> >> >>> >> --
>>>>>>>>>>>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>>>>>>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>>>>>>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>> >> >>> --
>>>>>>>>>>>>>>>>> >> >>> Harsh J
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >> --
>>>>>>>>>>>>>>>>> >> Harsh J
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> Harsh J
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Sandy Ryza <sa...@cloudera.com>.
I should have brought this up earlier - the Terasort benchmark requirements
changed recently to make data less compressible.  The MR2 version of
Terasort has this change, but the MR1 version does not.    So Snappy should
be working fine, but the data that MR2 is using is less compressible.
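
The compression ratios quoted below were worked out by hand from the job
counters; the same two counters can also be read programmatically once a job
has finished. A minimal sketch, assuming the standard
org.apache.hadoop.mapreduce Counters API and a hypothetical handle to the
completed Job:

    import org.apache.hadoop.mapreduce.Counters;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.TaskCounter;

    public class MapOutputRatio {
      // 'job' is assumed to be a handle to the already-completed terasort job.
      static double mapOutputCompressionRatio(Job job) throws Exception {
        Counters counters = job.getCounters();
        long outputBytes = counters.findCounter(TaskCounter.MAP_OUTPUT_BYTES).getValue();
        long materializedBytes = counters.findCounter(TaskCounter.MAP_OUTPUT_MATERIALIZED_BYTES).getValue();
        // e.g. 440519309909 / 1020000000000 ~= 0.43 for the MR2 run quoted below
        return (double) materializedBytes / (double) outputBytes;
      }
    }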


On Wed, Oct 23, 2013 at 4:48 PM, Jian Fang <ji...@gmail.com>wrote:

> Ok, I think I know where the problem is now. I used snappy to compress map
> output.
>
> Calculate the compression ratio = "Map output materialized bytes"/"Map
> output bytes"
>
> MR2
> 440519309909/1020000000000 = 0.431881676
>
> MR1
> 240948272514/1000000000000 = 0.240948273
>
>
> Seems something is wrong with the snappy in my Hadoop 2 and as a result,
> MR2 needs to fetch much more data during shuffle phase. Any way to check
> snappy in MR2?
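
One way to answer that question is to ask the native-code loader directly.
This is only a sketch, assuming the Hadoop 2.x NativeCodeLoader and
CompressionCodecFactory APIs; run it with the same classpath and
java.library.path that the tasks see:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.CompressionCodecFactory;
    import org.apache.hadoop.util.NativeCodeLoader;

    public class CheckSnappy {
      public static void main(String[] args) {
        // True only if libhadoop was found on java.library.path.
        boolean nativeLoaded = NativeCodeLoader.isNativeCodeLoaded();
        System.out.println("native hadoop loaded: " + nativeLoaded);
        // buildSupportsSnappy() is itself a native call, so only try it when libhadoop is present.
        if (nativeLoaded) {
          System.out.println("libhadoop built with snappy: " + NativeCodeLoader.buildSupportsSnappy());
        }
        // Resolving the codec class fails fast if SnappyCodec is not on the classpath.
        CompressionCodec codec = new CompressionCodecFactory(new Configuration())
            .getCodecByClassName("org.apache.hadoop.io.compress.SnappyCodec");
        System.out.println("SnappyCodec resolved: " + (codec != null));
      }
    }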
>
>
>
>
> On Wed, Oct 23, 2013 at 4:26 PM, Jian Fang <ji...@gmail.com>wrote:
>
>> Looked at the results for MR1 and MR2. Reduce input groups for MR1 is
>> 4,294,967,296, but 10,000,000,000 for MR2.
>> That is to say the number of unique keys fed into the reducers is
>> 10000000000 in MR2, which is really weird. Is there any problem in the terasort code?
>>
>>
>> On Wed, Oct 23, 2013 at 2:16 PM, Jian Fang <jian.fang.subscribe@gmail.com
>> > wrote:
>>
>>> Reducing the number of map containers to 8 slightly improved the total
>>> time to 84 minutes. Here is the output. Also, from the log, there is no clear
>>> reason why the containers were killed other than messages such as
>>> "Container killed by the
>>> ApplicationMaster../container_1382237301855_0001_01_000001/syslog:Container
>>> killed on request. Exit code is 143"
>>>
>>>
>>> 2013-10-23 21:09:34,325 INFO org.apache.hadoop.mapreduce.Job (main):
>>> Counters: 46
>>>         File System Counters
>>>                 FILE: Number of bytes read=455484809066
>>>                 FILE: Number of bytes written=896642344343
>>>
>>>                 FILE: Number of read operations=0
>>>                 FILE: Number of large read operations=0
>>>                 FILE: Number of write operations=0
>>>                 HDFS: Number of bytes read=1000000841624
>>>
>>>                 HDFS: Number of bytes written=1000000000000
>>>                 HDFS: Number of read operations=25531
>>>
>>>                 HDFS: Number of large read operations=0
>>>                 HDFS: Number of write operations=150
>>>
>>>         Job Counters
>>>                 Killed map tasks=1
>>>                 Killed reduce tasks=11
>>>                 Launched map tasks=7449
>>>                 Launched reduce tasks=86
>>>                 Data-local map tasks=7434
>>>                 Rack-local map tasks=15
>>>                 Total time spent by all maps in occupied slots
>>> (ms)=1030941232
>>>                 Total time spent by all reduces in occupied slots
>>> (ms)=1574732272
>>>
>>>         Map-Reduce Framework
>>>                 Map input records=10000000000
>>>                 Map output records=10000000000
>>>                 Map output bytes=1020000000000
>>>                 Map output materialized bytes=440519309909
>>>                 Input split bytes=841624
>>>
>>>                 Combine input records=0
>>>                 Combine output records=0
>>>                 Reduce input groups=10000000000
>>>                 Reduce shuffle bytes=440519309909
>>>
>>>                 Reduce input records=10000000000
>>>                 Reduce output records=10000000000
>>>                 Spilled Records=20000000000
>>>                 Shuffled Maps =558600
>>>                 Failed Shuffles=0
>>>                 Merged Map outputs=558600
>>>                 GC time elapsed (ms)=2855534
>>>                 CPU time spent (ms)=170381420
>>>                 Physical memory (bytes) snapshot=3704938954752
>>>                 Virtual memory (bytes) snapshot=11455208308736
>>>                 Total committed heap usage (bytes)=4523248582656
>>>         Shuffle Errors
>>>                 BAD_ID=0
>>>                 CONNECTION=0
>>>                 IO_ERROR=0
>>>
>>>                 WRONG_LENGTH=0
>>>                 WRONG_MAP=0
>>>                 WRONG_REDUCE=0
>>>         File Input Format Counters
>>>                 Bytes Read=1000000000000
>>>         File Output Format Counters
>>>                 Bytes Written=1000000000000
>>>
>>>
>>>
>>> On Wed, Oct 23, 2013 at 1:46 PM, Jian Fang <
>>> jian.fang.subscribe@gmail.com> wrote:
>>>
>>>> Already started the tests with only 8 containers for map in MR2, still
>>>> running.
>>>>
>>>> But shouldn't YARN make better use of the memory? If I map YARN
>>>> containers in MR2 to exactly 8 map and 3 reduce slots as in MR1, what is the
>>>> real advantage of YARN then? I remember one goal of YARN was to solve the
>>>> issue that map slots cannot be used for reduce and reduce slots cannot be
>>>> used for map. Shouldn't YARN be smart enough to handle the concurrent tasks?
>>>>
>>>>
>>>>
>>>> On Wed, Oct 23, 2013 at 1:17 PM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>>
>>>>> Increasing the slowstart is not meant to increase performance, but
>>>>> should make for a fairer comparison.  Have you tried making sure that in
>>>>> MR2 only 8 map tasks are running concurrently, or boosting MR1 up to 16?
>>>>>
>>>>> -Sandy
>>>>>
>>>>>
>>>>> On Wed, Oct 23, 2013 at 12:55 PM, Jian Fang <
>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>
>>>>>> Changing mapreduce.job.reduce.slowstart.completedmaps to 0.99 does not
>>>>>> look good. The map phase alone took 48 minutes and the total time seems to
>>>>>> be even longer. Any way to let the map phase run faster?
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>>
>>>>>> On Wed, Oct 23, 2013 at 10:05 AM, Jian Fang <
>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks Sandy.
>>>>>>>
>>>>>>> io.sort.record.percent is the default value 0.05 for both MR1 and
>>>>>>> MR2.  mapreduce.job.reduce.slowstart.completedmaps in MR2 and mapred.reduce.slowstart.completed.maps
>>>>>>> in MR1 both use the default value 0.05.
>>>>>>>
>>>>>>> I tried to allocate 1536MB and 1024MB to map container some time
>>>>>>> ago, but the changes did not give me a better result, thus, I changed it
>>>>>>> back to 768MB.
>>>>>>>
>>>>>>> Will try mapred.reduce.slowstart.completed.maps=.99 to see what
>>>>>>> happens. BTW, I should use
>>>>>>> mapreduce.job.reduce.slowstart.completedmaps in MR2, right?
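
For reference, mapreduce.job.reduce.slowstart.completedmaps is the MR2 name of
the property; the mapred.* spelling is the older MR1 alias, which Hadoop 2
still accepts through its deprecation mapping. A per-job override, sketched
under the assumption that the job is driven from Java rather than the command
line:

    import org.apache.hadoop.conf.Configuration;

    public class SlowstartOverride {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Delay reducer launch until 99% of the maps have finished, as discussed above.
        conf.setFloat("mapreduce.job.reduce.slowstart.completedmaps", 0.99f);
        System.out.println(conf.getFloat("mapreduce.job.reduce.slowstart.completedmaps", 0.05f));
      }
    }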
>>>>>>>
>>>>>>> Also, in MR1 I can specify tasktracker.http.threads, but I could not
>>>>>>> find the counterpart for MR2. Which one should I tune for the HTTP threads?
>>>>>>>
>>>>>>> Thanks again.
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Oct 23, 2013 at 9:40 AM, Sandy Ryza <sandy.ryza@cloudera.com
>>>>>>> > wrote:
>>>>>>>
>>>>>>>> Based on SLOTS_MILLIS_MAPS, it looks like your map tasks are taking
>>>>>>>> about three times as long in MR2 as they are in MR1.  This is probably
>>>>>>>> because you allow twice as many map tasks to run at a time in MR2 (12288/768
>>>>>>>> = 16).  Being able to use all the containers isn't necessarily a good thing
>>>>>>>> if you are oversubscribing your node's resources.  Because of the different
>>>>>>>> way that MR1 and MR2 view resources, I think it's better to test with
>>>>>>>> mapred.reduce.slowstart.completed.maps=.99 so that the map and reduce
>>>>>>>> phases will run separately.
>>>>>>>>
>>>>>>>> On the other side, it looks like your MR1 has more spilled records
>>>>>>>> than MR2.  For a fairer comparison, you should set io.sort.record.percent
>>>>>>>> in MR1 to .13, which should improve MR1 performance, but will provide a
>>>>>>>> fairer comparison (MR2 automatically does this tuning for you).
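
A sketch of that MR1-side change (io.sort.record.percent is the old MR1
property name, and 0.13 is the value suggested above; shown here on a JobConf
as an assumption about how the MR1 terasort job is launched):

    import org.apache.hadoop.mapred.JobConf;

    public class Mr1SortRecordPercent {
      public static void main(String[] args) {
        JobConf conf = new JobConf();
        // Reserve a larger share of io.sort.mb for record metadata, as suggested above.
        conf.setFloat("io.sort.record.percent", 0.13f);
        System.out.println(conf.getFloat("io.sort.record.percent", 0.05f));
      }
    }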
>>>>>>>>
>>>>>>>> -Sandy
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Oct 23, 2013 at 9:22 AM, Jian Fang <
>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> The number of map slots and reduce slots on each data node for MR1
>>>>>>>>> are 8 and 3, respectively. Since MR2 could use all containers for either
>>>>>>>>> map or reduce, I would expect that MR2 is faster.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Oct 23, 2013 at 8:17 AM, Sandy Ryza <
>>>>>>>>> sandy.ryza@cloudera.com> wrote:
>>>>>>>>>
>>>>>>>>>> How many map and reduce slots are you using per tasktracker in
>>>>>>>>>> MR1?  How do the average map times compare? (MR2 reports this directly on
>>>>>>>>>> the web UI, but you can also get a sense in MR1 by scrolling through the
>>>>>>>>>> map tasks page).  Can you share the counters for MR1?
>>>>>>>>>>
>>>>>>>>>> -Sandy
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Oct 23, 2013 at 12:23 AM, Jian Fang <
>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Unfortunately, turning off JVM reuse still got the same result,
>>>>>>>>>>> i.e., about 90 minutes for MR2. I don't think the killed reduces could
>>>>>>>>>>> account for a 2x slowdown. There should be something very wrong either
>>>>>>>>>>> in the configuration or the code. Any hints?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <
>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks Sandy. I will try to turn JVM resue off and see what
>>>>>>>>>>>> happens.
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, I saw quite some exceptions in the task attempts. For
>>>>>>>>>>>> instance.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> 2013-10-20 03:13:58,751 ERROR [main]
>>>>>>>>>>>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>>>>>>>>>>>> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
>>>>>>>>>>>> 2013-10-20 03:13:58,752 ERROR [Thread-6]
>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient: Failed to close file
>>>>>>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
>>>>>>>>>>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>>>>>>>>>>>> No lease on
>>>>>>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
>>>>>>>>>>>> File does not exist. Holder
>>>>>>>>>>>> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
>>>>>>>>>>>> any open files.
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>>>>>>>>>>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>>>>>>>>>> --
>>>>>>>>>>>>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>>>>>>>>>>>> 2013-10-20 03:13:58,753 WARN [main]
>>>>>>>>>>>> org.apache.hadoop.mapred.YarnChild: Exception running child :
>>>>>>>>>>>> java.nio.channels.ClosedChannelException
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>>>>>>>>>>>>         at
>>>>>>>>>>>> java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>>>>>>>>>>>>
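
The _1 suffix on the attempt id and the speculative settings earlier in this
thread make it plausible, though not confirmed here, that these are
speculative duplicates being killed after the other attempt commits. If so,
turning speculation off removes that noise from the comparison; a sketch:

    import org.apache.hadoop.conf.Configuration;

    public class NoSpeculation {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Both properties are set to true in the configuration listed earlier in this thread.
        conf.setBoolean("mapreduce.map.speculative", false);
        conf.setBoolean("mapreduce.reduce.speculative", false);
      }
    }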
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <
>>>>>>>>>>>> sandy.ryza@cloudera.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> It looks like many of your reduce tasks were killed.  Do you
>>>>>>>>>>>>> know why?  Also, MR2 doesn't have JVM reuse, so it might make sense to
>>>>>>>>>>>>> compare it to MR1 with JVM reuse turned off.
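
A sketch of the MR1 side of that comparison; setNumTasksToExecutePerJvm(1) is
the programmatic form of mapred.job.reuse.jvm.num.tasks=1, stated here as an
assumption about the MR1 JobConf API:

    import org.apache.hadoop.mapred.JobConf;

    public class Mr1NoJvmReuse {
      public static void main(String[] args) {
        JobConf conf = new JobConf();
        // One task per JVM means JVM reuse is disabled, which matches MR2's behaviour.
        conf.setNumTasksToExecutePerJvm(1);
        System.out.println(conf.getNumTasksToExecutePerJvm());
      }
    }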
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Sandy
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <
>>>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> The Terasort output for MR2 is as follows.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job
>>>>>>>>>>>>>> (main): Counters: 46
>>>>>>>>>>>>>>         File System Counters
>>>>>>>>>>>>>>                 FILE: Number of bytes read=456102049355
>>>>>>>>>>>>>>                 FILE: Number of bytes written=897246250517
>>>>>>>>>>>>>>                 FILE: Number of read operations=0
>>>>>>>>>>>>>>                 FILE: Number of large read operations=0
>>>>>>>>>>>>>>                 FILE: Number of write operations=0
>>>>>>>>>>>>>>                 HDFS: Number of bytes read=1000000851200
>>>>>>>>>>>>>>                 HDFS: Number of bytes written=1000000000000
>>>>>>>>>>>>>>                 HDFS: Number of read operations=32131
>>>>>>>>>>>>>>                 HDFS: Number of large read operations=0
>>>>>>>>>>>>>>                 HDFS: Number of write operations=224
>>>>>>>>>>>>>>         Job Counters
>>>>>>>>>>>>>>                 Killed map tasks=1
>>>>>>>>>>>>>>                 Killed reduce tasks=20
>>>>>>>>>>>>>>                 Launched map tasks=7601
>>>>>>>>>>>>>>                 Launched reduce tasks=132
>>>>>>>>>>>>>>                 Data-local map tasks=7591
>>>>>>>>>>>>>>                 Rack-local map tasks=10
>>>>>>>>>>>>>>                 Total time spent by all maps in occupied
>>>>>>>>>>>>>> slots (ms)=1696141311
>>>>>>>>>>>>>>                 Total time spent by all reduces in occupied
>>>>>>>>>>>>>> slots (ms)=2664045096
>>>>>>>>>>>>>>         Map-Reduce Framework
>>>>>>>>>>>>>>                 Map input records=10000000000
>>>>>>>>>>>>>>                 Map output records=10000000000
>>>>>>>>>>>>>>                 Map output bytes=1020000000000
>>>>>>>>>>>>>>                 Map output materialized bytes=440486356802
>>>>>>>>>>>>>>                 Input split bytes=851200
>>>>>>>>>>>>>>                 Combine input records=0
>>>>>>>>>>>>>>                 Combine output records=0
>>>>>>>>>>>>>>                 Reduce input groups=10000000000
>>>>>>>>>>>>>>                 Reduce shuffle bytes=440486356802
>>>>>>>>>>>>>>                 Reduce input records=10000000000
>>>>>>>>>>>>>>                 Reduce output records=10000000000
>>>>>>>>>>>>>>                 Spilled Records=20000000000
>>>>>>>>>>>>>>                 Shuffled Maps =851200
>>>>>>>>>>>>>>                 Failed Shuffles=61
>>>>>>>>>>>>>>                 Merged Map outputs=851200
>>>>>>>>>>>>>>                 GC time elapsed (ms)=4215666
>>>>>>>>>>>>>>                 CPU time spent (ms)=192433000
>>>>>>>>>>>>>>                 Physical memory (bytes) snapshot=3349356380160
>>>>>>>>>>>>>>                 Virtual memory (bytes) snapshot=9665208745984
>>>>>>>>>>>>>>                 Total committed heap usage
>>>>>>>>>>>>>> (bytes)=3636854259712
>>>>>>>>>>>>>>         Shuffle Errors
>>>>>>>>>>>>>>                 BAD_ID=0
>>>>>>>>>>>>>>                 CONNECTION=0
>>>>>>>>>>>>>>                 IO_ERROR=4
>>>>>>>>>>>>>>                 WRONG_LENGTH=0
>>>>>>>>>>>>>>                 WRONG_MAP=0
>>>>>>>>>>>>>>                 WRONG_REDUCE=0
>>>>>>>>>>>>>>         File Input Format Counters
>>>>>>>>>>>>>>                 Bytes Read=1000000000000
>>>>>>>>>>>>>>         File Output Format Counters
>>>>>>>>>>>>>>                 Bytes Written=1000000000000
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> John
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>>>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop
>>>>>>>>>>>>>>> 1.0.3 and it turned out that the terasort for MR2 is 2 times slower than
>>>>>>>>>>>>>>> that in MR1. I cannot really believe it.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop
>>>>>>>>>>>>>>> 2.2.0 cluster configurations are as follows.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>>>>>>>>>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>>>>>>>>>>>>         mapreduce.map.memory.mb = "768";
>>>>>>>>>>>>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>>>>>>>>>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>>>>>>>>>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>>>>>>>>>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>>>>>>>>>>>>         mapreduce.task.io.sort.factor = "48";
>>>>>>>>>>>>>>>         mapreduce.task.io.sort.mb = "200";
>>>>>>>>>>>>>>>         mapreduce.map.speculative = "true";
>>>>>>>>>>>>>>>         mapreduce.reduce.speculative = "true";
>>>>>>>>>>>>>>>         mapreduce.framework.name = "yarn";
>>>>>>>>>>>>>>>         yarn.app.mapreduce.am.job.task.listener.thread-count
>>>>>>>>>>>>>>> = "60";
>>>>>>>>>>>>>>>         mapreduce.map.cpu.vcores = "1";
>>>>>>>>>>>>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>>>>>>>>>>>>         mapreduce.map.output.compress = "true";
>>>>>>>>>>>>>>>         mapreduce.map.output.compress.codec =
>>>>>>>>>>>>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>>>>>>>>>>>>         yarn.resourcemanager.scheduler.client.thread-count =
>>>>>>>>>>>>>>> "64";
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> yarn.resourcemanager.resource-tracker.client.thread-count = "64";
>>>>>>>>>>>>>>>         yarn.resourcemanager.scheduler.class =
>>>>>>>>>>>>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>>>>>>>>>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>>>>>>>>>>>>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>>>>>>>>>>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>>>>>>>>>>>>         yarn.nodemanager.container-executor.class =
>>>>>>>>>>>>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>>>>>>>>>>>>         yarn.nodemanager.container-manager.thread-count =
>>>>>>>>>>>>>>> "64";
>>>>>>>>>>>>>>>         yarn.nodemanager.localizer.client.thread-count =
>>>>>>>>>>>>>>> "20";
>>>>>>>>>>>>>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost
>>>>>>>>>>>>>>> the same other configurations. In MR1, the 1TB terasort took about 45
>>>>>>>>>>>>>>> minutes, but it took around 90 minutes in MR2.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Does anyone know what is wrong here? Or do I need some
>>>>>>>>>>>>>>> special configurations for terasort to work better in MR2?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> John
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>>>>>>>>>>>>> michael_segel@hotmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Sam,
>>>>>>>>>>>>>>>> I think your cluster is too small for any meaningful
>>>>>>>>>>>>>>>> conclusions to be made.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Mike Segel
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <
>>>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Harsh,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks for your detailed response! Now, the efficiency of
>>>>>>>>>>>>>>>> my Yarn cluster improved a lot after increasing the reducer
>>>>>>>>>>>>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>>>>>>>>>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and
>>>>>>>>>>>>>>>> reduce task together, with a typical process(map > shuffle > reduce). In
>>>>>>>>>>>>>>>> Yarn, as I know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>>>>>>>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but,
>>>>>>>>>>>>>>>> MRv1 job has special execution process(map > shuffle > reduce) in Hadoop
>>>>>>>>>>>>>>>> 1.x, and how Yarn execute a MRv1 job? still include some special MR steps
>>>>>>>>>>>>>>>> in Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>>>>>>>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>>>>>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>>>>>>>>>>> - What's the general process for ApplicationMaster of Yarn
>>>>>>>>>>>>>>>> to execute a job?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by
>>>>>>>>>>>>>>>> setting 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>>>>>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>>>>>>>>>>> - For Yarn, the above two parameters do not work any more, as
>>>>>>>>>>>>>>>> Yarn uses containers instead, right?
>>>>>>>>>>>>>>>> - For Yarn, we can set the whole physical mem for a
>>>>>>>>>>>>>>>> NodeManager using 'yarn.nodemanager.resource.memory-mb'.
>>>>>>>>>>>>>>>> But how to set the default size of physical mem of a container?
>>>>>>>>>>>>>>>> - How to set the maximum size of physical mem of a
>>>>>>>>>>>>>>>> container? By the parameter of 'mapred.child.java.opts'?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks as always!
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Sam,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> > - How to know the container number? Why you say it will
>>>>>>>>>>>>>>>>> be 22 containers due to a 22 GB memory?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The MR2's default configuration requests 1 GB resource
>>>>>>>>>>>>>>>>> each for Map
>>>>>>>>>>>>>>>>> and Reduce containers. It requests 1.5 GB for the AM
>>>>>>>>>>>>>>>>> container that
>>>>>>>>>>>>>>>>> runs the job, additionally. This is tunable using the
>>>>>>>>>>>>>>>>> properties
>>>>>>>>>>>>>>>>> Sandy's mentioned in his post.
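
For reference, the defaults being described correspond to these properties;
the values are stated as an assumption about the stock Hadoop 2.x
*-default.xml files, since they can vary by release:

    import org.apache.hadoop.conf.Configuration;

    public class DefaultContainerSizes {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Per-container requests; each can be overridden per job.
        System.out.println(conf.getInt("mapreduce.map.memory.mb", 1024));           // map container
        System.out.println(conf.getInt("mapreduce.reduce.memory.mb", 1024));        // reduce container
        System.out.println(conf.getInt("yarn.app.mapreduce.am.resource.mb", 1536)); // MR ApplicationMaster
      }
    }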
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> > - My machine has 32 GB memory, how many memory is proper
>>>>>>>>>>>>>>>>> to be assigned to containers?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This is a general question. You may use the same process
>>>>>>>>>>>>>>>>> you took to
>>>>>>>>>>>>>>>>> decide optimal number of slots in MR1 to decide this here.
>>>>>>>>>>>>>>>>> Every
>>>>>>>>>>>>>>>>> container is a new JVM, and you're limited by the CPUs you
>>>>>>>>>>>>>>>>> have there
>>>>>>>>>>>>>>>>> (if not the memory). Either increase memory requests from
>>>>>>>>>>>>>>>>> jobs, to
>>>>>>>>>>>>>>>>> lower # of concurrent containers at a given time (runtime
>>>>>>>>>>>>>>>>> change), or
>>>>>>>>>>>>>>>>> lower NM's published memory resources to control the same
>>>>>>>>>>>>>>>>> (config
>>>>>>>>>>>>>>>>> change).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name'
>>>>>>>>>>>>>>>>> to be 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>>>>>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Yes, all of these properties will still work. Old
>>>>>>>>>>>>>>>>> properties specific
>>>>>>>>>>>>>>>>> to JobTracker or TaskTracker (usually found as a keyword
>>>>>>>>>>>>>>>>> in the config
>>>>>>>>>>>>>>>>> name) will not apply anymore.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <
>>>>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>>>>> > Hi Harsh,
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > According to above suggestions, I removed the
>>>>>>>>>>>>>>>>> duplication of setting, and
>>>>>>>>>>>>>>>>> > reduce the value of
>>>>>>>>>>>>>>>>> 'yarn.nodemanager.resource.cpu-cores',
>>>>>>>>>>>>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>>>>>>>>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and
>>>>>>>>>>>>>>>>> 12000. And then, the
>>>>>>>>>>>>>>>>> > efficiency improved about 18%.  I have questions:
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > - How to know the container number? Why you say it will
>>>>>>>>>>>>>>>>> be 22 containers due
>>>>>>>>>>>>>>>>> > to a 22 GB memory?
>>>>>>>>>>>>>>>>> > - My machine has 32 GB memory, how many memory is proper
>>>>>>>>>>>>>>>>> to be assigned to
>>>>>>>>>>>>>>>>> > containers?
>>>>>>>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name'
>>>>>>>>>>>>>>>>> to be 'yarn', will
>>>>>>>>>>>>>>>>> > other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>>>>>>>> framework? Like
>>>>>>>>>>>>>>>>> > 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > Thanks!
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >> Hey Sam,
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >> Did you get a chance to retry with Sandy's suggestions?
>>>>>>>>>>>>>>>>> The config
>>>>>>>>>>>>>>>>> >> appears to be asking NMs to use roughly 22 total
>>>>>>>>>>>>>>>>> containers (as
>>>>>>>>>>>>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB
>>>>>>>>>>>>>>>>> memory
>>>>>>>>>>>>>>>>> >> resource. This could impact much, given the CPU is
>>>>>>>>>>>>>>>>> still the same for
>>>>>>>>>>>>>>>>> >> both test runs.
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>>>>>>>>>>>>> sandy.ryza@cloudera.com>
>>>>>>>>>>>>>>>>> >> wrote:
>>>>>>>>>>>>>>>>> >> > Hey Sam,
>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>> >> > Thanks for sharing your results.  I'm definitely
>>>>>>>>>>>>>>>>> curious about what's
>>>>>>>>>>>>>>>>> >> > causing the difference.
>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>> >> > A couple observations:
>>>>>>>>>>>>>>>>> >> > It looks like you've got
>>>>>>>>>>>>>>>>> yarn.nodemanager.resource.memory-mb in there
>>>>>>>>>>>>>>>>> >> > twice
>>>>>>>>>>>>>>>>> >> > with two different values.
>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?)
>>>>>>>>>>>>>>>>> close to the default
>>>>>>>>>>>>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of
>>>>>>>>>>>>>>>>> your tasks getting
>>>>>>>>>>>>>>>>> >> > killed for running over resource limits?
>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>> >> > -Sandy
>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <
>>>>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >> The terasort execution log shows that reduce spent
>>>>>>>>>>>>>>>>> about 5.5 mins from
>>>>>>>>>>>>>>>>> >> >> 33%
>>>>>>>>>>>>>>>>> >> >> to 35% as below.
>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>> reduce 31%
>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>> reduce 32%
>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>> reduce 33%
>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>> reduce 35%
>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>> reduce 40%
>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>> reduce 43%
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >> Any way, below are my configurations for your
>>>>>>>>>>>>>>>>> reference. Thanks!
>>>>>>>>>>>>>>>>> >> >> (A) core-site.xml
>>>>>>>>>>>>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >> (B) hdfs-site.xml
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>dfs.replication</name>
>>>>>>>>>>>>>>>>> >> >>     <value>1</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>>>>>>>>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>>>>>>>>>>>>> >> >>     <value>10</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >> (C) mapred-site.xml
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>>>>>>>>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>>>>>>>>>>>>> >> >> </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>>>>>>>>>>>>> >> >>     <value>yarn</value>
>>>>>>>>>>>>>>>>> >> >>    </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>>>>>>>>>>>>> >> >>     <value>8</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>>>>>>>>>>>>> >> >>     <value>4</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>>>>>>>>>>>>> >> >>     <value>true</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >> (D) yarn-site.xml
>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>>>>>>>>>>>>> >> >>     <value>node1:18025</value>
>>>>>>>>>>>>>>>>> >> >>     <description>host is the hostname of the
>>>>>>>>>>>>>>>>> resource manager and
>>>>>>>>>>>>>>>>> >> >>     port is the port on which the NodeManagers
>>>>>>>>>>>>>>>>> contact the Resource
>>>>>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <description>The address of the RM web
>>>>>>>>>>>>>>>>> application.</description>
>>>>>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>>>>>>>>>>>>> >> >>     <value>node1:18088</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> <name>yarn.resourcemanager.scheduler.address</name>
>>>>>>>>>>>>>>>>> >> >>     <value>node1:18030</value>
>>>>>>>>>>>>>>>>> >> >>     <description>host is the hostname of the
>>>>>>>>>>>>>>>>> resourcemanager and port
>>>>>>>>>>>>>>>>> >> >> is
>>>>>>>>>>>>>>>>> >> >> the port
>>>>>>>>>>>>>>>>> >> >>     on which the Applications in the cluster talk to
>>>>>>>>>>>>>>>>> the Resource
>>>>>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>>>>>>>>>>>>> >> >>     <value>node1:18040</value>
>>>>>>>>>>>>>>>>> >> >>     <description>the host is the hostname of the
>>>>>>>>>>>>>>>>> ResourceManager and
>>>>>>>>>>>>>>>>> >> >> the
>>>>>>>>>>>>>>>>> >> >> port is the port on
>>>>>>>>>>>>>>>>> >> >>     which the clients can talk to the Resource
>>>>>>>>>>>>>>>>> Manager. </description>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>>>>>>>>>>>>> >> >>     <description>the local directories used by the
>>>>>>>>>>>>>>>>> >> >> nodemanager</description>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>>>>>>>>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>>>>>>>>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>>>>>>>>>>>>> port</description>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>>>>>> >> >>     <value>10240</value>
>>>>>>>>>>>>>>>>> >> >>     <description>the amount of memory on the
>>>>>>>>>>>>>>>>> NodeManager in
>>>>>>>>>>>>>>>>> >> >> GB</description>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>>>>>>>>>>>> >> >>     <description>directory on hdfs where the
>>>>>>>>>>>>>>>>> application logs are moved
>>>>>>>>>>>>>>>>> >> >> to
>>>>>>>>>>>>>>>>> >> >> </description>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>    <property>
>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>>>>>>>>>>>> >> >>     <description>the directories used by
>>>>>>>>>>>>>>>>> Nodemanagers as log
>>>>>>>>>>>>>>>>> >> >> directories</description>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>>>>>>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>>>>>>>>>>>> >> >>     <description>shuffle service that needs to be
>>>>>>>>>>>>>>>>> set for Map Reduce to
>>>>>>>>>>>>>>>>> >> >> run </description>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>>>>>>>>>>>> >> >>     <value>24</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>>>>>>>>>>>> >> >>     <value>3</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>>>>>> >> >>     <value>22000</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>>>>>>>>>>>> >> >>     <value>2.1</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>> >> >>> Not tuning configurations at all is wrong. YARN
>>>>>>>>>>>>>>>>> uses memory resource
>>>>>>>>>>>>>>>>> >> >>> based scheduling and hence MR2 would be requesting
>>>>>>>>>>>>>>>>> 1 GB minimum by
>>>>>>>>>>>>>>>>> >> >>> default, causing, on base configs, to max out at 8
>>>>>>>>>>>>>>>>> (due to 8 GB NM
>>>>>>>>>>>>>>>>> >> >>> memory resource config) total containers. Do share
>>>>>>>>>>>>>>>>> your configs as at
>>>>>>>>>>>>>>>>> >> >>> this point none of us can tell what it is.
>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for
>>>>>>>>>>>>>>>>> users and to not
>>>>>>>>>>>>>>>>> >> >>> care about such things :)
>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>>>>>>>>>>>> samliuhadoop@gmail.com>
>>>>>>>>>>>>>>>>> >> >>> wrote:
>>>>>>>>>>>>>>>>> >> >>> > At the begining, I just want to do a fast
>>>>>>>>>>>>>>>>> comparision of MRv1 and
>>>>>>>>>>>>>>>>> >> >>> > Yarn.
>>>>>>>>>>>>>>>>> >> >>> > But
>>>>>>>>>>>>>>>>> >> >>> > they have many differences, and to be fair for
>>>>>>>>>>>>>>>>> comparison I did not
>>>>>>>>>>>>>>>>> >> >>> > tune
>>>>>>>>>>>>>>>>> >> >>> > their configurations at all.  So I got above test
>>>>>>>>>>>>>>>>> results. After
>>>>>>>>>>>>>>>>> >> >>> > analyzing
>>>>>>>>>>>>>>>>> >> >>> > the test result, no doubt, I will configure them
>>>>>>>>>>>>>>>>> and do comparison
>>>>>>>>>>>>>>>>> >> >>> > again.
>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>> >> >>> > Do you have any idea on current test result? I
>>>>>>>>>>>>>>>>> think, to compare
>>>>>>>>>>>>>>>>> >> >>> > with
>>>>>>>>>>>>>>>>> >> >>> > MRv1,
>>>>>>>>>>>>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but
>>>>>>>>>>>>>>>>> worse on Reduce
>>>>>>>>>>>>>>>>> >> >>> > phase(terasort test).
>>>>>>>>>>>>>>>>> >> >>> > And any detailed suggestions/comments/materials
>>>>>>>>>>>>>>>>> on Yarn performance
>>>>>>>>>>>>>>>>> >> >>> > tuning?
>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>> >> >>> > Thanks!
>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>>>>>>>>>>>> marcosluis2186@gmail.com>
>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>> >> >>> >> Why not to tune the configurations?
>>>>>>>>>>>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>>>>>>>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size,
>>>>>>>>>>>>>>>>> etc
>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>> >> >>> >>> Hi Experts,
>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>> >> >>> >>> We are thinking about whether to use Yarn or
>>>>>>>>>>>>>>>>> not in the near
>>>>>>>>>>>>>>>>> >> >>> >>> future,
>>>>>>>>>>>>>>>>> >> >>> >>> and
>>>>>>>>>>>>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for
>>>>>>>>>>>>>>>>> comparison.
>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>> >> >>> >>> My env is three nodes cluster, and each node
>>>>>>>>>>>>>>>>> has similar hardware:
>>>>>>>>>>>>>>>>> >> >>> >>> 2
>>>>>>>>>>>>>>>>> >> >>> >>> cpu(4 core), 32 mem. Both Yarn and MRv1 cluster
>>>>>>>>>>>>>>>>> are set on the
>>>>>>>>>>>>>>>>> >> >>> >>> same
>>>>>>>>>>>>>>>>> >> >>> >>> env. To
>>>>>>>>>>>>>>>>> >> >>> >>> be fair, I did not make any performance tuning
>>>>>>>>>>>>>>>>> on their
>>>>>>>>>>>>>>>>> >> >>> >>> configurations, but
>>>>>>>>>>>>>>>>> >> >>> >>> use the default configuration values.
>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>> >> >>> >>> Before testing, I think Yarn will be much
>>>>>>>>>>>>>>>>> better than MRv1, if
>>>>>>>>>>>>>>>>> >> >>> >>> they
>>>>>>>>>>>>>>>>> >> >>> >>> all
>>>>>>>>>>>>>>>>> >> >>> >>> use default configuration, because Yarn is a
>>>>>>>>>>>>>>>>> better framework than
>>>>>>>>>>>>>>>>> >> >>> >>> MRv1.
>>>>>>>>>>>>>>>>> >> >>> >>> However, the test result shows some differences:
>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>>>>>>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>>>>>>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>>>>>>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>>>>>>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>>>>>>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>>>>>>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>>>>>>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>> >> >>> >>> After a fast analysis, I think the direct cause
>>>>>>>>>>>>>>>>> might be that Yarn
>>>>>>>>>>>>>>>>> >> >>> >>> is
>>>>>>>>>>>>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much
>>>>>>>>>>>>>>>>> worse on Reduce
>>>>>>>>>>>>>>>>> >> >>> >>> phase.
>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>>>>>>>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1
>>>>>>>>>>>>>>>>> for terasort?
>>>>>>>>>>>>>>>>> >> >>> >>> - What's the strategy for tuning Yarn
>>>>>>>>>>>>>>>>> performance? Are there any
>>>>>>>>>>>>>>>>> >> >>> >>> materials?
>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>> >> >>> >>> Thanks!
>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>> >> >>> >> --
>>>>>>>>>>>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>>>>>>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>>>>>>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>> >> >>> --
>>>>>>>>>>>>>>>>> >> >>> Harsh J
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >> --
>>>>>>>>>>>>>>>>> >> Harsh J
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> Harsh J
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Sandy Ryza <sa...@cloudera.com>.
I should have brought this up earlier - the Terasort benchmark requirements
changed recently to make data less compressible.  The MR2 version of
Terasort has this change, but the MR1 version does not.    So Snappy should
be working fine, but the data that MR2 is using is less compressible.


On Wed, Oct 23, 2013 at 4:48 PM, Jian Fang <ji...@gmail.com>wrote:

> Ok, I think I know where the problem is now. I used snappy to compress map
> output.
>
> Calculate the compression ratio = "Map output materialized bytes"/"Map
> output bytes"
>
> MR2
> 440519309909/1020000000000 = 0.431881676
>
> MR1
> 240948272514/1000000000000 = 0.240948273
>
>
> Seems something is wrong with the snappy in my Hadoop 2 and as a result,
> MR2 needs to fetch much more data during shuffle phase. Any way to check
> snappy in MR2?
>
>
>
>
> On Wed, Oct 23, 2013 at 4:26 PM, Jian Fang <ji...@gmail.com>wrote:
>
>> Looked at the results for MR1 and MR2. Reduce input groups for MR1 is
>> 4,294,967,296, but 10,000,000,000 for MR2.
>> That is to say the number of unique keys fed into the reducers is
>> 10000000000 in MR2, which is really weird. Is there any problem in the terasort code?
>>
>>
>> On Wed, Oct 23, 2013 at 2:16 PM, Jian Fang <jian.fang.subscribe@gmail.com
>> > wrote:
>>
>>> Reducing the number of map containers to 8 slightly improved the total
>>> time to 84 minutes. Here is the output. Also, from the log, there is no clear
>>> reason why the containers were killed other than messages such as
>>> "Container killed by the
>>> ApplicationMaster../container_1382237301855_0001_01_000001/syslog:Container
>>> killed on request. Exit code is 143"
>>>
>>>
>>> 2013-10-23 21:09:34,325 INFO org.apache.hadoop.mapreduce.Job (main):
>>> Counters: 46
>>>         File System Counters
>>>                 FILE: Number of bytes read=455484809066
>>>                 FILE: Number of bytes written=896642344343
>>>
>>>                 FILE: Number of read operations=0
>>>                 FILE: Number of large read operations=0
>>>                 FILE: Number of write operations=0
>>>                 HDFS: Number of bytes read=1000000841624
>>>
>>>                 HDFS: Number of bytes written=1000000000000
>>>                 HDFS: Number of read operations=25531
>>>
>>>                 HDFS: Number of large read operations=0
>>>                 HDFS: Number of write operations=150
>>>
>>>         Job Counters
>>>                 Killed map tasks=1
>>>                 Killed reduce tasks=11
>>>                 Launched map tasks=7449
>>>                 Launched reduce tasks=86
>>>                 Data-local map tasks=7434
>>>                 Rack-local map tasks=15
>>>                 Total time spent by all maps in occupied slots
>>> (ms)=1030941232
>>>                 Total time spent by all reduces in occupied slots
>>> (ms)=1574732272
>>>
>>>         Map-Reduce Framework
>>>                 Map input records=10000000000
>>>                 Map output records=10000000000
>>>                 Map output bytes=1020000000000
>>>                 Map output materialized bytes=440519309909
>>>                 Input split bytes=841624
>>>
>>>                 Combine input records=0
>>>                 Combine output records=0
>>>                 Reduce input groups=10000000000
>>>                 Reduce shuffle bytes=440519309909
>>>
>>>                 Reduce input records=10000000000
>>>                 Reduce output records=10000000000
>>>                 Spilled Records=20000000000
>>>                 Shuffled Maps =558600
>>>                 Failed Shuffles=0
>>>                 Merged Map outputs=558600
>>>                 GC time elapsed (ms)=2855534
>>>                 CPU time spent (ms)=170381420
>>>                 Physical memory (bytes) snapshot=3704938954752
>>>                 Virtual memory (bytes) snapshot=11455208308736
>>>                 Total committed heap usage (bytes)=4523248582656
>>>         Shuffle Errors
>>>                 BAD_ID=0
>>>                 CONNECTION=0
>>>                 IO_ERROR=0
>>>
>>>                 WRONG_LENGTH=0
>>>                 WRONG_MAP=0
>>>                 WRONG_REDUCE=0
>>>         File Input Format Counters
>>>                 Bytes Read=1000000000000
>>>         File Output Format Counters
>>>                 Bytes Written=1000000000000
>>>
>>>
>>>
>>> On Wed, Oct 23, 2013 at 1:46 PM, Jian Fang <
>>> jian.fang.subscribe@gmail.com> wrote:
>>>
>>>> Already started the tests with only 8 containers for map in MR2, still
>>>> running.
>>>>
>>>> But shouldn't YARN make better use of the memory? If I map YARN
>>>> containers in MR2 to exactly the 8 map and 3 reduce slots in MR1, what is the
>>>> real advantage of YARN then? I remember one goal of YARN is to solve the
>>>> issue that map slots cannot be used for reduce and reduce slots cannot be
>>>> used for map. Shouldn't YARN be smart enough to handle the concurrent tasks?
>>>>
>>>>
>>>>
>>>> On Wed, Oct 23, 2013 at 1:17 PM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>>
>>>>> Increasing the slowstart is not meant to increase performance, but
>>>>> should make for a fairer comparison.  Have you tried making sure that in
>>>>> MR2 only 8 map tasks are running concurrently, or boosting MR1 up to 16?
>>>>>
>>>>> -Sandy
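
For context: with the settings quoted in this thread (yarn.nodemanager.resource.memory-mb = 12288 and mapreduce.map.memory.mb = 768), YARN fits 12288/768 = 16 map containers on each node. A minimal sketch of one way to hold that to 8 concurrent maps per node, assuming the same 12288 MB NodeManager budget, is to double the per-map allocation in mapred-site.xml. The values below are illustrative; the thread does not say how the 8-container run was actually configured.

  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1536</value>   <!-- 12288 / 1536 = 8 concurrent map containers per node -->
  </property>
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx1024m</value>   <!-- keep the task heap below the container size -->
  </property>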
>>>>>
>>>>>
>>>>> On Wed, Oct 23, 2013 at 12:55 PM, Jian Fang <
>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>
>>>>>> Changing mapreduce.job.reduce.slowstart.completedmaps to 0.99 does not
>>>>>> look good. The map phase alone took 48 minutes and the total time seems
>>>>>> to be even longer. Is there any way to make the map phase run faster?
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>>
>>>>>> On Wed, Oct 23, 2013 at 10:05 AM, Jian Fang <
>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks Sandy.
>>>>>>>
>>>>>>> io.sort.record.percent is the default value 0.05 for both MR1 and
>>>>>>> MR2.  mapreduce.job.reduce.slowstart.completedmaps in MR2 and mapred.reduce.slowstart.completed.maps
>>>>>>> in MR1 both use the default value 0.05.
>>>>>>>
>>>>>>> I tried to allocate 1536MB and 1024MB to map container some time
>>>>>>> ago, but the changes did not give me a better result, thus, I changed it
>>>>>>> back to 768MB.
>>>>>>>
>>>>>>> Will try mapred.reduce.slowstart.completed.maps=.99 to see what
>>>>>>> happens. BTW, I should use
>>>>>>> mapreduce.job.reduce.slowstart.completedmaps in MR2, right?
>>>>>>>
>>>>>>> Also, in MR1 I can specify tasktracker.http.threads, but I could not
>>>>>>> find the counterpart in MR2. Which property should I tune for the HTTP threads?
>>>>>>>
>>>>>>> Thanks again.
>>>>>>>
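
On the two questions above: mapreduce.job.reduce.slowstart.completedmaps is indeed the MR2 name for the slowstart setting. There is no tasktracker.http.threads in MR2 because the shuffle is served by the NodeManager's ShuffleHandler rather than the TaskTracker, so the closest knob is the shuffle handler's thread pool. A sketch, with the values shown only as an illustration:

  <property>
    <name>mapreduce.job.reduce.slowstart.completedmaps</name>
    <value>0.99</value>   <!-- hold reduces back until nearly all maps have finished -->
  </property>
  <property>
    <name>mapreduce.shuffle.max.threads</name>
    <value>64</value>   <!-- ShuffleHandler worker threads; 0, the default, lets it size its own pool -->
  </property>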
>>>>>>>
>>>>>>> On Wed, Oct 23, 2013 at 9:40 AM, Sandy Ryza <sandy.ryza@cloudera.com
>>>>>>> > wrote:
>>>>>>>
>>>>>>>> Based on SLOTS_MILLIS_MAPS, it looks like your map tasks are taking
>>>>>>>> about three times as long in MR2 as they are in MR1.  This is probably
>>>>>>>> because you allow twice as many map tasks to run at a time in MR2 (12288/768
>>>>>>>> = 16).  Being able to use all the containers isn't necessarily a good thing
>>>>>>>> if you are oversubscribing your node's resources.  Because of the different
>>>>>>>> way that MR1 and MR2 view resources, I think it's better to test with
>>>>>>>> mapred.reduce.slowstart.completed.maps=.99 so that the map and reduce
>>>>>>>> phases will run separately.
>>>>>>>>
>>>>>>>> On the other side, it looks like your MR1 has more spilled records
>>>>>>>> than MR2.  For a fairer comparison, you should set io.sort.record.percent
>>>>>>>> in MR1 to .13, which should improve MR1 performance (MR2 automatically
>>>>>>>> does this tuning for you).
>>>>>>>>
>>>>>>>> -Sandy
>>>>>>>>
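
Sandy's two MR1-side suggestions, written out as a mapred-site.xml sketch for the 1.0.3 cluster (the values are the ones proposed above, not settings confirmed by the poster):

  <property>
    <name>io.sort.record.percent</name>
    <value>0.13</value>   <!-- reserve more of io.sort.mb for record metadata, reducing spills -->
  </property>
  <property>
    <name>mapred.reduce.slowstart.completed.maps</name>
    <value>0.99</value>   <!-- keep the reduce phase from overlapping the map phase -->
  </property>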
>>>>>>>>
>>>>>>>> On Wed, Oct 23, 2013 at 9:22 AM, Jian Fang <
>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> The number of map slots and reduce slots on each data node for MR1
>>>>>>>>> are 8 and 3, respectively. Since MR2 could use all containers for either
>>>>>>>>> map or reduce, I would expect that MR2 is faster.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Oct 23, 2013 at 8:17 AM, Sandy Ryza <
>>>>>>>>> sandy.ryza@cloudera.com> wrote:
>>>>>>>>>
>>>>>>>>>> How many map and reduce slots are you using per tasktracker in
>>>>>>>>>> MR1?  How do the average map times compare? (MR2 reports this directly on
>>>>>>>>>> the web UI, but you can also get a sense in MR1 by scrolling through the
>>>>>>>>>> map tasks page).  Can you share the counters for MR1?
>>>>>>>>>>
>>>>>>>>>> -Sandy
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Oct 23, 2013 at 12:23 AM, Jian Fang <
>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Unfortunately, turning off JVM reuse still got the same result,
>>>>>>>>>>> i.e., about 90 minutes for MR2. I don't think the killed reduces could
>>>>>>>>>>> account for a 2x slowdown. Something must be very wrong in either the
>>>>>>>>>>> configuration or the code. Any hints?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <
>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks Sandy. I will try to turn JVM reuse off and see what
>>>>>>>>>>>> happens.
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, I saw quite some exceptions in the task attempts. For
>>>>>>>>>>>> instance.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> 2013-10-20 03:13:58,751 ERROR [main]
>>>>>>>>>>>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>>>>>>>>>>>> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
>>>>>>>>>>>> 2013-10-20 03:13:58,752 ERROR [Thread-6]
>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient: Failed to close file
>>>>>>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
>>>>>>>>>>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>>>>>>>>>>>> No lease on
>>>>>>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
>>>>>>>>>>>> File does not exist. Holder
>>>>>>>>>>>> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
>>>>>>>>>>>> any open files.
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>>>>>>>>>>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>>>>>>>>>> --
>>>>>>>>>>>>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>>>>>>>>>>>> 2013-10-20 03:13:58,753 WARN [main]
>>>>>>>>>>>> org.apache.hadoop.mapred.YarnChild: Exception running child :
>>>>>>>>>>>> java.nio.channels.ClosedChannelException
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>>>>>>>>>>>>         at
>>>>>>>>>>>> java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <
>>>>>>>>>>>> sandy.ryza@cloudera.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> It looks like many of your reduce tasks were killed.  Do you
>>>>>>>>>>>>> know why?  Also, MR2 doesn't have JVM reuse, so it might make sense to
>>>>>>>>>>>>> compare it to MR1 with JVM reuse turned off.
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Sandy
>>>>>>>>>>>>>
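
Turning JVM reuse off on the MR1 side is a one-line change; a sketch for the 1.0.3 cluster's mapred-site.xml (1 is also the stock default, so removing any override has the same effect):

  <property>
    <name>mapred.job.reuse.jvm.num.tasks</name>
    <value>1</value>   <!-- each task runs in its own JVM, matching MR2's behavior -->
  </property>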
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <
>>>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> The Terasort output for MR2 is as follows.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job
>>>>>>>>>>>>>> (main): Counters: 46
>>>>>>>>>>>>>>         File System Counters
>>>>>>>>>>>>>>                 FILE: Number of bytes read=456102049355
>>>>>>>>>>>>>>                 FILE: Number of bytes written=897246250517
>>>>>>>>>>>>>>                 FILE: Number of read operations=0
>>>>>>>>>>>>>>                 FILE: Number of large read operations=0
>>>>>>>>>>>>>>                 FILE: Number of write operations=0
>>>>>>>>>>>>>>                 HDFS: Number of bytes read=1000000851200
>>>>>>>>>>>>>>                 HDFS: Number of bytes written=1000000000000
>>>>>>>>>>>>>>                 HDFS: Number of read operations=32131
>>>>>>>>>>>>>>                 HDFS: Number of large read operations=0
>>>>>>>>>>>>>>                 HDFS: Number of write operations=224
>>>>>>>>>>>>>>         Job Counters
>>>>>>>>>>>>>>                 Killed map tasks=1
>>>>>>>>>>>>>>                 Killed reduce tasks=20
>>>>>>>>>>>>>>                 Launched map tasks=7601
>>>>>>>>>>>>>>                 Launched reduce tasks=132
>>>>>>>>>>>>>>                 Data-local map tasks=7591
>>>>>>>>>>>>>>                 Rack-local map tasks=10
>>>>>>>>>>>>>>                 Total time spent by all maps in occupied
>>>>>>>>>>>>>> slots (ms)=1696141311
>>>>>>>>>>>>>>                 Total time spent by all reduces in occupied
>>>>>>>>>>>>>> slots (ms)=2664045096
>>>>>>>>>>>>>>         Map-Reduce Framework
>>>>>>>>>>>>>>                 Map input records=10000000000
>>>>>>>>>>>>>>                 Map output records=10000000000
>>>>>>>>>>>>>>                 Map output bytes=1020000000000
>>>>>>>>>>>>>>                 Map output materialized bytes=440486356802
>>>>>>>>>>>>>>                 Input split bytes=851200
>>>>>>>>>>>>>>                 Combine input records=0
>>>>>>>>>>>>>>                 Combine output records=0
>>>>>>>>>>>>>>                 Reduce input groups=10000000000
>>>>>>>>>>>>>>                 Reduce shuffle bytes=440486356802
>>>>>>>>>>>>>>                 Reduce input records=10000000000
>>>>>>>>>>>>>>                 Reduce output records=10000000000
>>>>>>>>>>>>>>                 Spilled Records=20000000000
>>>>>>>>>>>>>>                 Shuffled Maps =851200
>>>>>>>>>>>>>>                 Failed Shuffles=61
>>>>>>>>>>>>>>                 Merged Map outputs=851200
>>>>>>>>>>>>>>                 GC time elapsed (ms)=4215666
>>>>>>>>>>>>>>                 CPU time spent (ms)=192433000
>>>>>>>>>>>>>>                 Physical memory (bytes) snapshot=3349356380160
>>>>>>>>>>>>>>                 Virtual memory (bytes) snapshot=9665208745984
>>>>>>>>>>>>>>                 Total committed heap usage
>>>>>>>>>>>>>> (bytes)=3636854259712
>>>>>>>>>>>>>>         Shuffle Errors
>>>>>>>>>>>>>>                 BAD_ID=0
>>>>>>>>>>>>>>                 CONNECTION=0
>>>>>>>>>>>>>>                 IO_ERROR=4
>>>>>>>>>>>>>>                 WRONG_LENGTH=0
>>>>>>>>>>>>>>                 WRONG_MAP=0
>>>>>>>>>>>>>>                 WRONG_REDUCE=0
>>>>>>>>>>>>>>         File Input Format Counters
>>>>>>>>>>>>>>                 Bytes Read=1000000000000
>>>>>>>>>>>>>>         File Output Format Counters
>>>>>>>>>>>>>>                 Bytes Written=1000000000000
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> John
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>>>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop
>>>>>>>>>>>>>>> 1.0.3 and it turned out that the terasort for MR2 is 2 times slower than
>>>>>>>>>>>>>>> that in MR1. I cannot really believe it.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop
>>>>>>>>>>>>>>> 2.2.0 cluster configurations are as follows.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>>>>>>>>>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>>>>>>>>>>>>         mapreduce.map.memory.mb = "768";
>>>>>>>>>>>>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>>>>>>>>>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>>>>>>>>>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>>>>>>>>>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>>>>>>>>>>>>         mapreduce.task.io.sort.factor = "48";
>>>>>>>>>>>>>>>         mapreduce.task.io.sort.mb = "200";
>>>>>>>>>>>>>>>         mapreduce.map.speculative = "true";
>>>>>>>>>>>>>>>         mapreduce.reduce.speculative = "true";
>>>>>>>>>>>>>>>         mapreduce.framework.name = "yarn";
>>>>>>>>>>>>>>>         yarn.app.mapreduce.am.job.task.listener.thread-count
>>>>>>>>>>>>>>> = "60";
>>>>>>>>>>>>>>>         mapreduce.map.cpu.vcores = "1";
>>>>>>>>>>>>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>>>>>>>>>>>>         mapreduce.map.output.compress = "true";
>>>>>>>>>>>>>>>         mapreduce.map.output.compress.codec =
>>>>>>>>>>>>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>>>>>>>>>>>>         yarn.resourcemanager.scheduler.client.thread-count =
>>>>>>>>>>>>>>> "64";
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> yarn.resourcemanager.resource-tracker.client.thread-count = "64";
>>>>>>>>>>>>>>>         yarn.resourcemanager.scheduler.class =
>>>>>>>>>>>>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>>>>>>>>>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>>>>>>>>>>>>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>>>>>>>>>>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>>>>>>>>>>>>         yarn.nodemanager.container-executor.class =
>>>>>>>>>>>>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>>>>>>>>>>>>         yarn.nodemanager.container-manager.thread-count =
>>>>>>>>>>>>>>> "64";
>>>>>>>>>>>>>>>         yarn.nodemanager.localizer.client.thread-count =
>>>>>>>>>>>>>>> "20";
>>>>>>>>>>>>>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost
>>>>>>>>>>>>>>> the same other configurations. In MR1, the 1TB terasort took about 45
>>>>>>>>>>>>>>> minutes, but it took around 90 minutes in MR2.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Does anyone know what is wrong here? Or do I need some
>>>>>>>>>>>>>>> special configurations for terasort to work better in MR2?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> John
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>>>>>>>>>>>>> michael_segel@hotmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Sam,
>>>>>>>>>>>>>>>> I think your cluster is too small for any meaningful
>>>>>>>>>>>>>>>> conclusions to be made.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Mike Segel
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <
>>>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Harsh,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks for your detailed response! Now, the efficiency of
>>>>>>>>>>>>>>>> my Yarn cluster improved a lot after increasing the reducer
>>>>>>>>>>>>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>>>>>>>>>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and
>>>>>>>>>>>>>>>> reduce task together, with a typical process(map > shuffle > reduce). In
>>>>>>>>>>>>>>>> Yarn, as I know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>>>>>>>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but,
>>>>>>>>>>>>>>>> MRv1 job has special execution process(map > shuffle > reduce) in Hadoop
>>>>>>>>>>>>>>>> 1.x, and how Yarn execute a MRv1 job? still include some special MR steps
>>>>>>>>>>>>>>>> in Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>>>>>>>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>>>>>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>>>>>>>>>>> - What's the general process for ApplicationMaster of Yarn
>>>>>>>>>>>>>>>> to execute a job?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by
>>>>>>>>>>>>>>>> setting 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>>>>>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>>>>>>>>>>> - For Yarn, the above two parameters do not work any more, as
>>>>>>>>>>>>>>>> YARN uses containers instead, right?
>>>>>>>>>>>>>>>> - For Yarn, we can set the whole physical mem for a
>>>>>>>>>>>>>>>> NodeManager using 'yarn.nodemanager.resource.memory-mb'.
>>>>>>>>>>>>>>>> But how to set the default size of physical mem of a container?
>>>>>>>>>>>>>>>> - How to set the maximum size of physical mem of a
>>>>>>>>>>>>>>>> container? By the parameter of 'mapred.child.java.opts'?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks as always!
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Sam,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> > - How to know the container number? Why you say it will
>>>>>>>>>>>>>>>>> be 22 containers due to a 22 GB memory?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The MR2's default configuration requests 1 GB resource
>>>>>>>>>>>>>>>>> each for Map
>>>>>>>>>>>>>>>>> and Reduce containers. It requests 1.5 GB for the AM
>>>>>>>>>>>>>>>>> container that
>>>>>>>>>>>>>>>>> runs the job, additionally. This is tunable using the
>>>>>>>>>>>>>>>>> properties
>>>>>>>>>>>>>>>>> Sandy's mentioned in his post.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> > - My machine has 32 GB memory, how many memory is proper
>>>>>>>>>>>>>>>>> to be assigned to containers?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This is a general question. You may use the same process
>>>>>>>>>>>>>>>>> you took to
>>>>>>>>>>>>>>>>> decide optimal number of slots in MR1 to decide this here.
>>>>>>>>>>>>>>>>> Every
>>>>>>>>>>>>>>>>> container is a new JVM, and you're limited by the CPUs you
>>>>>>>>>>>>>>>>> have there
>>>>>>>>>>>>>>>>> (if not the memory). Either increase memory requests from
>>>>>>>>>>>>>>>>> jobs, to
>>>>>>>>>>>>>>>>> lower # of concurrent containers at a given time (runtime
>>>>>>>>>>>>>>>>> change), or
>>>>>>>>>>>>>>>>> lower NM's published memory resources to control the same
>>>>>>>>>>>>>>>>> (config
>>>>>>>>>>>>>>>>> change).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name'
>>>>>>>>>>>>>>>>> to be 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>>>>>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Yes, all of these properties will still work. Old
>>>>>>>>>>>>>>>>> properties specific
>>>>>>>>>>>>>>>>> to JobTracker or TaskTracker (usually found as a keyword
>>>>>>>>>>>>>>>>> in the config
>>>>>>>>>>>>>>>>> name) will not apply anymore.
>>>>>>>>>>>>>>>>>
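
The defaults Harsh describes, spelled out as the relevant properties (these are the stock Hadoop 2.x values, not figures taken from this cluster):

  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1024</value>   <!-- default container request for each map task -->
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>1024</value>   <!-- default container request for each reduce task -->
  </property>
  <property>
    <name>yarn.app.mapreduce.am.resource.mb</name>
    <value>1536</value>   <!-- default container request for the MapReduce ApplicationMaster -->
  </property>
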
>>>>>>>>>>>>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <
>>>>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>>>>> > Hi Harsh,
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > According to above suggestions, I removed the
>>>>>>>>>>>>>>>>> duplication of setting, and
>>>>>>>>>>>>>>>>> > reduce the value of
>>>>>>>>>>>>>>>>> 'yarn.nodemanager.resource.cpu-cores',
>>>>>>>>>>>>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>>>>>>>>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and
>>>>>>>>>>>>>>>>> 12000. Ant then, the
>>>>>>>>>>>>>>>>> > efficiency improved about 18%.  I have questions:
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > - How to know the container number? Why you say it will
>>>>>>>>>>>>>>>>> be 22 containers due
>>>>>>>>>>>>>>>>> > to a 22 GB memory?
>>>>>>>>>>>>>>>>> > - My machine has 32 GB memory, how many memory is proper
>>>>>>>>>>>>>>>>> to be assigned to
>>>>>>>>>>>>>>>>> > containers?
>>>>>>>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name'
>>>>>>>>>>>>>>>>> to be 'yarn', will
>>>>>>>>>>>>>>>>> > other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>>>>>>>> framework? Like
>>>>>>>>>>>>>>>>> > 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > Thanks!
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >> Hey Sam,
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >> Did you get a chance to retry with Sandy's suggestions?
>>>>>>>>>>>>>>>>> The config
>>>>>>>>>>>>>>>>> >> appears to be asking NMs to use roughly 22 total
>>>>>>>>>>>>>>>>> containers (as
>>>>>>>>>>>>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB
>>>>>>>>>>>>>>>>> memory
>>>>>>>>>>>>>>>>> >> resource. This could impact much, given the CPU is
>>>>>>>>>>>>>>>>> still the same for
>>>>>>>>>>>>>>>>> >> both test runs.
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>>>>>>>>>>>>> sandy.ryza@cloudera.com>
>>>>>>>>>>>>>>>>> >> wrote:
>>>>>>>>>>>>>>>>> >> > Hey Sam,
>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>> >> > Thanks for sharing your results.  I'm definitely
>>>>>>>>>>>>>>>>> curious about what's
>>>>>>>>>>>>>>>>> >> > causing the difference.
>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>> >> > A couple observations:
>>>>>>>>>>>>>>>>> >> > It looks like you've got
>>>>>>>>>>>>>>>>> yarn.nodemanager.resource.memory-mb in there
>>>>>>>>>>>>>>>>> >> > twice
>>>>>>>>>>>>>>>>> >> > with two different values.
>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?)
>>>>>>>>>>>>>>>>> close to the default
>>>>>>>>>>>>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of
>>>>>>>>>>>>>>>>> your tasks getting
>>>>>>>>>>>>>>>>> >> > killed for running over resource limits?
>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>> >> > -Sandy
>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <
>>>>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >> The terasort execution log shows that reduce spent
>>>>>>>>>>>>>>>>> about 5.5 mins from
>>>>>>>>>>>>>>>>> >> >> 33%
>>>>>>>>>>>>>>>>> >> >> to 35% as below.
>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>> reduce 31%
>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>> reduce 32%
>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>> reduce 33%
>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>> reduce 35%
>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>> reduce 40%
>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>> reduce 43%
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >> Any way, below are my configurations for your
>>>>>>>>>>>>>>>>> reference. Thanks!
>>>>>>>>>>>>>>>>> >> >> (A) core-site.xml
>>>>>>>>>>>>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >> (B) hdfs-site.xml
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>dfs.replication</name>
>>>>>>>>>>>>>>>>> >> >>     <value>1</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>>>>>>>>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>>>>>>>>>>>>> >> >>     <value>10</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >> (C) mapred-site.xml
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>>>>>>>>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>>>>>>>>>>>>> >> >> </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>>>>>>>>>>>>> >> >>     <value>yarn</value>
>>>>>>>>>>>>>>>>> >> >>    </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>>>>>>>>>>>>> >> >>     <value>8</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>>>>>>>>>>>>> >> >>     <value>4</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>>>>>>>>>>>>> >> >>     <value>true</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >> (D) yarn-site.xml
>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>>>>>>>>>>>>> >> >>     <value>node1:18025</value>
>>>>>>>>>>>>>>>>> >> >>     <description>host is the hostname of the
>>>>>>>>>>>>>>>>> resource manager and
>>>>>>>>>>>>>>>>> >> >>     port is the port on which the NodeManagers
>>>>>>>>>>>>>>>>> contact the Resource
>>>>>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <description>The address of the RM web
>>>>>>>>>>>>>>>>> application.</description>
>>>>>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>>>>>>>>>>>>> >> >>     <value>node1:18088</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> <name>yarn.resourcemanager.scheduler.address</name>
>>>>>>>>>>>>>>>>> >> >>     <value>node1:18030</value>
>>>>>>>>>>>>>>>>> >> >>     <description>host is the hostname of the
>>>>>>>>>>>>>>>>> resourcemanager and port
>>>>>>>>>>>>>>>>> >> >> is
>>>>>>>>>>>>>>>>> >> >> the port
>>>>>>>>>>>>>>>>> >> >>     on which the Applications in the cluster talk to
>>>>>>>>>>>>>>>>> the Resource
>>>>>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>>>>>>>>>>>>> >> >>     <value>node1:18040</value>
>>>>>>>>>>>>>>>>> >> >>     <description>the host is the hostname of the
>>>>>>>>>>>>>>>>> ResourceManager and
>>>>>>>>>>>>>>>>> >> >> the
>>>>>>>>>>>>>>>>> >> >> port is the port on
>>>>>>>>>>>>>>>>> >> >>     which the clients can talk to the Resource
>>>>>>>>>>>>>>>>> Manager. </description>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>>>>>>>>>>>>> >> >>     <description>the local directories used by the
>>>>>>>>>>>>>>>>> >> >> nodemanager</description>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>>>>>>>>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>>>>>>>>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>>>>>>>>>>>>> port</description>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>>>>>> >> >>     <value>10240</value>
>>>>>>>>>>>>>>>>> >> >>     <description>the amount of memory on the
>>>>>>>>>>>>>>>>> NodeManager in
>>>>>>>>>>>>>>>>> >> >> GB</description>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>>>>>>>>>>>> >> >>     <description>directory on hdfs where the
>>>>>>>>>>>>>>>>> application logs are moved
>>>>>>>>>>>>>>>>> >> >> to
>>>>>>>>>>>>>>>>> >> >> </description>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>    <property>
>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>>>>>>>>>>>> >> >>     <description>the directories used by
>>>>>>>>>>>>>>>>> Nodemanagers as log
>>>>>>>>>>>>>>>>> >> >> directories</description>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>>>>>>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>>>>>>>>>>>> >> >>     <description>shuffle service that needs to be
>>>>>>>>>>>>>>>>> set for Map Reduce to
>>>>>>>>>>>>>>>>> >> >> run </description>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>>>>>>>>>>>> >> >>     <value>24</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>>>>>>>>>>>> >> >>     <value>3</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>>>>>> >> >>     <value>22000</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>>>>>>>>>>>> >> >>     <value>2.1</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>> >> >>> Not tuning configurations at all is wrong. YARN
>>>>>>>>>>>>>>>>> uses memory resource
>>>>>>>>>>>>>>>>> >> >>> based scheduling and hence MR2 would be requesting
>>>>>>>>>>>>>>>>> 1 GB minimum by
>>>>>>>>>>>>>>>>> >> >>> default, causing, on base configs, to max out at 8
>>>>>>>>>>>>>>>>> (due to 8 GB NM
>>>>>>>>>>>>>>>>> >> >>> memory resource config) total containers. Do share
>>>>>>>>>>>>>>>>> your configs as at
>>>>>>>>>>>>>>>>> >> >>> this point none of us can tell what it is.
>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for
>>>>>>>>>>>>>>>>> users and to not
>>>>>>>>>>>>>>>>> >> >>> care about such things :)
>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>>>>>>>>>>>> samliuhadoop@gmail.com>
>>>>>>>>>>>>>>>>> >> >>> wrote:
>>>>>>>>>>>>>>>>> >> >>> > At the begining, I just want to do a fast
>>>>>>>>>>>>>>>>> comparision of MRv1 and
>>>>>>>>>>>>>>>>> >> >>> > Yarn.
>>>>>>>>>>>>>>>>> >> >>> > But
>>>>>>>>>>>>>>>>> >> >>> > they have many differences, and to be fair for
>>>>>>>>>>>>>>>>> comparison I did not
>>>>>>>>>>>>>>>>> >> >>> > tune
>>>>>>>>>>>>>>>>> >> >>> > their configurations at all.  So I got above test
>>>>>>>>>>>>>>>>> results. After
>>>>>>>>>>>>>>>>> >> >>> > analyzing
>>>>>>>>>>>>>>>>> >> >>> > the test result, no doubt, I will configure them
>>>>>>>>>>>>>>>>> and do comparison
>>>>>>>>>>>>>>>>> >> >>> > again.
>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>> >> >>> > Do you have any idea on current test result? I
>>>>>>>>>>>>>>>>> think, to compare
>>>>>>>>>>>>>>>>> >> >>> > with
>>>>>>>>>>>>>>>>> >> >>> > MRv1,
>>>>>>>>>>>>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but
>>>>>>>>>>>>>>>>> worse on Reduce
>>>>>>>>>>>>>>>>> >> >>> > phase(terasort test).
>>>>>>>>>>>>>>>>> >> >>> > And any detailed suggestions/comments/materials
>>>>>>>>>>>>>>>>> on Yarn performance
>>>>>>>>>>>>>>>>> >> >>> > tunning?
>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>> >> >>> > Thanks!
>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>>>>>>>>>>>> marcosluis2186@gmail.com>
>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>> >> >>> >> Why not to tune the configurations?
>>>>>>>>>>>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>>>>>>>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size,
>>>>>>>>>>>>>>>>> etc
>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>> >> >>> >>> Hi Experts,
>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>> >> >>> >>> We are thinking about whether to use Yarn or
>>>>>>>>>>>>>>>>> not in the near
>>>>>>>>>>>>>>>>> >> >>> >>> future,
>>>>>>>>>>>>>>>>> >> >>> >>> and
>>>>>>>>>>>>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for
>>>>>>>>>>>>>>>>> comprison.
>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>> >> >>> >>> My env is three nodes cluster, and each node
>>>>>>>>>>>>>>>>> has similar hardware:
>>>>>>>>>>>>>>>>> >> >>> >>> 2
>>>>>>>>>>>>>>>>> >> >>> >>> cpu(4 core), 32 mem. Both Yarn and MRv1 cluster
>>>>>>>>>>>>>>>>> are set on the
>>>>>>>>>>>>>>>>> >> >>> >>> same
>>>>>>>>>>>>>>>>> >> >>> >>> env. To
>>>>>>>>>>>>>>>>> >> >>> >>> be fair, I did not make any performance tuning
>>>>>>>>>>>>>>>>> on their
>>>>>>>>>>>>>>>>> >> >>> >>> configurations, but
>>>>>>>>>>>>>>>>> >> >>> >>> use the default configuration values.
>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>> >> >>> >>> Before testing, I think Yarn will be much
>>>>>>>>>>>>>>>>> better than MRv1, if
>>>>>>>>>>>>>>>>> >> >>> >>> they
>>>>>>>>>>>>>>>>> >> >>> >>> all
>>>>>>>>>>>>>>>>> >> >>> >>> use default configuration, because Yarn is a
>>>>>>>>>>>>>>>>> better framework than
>>>>>>>>>>>>>>>>> >> >>> >>> MRv1.
>>>>>>>>>>>>>>>>> >> >>> >>> However, the test result shows some differences:
>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>>>>>>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>>>>>>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>>>>>>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>>>>>>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>>>>>>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>>>>>>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>>>>>>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>> >> >>> >>> After a fast analysis, I think the direct cause
>>>>>>>>>>>>>>>>> might be that Yarn
>>>>>>>>>>>>>>>>> >> >>> >>> is
>>>>>>>>>>>>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much
>>>>>>>>>>>>>>>>> worse on Reduce
>>>>>>>>>>>>>>>>> >> >>> >>> phase.
>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>>>>>>>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1
>>>>>>>>>>>>>>>>> for terasort?
>>>>>>>>>>>>>>>>> >> >>> >>> - What's the stratage for tuning Yarn
>>>>>>>>>>>>>>>>> performance? Is any
>>>>>>>>>>>>>>>>> >> >>> >>> materials?
>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>> >> >>> >>> Thanks!
>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>> >> >>> >> --
>>>>>>>>>>>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>>>>>>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>>>>>>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>> >> >>> --
>>>>>>>>>>>>>>>>> >> >>> Harsh J
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >> --
>>>>>>>>>>>>>>>>> >> Harsh J
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> Harsh J
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Sandy Ryza <sa...@cloudera.com>.
I should have brought this up earlier - the Terasort benchmark requirements
changed recently to make data less compressible.  The MR2 version of
Terasort has this change, but the MR1 version does not. So Snappy should
be working fine, but the data that MR2 is using is less compressible.
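
The counters quoted further down bear this out: for the same 1 TB of input, MR2's map output materialized bytes come to about 440 GB against roughly 241 GB for MR1,

    440486356802 / 240948272514 = 1.83

so MR2 has to shuffle about 1.8 times as much intermediate data, which on its own accounts for a large part of the longer shuffle and reduce phase.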


On Wed, Oct 23, 2013 at 4:48 PM, Jian Fang <ji...@gmail.com>wrote:

> Ok, I think I know where the problem is now. I used snappy to compress map
> output.
>
> Calculate the compression ratio = "Map output materialized bytes"/"Map
> output bytes"
>
> MR2
> 440519309909/1020000000000 = 0.431881676
>
> MR1
> 240948272514/1000000000000 = 0.240948273
>
>
> It seems something is wrong with Snappy in my Hadoop 2 setup and, as a result,
> MR2 needs to fetch much more data during the shuffle phase. Is there any way to
> check that Snappy is working correctly in MR2?
>
>
>
>
> On Wed, Oct 23, 2013 at 4:26 PM, Jian Fang <ji...@gmail.com>wrote:
>
>> Looked at the results for MR1 and MR2. Reduce input groups for MR1 is
>> 4,294,967,296, but 10,000,000,000 for MR2.
>> That is to say, the number of unique keys fed into the reducers is
>> 10000000000 in MR2, which is really weird. Any problem in the terasort code?
>>
>>
>> On Wed, Oct 23, 2013 at 2:16 PM, Jian Fang <jian.fang.subscribe@gmail.com
>> > wrote:
>>
>>> Reducing the number of map containers to 8 slightly improved the total
>>> time to 84 minutes. Here is the output. Also, from the log, there is no clear
>>> reason why the containers were killed, other than messages such as
>>> "Container killed by the
>>> ApplicationMaster../container_1382237301855_0001_01_000001/syslog:Container
>>> killed on request. Exit code is 143"
>>>
>>>
>>> 2013-10-23 21:09:34,325 INFO org.apache.hadoop.mapreduce.Job (main):
>>> Counters: 46
>>>         File System Counters
>>>                 FILE: Number of bytes read=455484809066
>>>                 FILE: Number of bytes written=896642344343
>>>
>>>                 FILE: Number of read operations=0
>>>                 FILE: Number of large read operations=0
>>>                 FILE: Number of write operations=0
>>>                 HDFS: Number of bytes read=1000000841624
>>>
>>>                 HDFS: Number of bytes written=1000000000000
>>>                 HDFS: Number of read operations=25531
>>>
>>>                 HDFS: Number of large read operations=0
>>>                 HDFS: Number of write operations=150
>>>
>>>         Job Counters
>>>                 Killed map tasks=1
>>>                 Killed reduce tasks=11
>>>                 Launched map tasks=7449
>>>                 Launched reduce tasks=86
>>>                 Data-local map tasks=7434
>>>                 Rack-local map tasks=15
>>>                 Total time spent by all maps in occupied slots
>>> (ms)=1030941232
>>>                 Total time spent by all reduces in occupied slots
>>> (ms)=1574732272
>>>
>>>         Map-Reduce Framework
>>>                 Map input records=10000000000
>>>                 Map output records=10000000000
>>>                 Map output bytes=1020000000000
>>>                 Map output materialized bytes=440519309909
>>>                 Input split bytes=841624
>>>
>>>                 Combine input records=0
>>>                 Combine output records=0
>>>                 Reduce input groups=10000000000
>>>                 Reduce shuffle bytes=440519309909
>>>
>>>                 Reduce input records=10000000000
>>>                 Reduce output records=10000000000
>>>                 Spilled Records=20000000000
>>>                 Shuffled Maps =558600
>>>                 Failed Shuffles=0
>>>                 Merged Map outputs=558600
>>>                 GC time elapsed (ms)=2855534
>>>                 CPU time spent (ms)=170381420
>>>                 Physical memory (bytes) snapshot=3704938954752
>>>                 Virtual memory (bytes) snapshot=11455208308736
>>>                 Total committed heap usage (bytes)=4523248582656
>>>         Shuffle Errors
>>>                 BAD_ID=0
>>>                 CONNECTION=0
>>>                 IO_ERROR=0
>>>
>>>                 WRONG_LENGTH=0
>>>                 WRONG_MAP=0
>>>                 WRONG_REDUCE=0
>>>         File Input Format Counters
>>>                 Bytes Read=1000000000000
>>>         File Output Format Counters
>>>                 Bytes Written=1000000000000
>>>
>>>
>>>
>>> On Wed, Oct 23, 2013 at 1:46 PM, Jian Fang <
>>> jian.fang.subscribe@gmail.com> wrote:
>>>
>>>> Already started the tests with only 8 containers for map in MR2, still
>>>> running.
>>>>
>>>> But shouldn't YARN make better use of the memory? If I map YARN
>>>> containers in MR2 to exactly the 8 map and 3 reduce slots in MR1, what is the
>>>> real advantage of YARN then? I remember one goal of YARN is to solve the
>>>> issue that map slots cannot be used for reduce and reduce slots cannot be
>>>> used for map. Shouldn't YARN be smart enough to handle the concurrent tasks?
>>>>
>>>>
>>>>
>>>> On Wed, Oct 23, 2013 at 1:17 PM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>>
>>>>> Increasing the slowstart is not meant to increase performance, but
>>>>> should make for a fairer comparison.  Have you tried making sure that in
>>>>> MR2 only 8 map tasks are running concurrently, or boosting MR1 up to 16?
>>>>>
>>>>> -Sandy
>>>>>
>>>>>
>>>>> On Wed, Oct 23, 2013 at 12:55 PM, Jian Fang <
>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>
>>>>>> Changing mapreduce.job.reduce.slowstart.completedmaps to 0.99 does not
>>>>>> look good. The map phase alone took 48 minutes and the total time seems
>>>>>> to be even longer. Is there any way to make the map phase run faster?
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>>
>>>>>> On Wed, Oct 23, 2013 at 10:05 AM, Jian Fang <
>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks Sandy.
>>>>>>>
>>>>>>> io.sort.record.percent is the default value 0.05 for both MR1 and
>>>>>>> MR2.  mapreduce.job.reduce.slowstart.completedmaps in MR2 and mapred.reduce.slowstart.completed.maps
>>>>>>> in MR1 both use the default value 0.05.
>>>>>>>
>>>>>>> I tried to allocate 1536MB and 1024MB to map container some time
>>>>>>> ago, but the changes did not give me a better result, thus, I changed it
>>>>>>> back to 768MB.
>>>>>>>
>>>>>>> Will try mapred.reduce.slowstart.completed.maps=.99 to see what
>>>>>>> happens. BTW, I should use
>>>>>>> mapreduce.job.reduce.slowstart.completedmaps in MR2, right?
>>>>>>>
>>>>>>> Also, in MR1 I can specify tasktracker.http.threads, but I could not
>>>>>>> find the counterpart in MR2. Which property should I tune for the HTTP threads?
>>>>>>>
>>>>>>> Thanks again.
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Oct 23, 2013 at 9:40 AM, Sandy Ryza <sandy.ryza@cloudera.com
>>>>>>> > wrote:
>>>>>>>
>>>>>>>> Based on SLOTS_MILLIS_MAPS, it looks like your map tasks are taking
>>>>>>>> about three times as long in MR2 as they are in MR1.  This is probably
>>>>>>>> because you allow twice as many map tasks to run at a time in MR2 (12288/768
>>>>>>>> = 16).  Being able to use all the containers isn't necessarily a good thing
>>>>>>>> if you are oversubscribing your node's resources.  Because of the different
>>>>>>>> way that MR1 and MR2 view resources, I think it's better to test with
>>>>>>>> mapred.reduce.slowstart.completed.maps=.99 so that the map and reduce
>>>>>>>> phases will run separately.
>>>>>>>>
>>>>>>>> On the other side, it looks like your MR1 has more spilled records
>>>>>>>> than MR2.  For a fairer comparison, you should set io.sort.record.percent
>>>>>>>> in MR1 to .13, which should improve MR1 performance (MR2 automatically
>>>>>>>> does this tuning for you).
>>>>>>>>
>>>>>>>> -Sandy
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Oct 23, 2013 at 9:22 AM, Jian Fang <
>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> The number of map slots and reduce slots on each data node for MR1
>>>>>>>>> are 8 and 3, respectively. Since MR2 could use all containers for either
>>>>>>>>> map or reduce, I would expect that MR2 is faster.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Oct 23, 2013 at 8:17 AM, Sandy Ryza <
>>>>>>>>> sandy.ryza@cloudera.com> wrote:
>>>>>>>>>
>>>>>>>>>> How many map and reduce slots are you using per tasktracker in
>>>>>>>>>> MR1?  How do the average map times compare? (MR2 reports this directly on
>>>>>>>>>> the web UI, but you can also get a sense in MR1 by scrolling through the
>>>>>>>>>> map tasks page).  Can you share the counters for MR1?
>>>>>>>>>>
>>>>>>>>>> -Sandy
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Oct 23, 2013 at 12:23 AM, Jian Fang <
>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Unfortunately, turning off JVM reuse still got the same result,
>>>>>>>>>>> i.e., about 90 minutes for MR2. I don't think the killed reduces could
>>>>>>>>>>> account for a 2x slowdown. Something must be very wrong in either the
>>>>>>>>>>> configuration or the code. Any hints?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <
>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks Sandy. I will try to turn JVM reuse off and see what
>>>>>>>>>>>> happens.
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, I saw quite some exceptions in the task attempts. For
>>>>>>>>>>>> instance.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> 2013-10-20 03:13:58,751 ERROR [main]
>>>>>>>>>>>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>>>>>>>>>>>> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
>>>>>>>>>>>> 2013-10-20 03:13:58,752 ERROR [Thread-6]
>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient: Failed to close file
>>>>>>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
>>>>>>>>>>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>>>>>>>>>>>> No lease on
>>>>>>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
>>>>>>>>>>>> File does not exist. Holder
>>>>>>>>>>>> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
>>>>>>>>>>>> any open files.
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>>>>>>>>>>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>>>>>>>>>> --
>>>>>>>>>>>>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>>>>>>>>>>>> 2013-10-20 03:13:58,753 WARN [main]
>>>>>>>>>>>> org.apache.hadoop.mapred.YarnChild: Exception running child :
>>>>>>>>>>>> java.nio.channels.ClosedChannelException
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>>>>>>>>>>>>         at
>>>>>>>>>>>> java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <
>>>>>>>>>>>> sandy.ryza@cloudera.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> It looks like many of your reduce tasks were killed.  Do you
>>>>>>>>>>>>> know why?  Also, MR2 doesn't have JVM reuse, so it might make sense to
>>>>>>>>>>>>> compare it to MR1 with JVM reuse turned off.
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Sandy
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <
>>>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> The Terasort output for MR2 is as follows.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job
>>>>>>>>>>>>>> (main): Counters: 46
>>>>>>>>>>>>>>         File System Counters
>>>>>>>>>>>>>>                 FILE: Number of bytes read=456102049355
>>>>>>>>>>>>>>                 FILE: Number of bytes written=897246250517
>>>>>>>>>>>>>>                 FILE: Number of read operations=0
>>>>>>>>>>>>>>                 FILE: Number of large read operations=0
>>>>>>>>>>>>>>                 FILE: Number of write operations=0
>>>>>>>>>>>>>>                 HDFS: Number of bytes read=1000000851200
>>>>>>>>>>>>>>                 HDFS: Number of bytes written=1000000000000
>>>>>>>>>>>>>>                 HDFS: Number of read operations=32131
>>>>>>>>>>>>>>                 HDFS: Number of large read operations=0
>>>>>>>>>>>>>>                 HDFS: Number of write operations=224
>>>>>>>>>>>>>>         Job Counters
>>>>>>>>>>>>>>                 Killed map tasks=1
>>>>>>>>>>>>>>                 Killed reduce tasks=20
>>>>>>>>>>>>>>                 Launched map tasks=7601
>>>>>>>>>>>>>>                 Launched reduce tasks=132
>>>>>>>>>>>>>>                 Data-local map tasks=7591
>>>>>>>>>>>>>>                 Rack-local map tasks=10
>>>>>>>>>>>>>>                 Total time spent by all maps in occupied
>>>>>>>>>>>>>> slots (ms)=1696141311
>>>>>>>>>>>>>>                 Total time spent by all reduces in occupied
>>>>>>>>>>>>>> slots (ms)=2664045096
>>>>>>>>>>>>>>         Map-Reduce Framework
>>>>>>>>>>>>>>                 Map input records=10000000000
>>>>>>>>>>>>>>                 Map output records=10000000000
>>>>>>>>>>>>>>                 Map output bytes=1020000000000
>>>>>>>>>>>>>>                 Map output materialized bytes=440486356802
>>>>>>>>>>>>>>                 Input split bytes=851200
>>>>>>>>>>>>>>                 Combine input records=0
>>>>>>>>>>>>>>                 Combine output records=0
>>>>>>>>>>>>>>                 Reduce input groups=10000000000
>>>>>>>>>>>>>>                 Reduce shuffle bytes=440486356802
>>>>>>>>>>>>>>                 Reduce input records=10000000000
>>>>>>>>>>>>>>                 Reduce output records=10000000000
>>>>>>>>>>>>>>                 Spilled Records=20000000000
>>>>>>>>>>>>>>                 Shuffled Maps =851200
>>>>>>>>>>>>>>                 Failed Shuffles=61
>>>>>>>>>>>>>>                 Merged Map outputs=851200
>>>>>>>>>>>>>>                 GC time elapsed (ms)=4215666
>>>>>>>>>>>>>>                 CPU time spent (ms)=192433000
>>>>>>>>>>>>>>                 Physical memory (bytes) snapshot=3349356380160
>>>>>>>>>>>>>>                 Virtual memory (bytes) snapshot=9665208745984
>>>>>>>>>>>>>>                 Total committed heap usage
>>>>>>>>>>>>>> (bytes)=3636854259712
>>>>>>>>>>>>>>         Shuffle Errors
>>>>>>>>>>>>>>                 BAD_ID=0
>>>>>>>>>>>>>>                 CONNECTION=0
>>>>>>>>>>>>>>                 IO_ERROR=4
>>>>>>>>>>>>>>                 WRONG_LENGTH=0
>>>>>>>>>>>>>>                 WRONG_MAP=0
>>>>>>>>>>>>>>                 WRONG_REDUCE=0
>>>>>>>>>>>>>>         File Input Format Counters
>>>>>>>>>>>>>>                 Bytes Read=1000000000000
>>>>>>>>>>>>>>         File Output Format Counters
>>>>>>>>>>>>>>                 Bytes Written=1000000000000
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> John
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>>>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop
>>>>>>>>>>>>>>> 1.0.3 and it turned out that the terasort for MR2 is 2 times slower than
>>>>>>>>>>>>>>> that in MR1. I cannot really believe it.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop
>>>>>>>>>>>>>>> 2.2.0 cluster configurations are as follows.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>>>>>>>>>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>>>>>>>>>>>>         mapreduce.map.memory.mb = "768";
>>>>>>>>>>>>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>>>>>>>>>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>>>>>>>>>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>>>>>>>>>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>>>>>>>>>>>>         mapreduce.task.io.sort.factor = "48";
>>>>>>>>>>>>>>>         mapreduce.task.io.sort.mb = "200";
>>>>>>>>>>>>>>>         mapreduce.map.speculative = "true";
>>>>>>>>>>>>>>>         mapreduce.reduce.speculative = "true";
>>>>>>>>>>>>>>>         mapreduce.framework.name = "yarn";
>>>>>>>>>>>>>>>         yarn.app.mapreduce.am.job.task.listener.thread-count
>>>>>>>>>>>>>>> = "60";
>>>>>>>>>>>>>>>         mapreduce.map.cpu.vcores = "1";
>>>>>>>>>>>>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>>>>>>>>>>>>         mapreduce.map.output.compress = "true";
>>>>>>>>>>>>>>>         mapreduce.map.output.compress.codec =
>>>>>>>>>>>>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>>>>>>>>>>>>         yarn.resourcemanager.scheduler.client.thread-count =
>>>>>>>>>>>>>>> "64";
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> yarn.resourcemanager.resource-tracker.client.thread-count = "64";
>>>>>>>>>>>>>>>         yarn.resourcemanager.scheduler.class =
>>>>>>>>>>>>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>>>>>>>>>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>>>>>>>>>>>>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>>>>>>>>>>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>>>>>>>>>>>>         yarn.nodemanager.container-executor.class =
>>>>>>>>>>>>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>>>>>>>>>>>>         yarn.nodemanager.container-manager.thread-count =
>>>>>>>>>>>>>>> "64";
>>>>>>>>>>>>>>>         yarn.nodemanager.localizer.client.thread-count =
>>>>>>>>>>>>>>> "20";
>>>>>>>>>>>>>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost
>>>>>>>>>>>>>>> the same other configurations. In MR1, the 1TB terasort took about 45
>>>>>>>>>>>>>>> minutes, but it took around 90 minutes in MR2.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Does anyone know what is wrong here? Or do I need some
>>>>>>>>>>>>>>> special configurations for terasort to work better in MR2?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> John
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>>>>>>>>>>>>> michael_segel@hotmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Sam,
>>>>>>>>>>>>>>>> I think your cluster is too small for any meaningful
>>>>>>>>>>>>>>>> conclusions to be made.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Mike Segel
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <
>>>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Harsh,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks for your detailed response! Now, the efficiency of
>>>>>>>>>>>>>>>> my Yarn cluster improved a lot after increasing the reducer
>>>>>>>>>>>>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>>>>>>>>>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and
>>>>>>>>>>>>>>>> reduce task together, with a typical process(map > shuffle > reduce). In
>>>>>>>>>>>>>>>> Yarn, as I know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>>>>>>>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but,
>>>>>>>>>>>>>>>> MRv1 job has special execution process(map > shuffle > reduce) in Hadoop
>>>>>>>>>>>>>>>> 1.x, and how Yarn execute a MRv1 job? still include some special MR steps
>>>>>>>>>>>>>>>> in Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>>>>>>>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>>>>>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>>>>>>>>>>> - What's the general process for ApplicationMaster of Yarn
>>>>>>>>>>>>>>>> to execute a job?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by
>>>>>>>>>>>>>>>> setting 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>>>>>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>>>>>>>>>>> - For Yarn, the above two parameters do not work any more, as
>>>>>>>>>>>>>>>> yarn uses container instead, right?
>>>>>>>>>>>>>>>> - For Yarn, we can set the whole physical mem for a
>>>>>>>>>>>>>>>> NodeManager using 'yarn.nodemanager.resource.memory-mb'.
>>>>>>>>>>>>>>>> But how to set the default size of physical mem of a container?
>>>>>>>>>>>>>>>> - How to set the maximum size of physical mem of a
>>>>>>>>>>>>>>>> container? By the parameter of 'mapred.child.java.opts'?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks as always!
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Sam,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> > - How to know the container number? Why you say it will
>>>>>>>>>>>>>>>>> be 22 containers due to a 22 GB memory?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The MR2's default configuration requests 1 GB resource
>>>>>>>>>>>>>>>>> each for Map
>>>>>>>>>>>>>>>>> and Reduce containers. It requests 1.5 GB for the AM
>>>>>>>>>>>>>>>>> container that
>>>>>>>>>>>>>>>>> runs the job, additionally. This is tunable using the
>>>>>>>>>>>>>>>>> properties
>>>>>>>>>>>>>>>>> Sandy's mentioned in his post.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> > - My machine has 32 GB memory, how much memory is proper
>>>>>>>>>>>>>>>>> to be assigned to containers?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This is a general question. You may use the same process
>>>>>>>>>>>>>>>>> you took to
>>>>>>>>>>>>>>>>> decide optimal number of slots in MR1 to decide this here.
>>>>>>>>>>>>>>>>> Every
>>>>>>>>>>>>>>>>> container is a new JVM, and you're limited by the CPUs you
>>>>>>>>>>>>>>>>> have there
>>>>>>>>>>>>>>>>> (if not the memory). Either increase memory requests from
>>>>>>>>>>>>>>>>> jobs, to
>>>>>>>>>>>>>>>>> lower # of concurrent containers at a given time (runtime
>>>>>>>>>>>>>>>>> change), or
>>>>>>>>>>>>>>>>> lower NM's published memory resources to control the same
>>>>>>>>>>>>>>>>> (config
>>>>>>>>>>>>>>>>> change).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name'
>>>>>>>>>>>>>>>>> to be 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>>>>>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Yes, all of these properties will still work. Old
>>>>>>>>>>>>>>>>> properties specific
>>>>>>>>>>>>>>>>> to JobTracker or TaskTracker (usually found as a keyword
>>>>>>>>>>>>>>>>> in the config
>>>>>>>>>>>>>>>>> name) will not apply anymore.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <
>>>>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>>>>> > Hi Harsh,
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > According to above suggestions, I removed the
>>>>>>>>>>>>>>>>> duplication of setting, and
>>>>>>>>>>>>>>>>> > reduce the value of
>>>>>>>>>>>>>>>>> 'yarn.nodemanager.resource.cpu-cores',
>>>>>>>>>>>>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>>>>>>>>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and
>>>>>>>>>>>>>>>>> 12000. And then, the
>>>>>>>>>>>>>>>>> > efficiency improved about 18%.  I have questions:
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > - How to know the container number? Why you say it will
>>>>>>>>>>>>>>>>> be 22 containers due
>>>>>>>>>>>>>>>>> > to a 22 GB memory?
>>>>>>>>>>>>>>>>> > - My machine has 32 GB memory, how much memory is proper
>>>>>>>>>>>>>>>>> to be assigned to
>>>>>>>>>>>>>>>>> > containers?
>>>>>>>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name'
>>>>>>>>>>>>>>>>> to be 'yarn', will
>>>>>>>>>>>>>>>>> > other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>>>>>>>> framework? Like
>>>>>>>>>>>>>>>>> > 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > Thanks!
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >> Hey Sam,
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >> Did you get a chance to retry with Sandy's suggestions?
>>>>>>>>>>>>>>>>> The config
>>>>>>>>>>>>>>>>> >> appears to be asking NMs to use roughly 22 total
>>>>>>>>>>>>>>>>> containers (as
>>>>>>>>>>>>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB
>>>>>>>>>>>>>>>>> memory
>>>>>>>>>>>>>>>>> >> resource. This could impact much, given the CPU is
>>>>>>>>>>>>>>>>> still the same for
>>>>>>>>>>>>>>>>> >> both test runs.
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>>>>>>>>>>>>> sandy.ryza@cloudera.com>
>>>>>>>>>>>>>>>>> >> wrote:
>>>>>>>>>>>>>>>>> >> > Hey Sam,
>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>> >> > Thanks for sharing your results.  I'm definitely
>>>>>>>>>>>>>>>>> curious about what's
>>>>>>>>>>>>>>>>> >> > causing the difference.
>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>> >> > A couple observations:
>>>>>>>>>>>>>>>>> >> > It looks like you've got
>>>>>>>>>>>>>>>>> yarn.nodemanager.resource.memory-mb in there
>>>>>>>>>>>>>>>>> >> > twice
>>>>>>>>>>>>>>>>> >> > with two different values.
>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?)
>>>>>>>>>>>>>>>>> close to the default
>>>>>>>>>>>>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of
>>>>>>>>>>>>>>>>> your tasks getting
>>>>>>>>>>>>>>>>> >> > killed for running over resource limits?
>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>> >> > -Sandy
>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <
>>>>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >> The terasort execution log shows that reduce spent
>>>>>>>>>>>>>>>>> about 5.5 mins from
>>>>>>>>>>>>>>>>> >> >> 33%
>>>>>>>>>>>>>>>>> >> >> to 35% as below.
>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>> reduce 31%
>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>> reduce 32%
>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>> reduce 33%
>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>> reduce 35%
>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>> reduce 40%
>>>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>>> reduce 43%
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >> Any way, below are my configurations for your
>>>>>>>>>>>>>>>>> reference. Thanks!
>>>>>>>>>>>>>>>>> >> >> (A) core-site.xml
>>>>>>>>>>>>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >> (B) hdfs-site.xml
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>dfs.replication</name>
>>>>>>>>>>>>>>>>> >> >>     <value>1</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>>>>>>>>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>>>>>>>>>>>>> >> >>     <value>10</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >> (C) mapred-site.xml
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>>>>>>>>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>>>>>>>>>>>>> >> >> </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>>>>>>>>>>>>> >> >>     <value>yarn</value>
>>>>>>>>>>>>>>>>> >> >>    </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>>>>>>>>>>>>> >> >>     <value>8</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>>>>>>>>>>>>> >> >>     <value>4</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>>>>>>>>>>>>> >> >>     <value>true</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >> (D) yarn-site.xml
>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>>>>>>>>>>>>> >> >>     <value>node1:18025</value>
>>>>>>>>>>>>>>>>> >> >>     <description>host is the hostname of the
>>>>>>>>>>>>>>>>> resource manager and
>>>>>>>>>>>>>>>>> >> >>     port is the port on which the NodeManagers
>>>>>>>>>>>>>>>>> contact the Resource
>>>>>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <description>The address of the RM web
>>>>>>>>>>>>>>>>> application.</description>
>>>>>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>>>>>>>>>>>>> >> >>     <value>node1:18088</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> <name>yarn.resourcemanager.scheduler.address</name>
>>>>>>>>>>>>>>>>> >> >>     <value>node1:18030</value>
>>>>>>>>>>>>>>>>> >> >>     <description>host is the hostname of the
>>>>>>>>>>>>>>>>> resourcemanager and port
>>>>>>>>>>>>>>>>> >> >> is
>>>>>>>>>>>>>>>>> >> >> the port
>>>>>>>>>>>>>>>>> >> >>     on which the Applications in the cluster talk to
>>>>>>>>>>>>>>>>> the Resource
>>>>>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>>>>>>>>>>>>> >> >>     <value>node1:18040</value>
>>>>>>>>>>>>>>>>> >> >>     <description>the host is the hostname of the
>>>>>>>>>>>>>>>>> ResourceManager and
>>>>>>>>>>>>>>>>> >> >> the
>>>>>>>>>>>>>>>>> >> >> port is the port on
>>>>>>>>>>>>>>>>> >> >>     which the clients can talk to the Resource
>>>>>>>>>>>>>>>>> Manager. </description>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>>>>>>>>>>>>> >> >>     <description>the local directories used by the
>>>>>>>>>>>>>>>>> >> >> nodemanager</description>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>>>>>>>>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>>>>>>>>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>>>>>>>>>>>>> port</description>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>>>>>> >> >>     <value>10240</value>
>>>>>>>>>>>>>>>>> >> >>     <description>the amount of memory on the
>>>>>>>>>>>>>>>>> NodeManager in
>>>>>>>>>>>>>>>>> >> >> GB</description>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>>>>>>>>>>>> >> >>     <description>directory on hdfs where the
>>>>>>>>>>>>>>>>> application logs are moved
>>>>>>>>>>>>>>>>> >> >> to
>>>>>>>>>>>>>>>>> >> >> </description>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>    <property>
>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>>>>>>>>>>>> >> >>     <description>the directories used by
>>>>>>>>>>>>>>>>> Nodemanagers as log
>>>>>>>>>>>>>>>>> >> >> directories</description>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>>>>>>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>>>>>>>>>>>> >> >>     <description>shuffle service that needs to be
>>>>>>>>>>>>>>>>> set for Map Reduce to
>>>>>>>>>>>>>>>>> >> >> run </description>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>>>>>>>>>>>> >> >>     <value>24</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>>>>>>>>>>>> >> >>     <value>3</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>>>>>> >> >>     <value>22000</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>>>>>>>>>>>> >> >>     <value>2.1</value>
>>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>> >> >>> Not tuning configurations at all is wrong. YARN
>>>>>>>>>>>>>>>>> uses memory resource
>>>>>>>>>>>>>>>>> >> >>> based scheduling and hence MR2 would be requesting
>>>>>>>>>>>>>>>>> 1 GB minimum by
>>>>>>>>>>>>>>>>> >> >>> default, causing, on base configs, to max out at 8
>>>>>>>>>>>>>>>>> (due to 8 GB NM
>>>>>>>>>>>>>>>>> >> >>> memory resource config) total containers. Do share
>>>>>>>>>>>>>>>>> your configs as at
>>>>>>>>>>>>>>>>> >> >>> this point none of us can tell what it is.
>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for
>>>>>>>>>>>>>>>>> users and to not
>>>>>>>>>>>>>>>>> >> >>> care about such things :)
>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>>>>>>>>>>>> samliuhadoop@gmail.com>
>>>>>>>>>>>>>>>>> >> >>> wrote:
>>>>>>>>>>>>>>>>> >> >>> > At the begining, I just want to do a fast
>>>>>>>>>>>>>>>>> comparision of MRv1 and
>>>>>>>>>>>>>>>>> >> >>> > Yarn.
>>>>>>>>>>>>>>>>> >> >>> > But
>>>>>>>>>>>>>>>>> >> >>> > they have many differences, and to be fair for
>>>>>>>>>>>>>>>>> comparison I did not
>>>>>>>>>>>>>>>>> >> >>> > tune
>>>>>>>>>>>>>>>>> >> >>> > their configurations at all.  So I got above test
>>>>>>>>>>>>>>>>> results. After
>>>>>>>>>>>>>>>>> >> >>> > analyzing
>>>>>>>>>>>>>>>>> >> >>> > the test result, no doubt, I will configure them
>>>>>>>>>>>>>>>>> and do comparison
>>>>>>>>>>>>>>>>> >> >>> > again.
>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>> >> >>> > Do you have any idea on current test result? I
>>>>>>>>>>>>>>>>> think, to compare
>>>>>>>>>>>>>>>>> >> >>> > with
>>>>>>>>>>>>>>>>> >> >>> > MRv1,
>>>>>>>>>>>>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but
>>>>>>>>>>>>>>>>> worse on Reduce
>>>>>>>>>>>>>>>>> >> >>> > phase(terasort test).
>>>>>>>>>>>>>>>>> >> >>> > And any detailed suggestions/comments/materials
>>>>>>>>>>>>>>>>> on Yarn performance
>>>>>>>>>>>>>>>>> >> >>> > tunning?
>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>> >> >>> > Thanks!
>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>>>>>>>>>>>> marcosluis2186@gmail.com>
>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>> >> >>> >> Why not to tune the configurations?
>>>>>>>>>>>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>>>>>>>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size,
>>>>>>>>>>>>>>>>> etc
>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>> >> >>> >>> Hi Experts,
>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>> >> >>> >>> We are thinking about whether to use Yarn or
>>>>>>>>>>>>>>>>> not in the near
>>>>>>>>>>>>>>>>> >> >>> >>> future,
>>>>>>>>>>>>>>>>> >> >>> >>> and
>>>>>>>>>>>>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for
>>>>>>>>>>>>>>>>> comparison.
>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>> >> >>> >>> My env is three nodes cluster, and each node
>>>>>>>>>>>>>>>>> has similar hardware:
>>>>>>>>>>>>>>>>> >> >>> >>> 2
>>>>>>>>>>>>>>>>> >> >>> >>> cpu(4 core), 32 mem. Both Yarn and MRv1 cluster
>>>>>>>>>>>>>>>>> are set on the
>>>>>>>>>>>>>>>>> >> >>> >>> same
>>>>>>>>>>>>>>>>> >> >>> >>> env. To
>>>>>>>>>>>>>>>>> >> >>> >>> be fair, I did not make any performance tuning
>>>>>>>>>>>>>>>>> on their
>>>>>>>>>>>>>>>>> >> >>> >>> configurations, but
>>>>>>>>>>>>>>>>> >> >>> >>> use the default configuration values.
>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>> >> >>> >>> Before testing, I think Yarn will be much
>>>>>>>>>>>>>>>>> better than MRv1, if
>>>>>>>>>>>>>>>>> >> >>> >>> they
>>>>>>>>>>>>>>>>> >> >>> >>> all
>>>>>>>>>>>>>>>>> >> >>> >>> use default configuration, because Yarn is a
>>>>>>>>>>>>>>>>> better framework than
>>>>>>>>>>>>>>>>> >> >>> >>> MRv1.
>>>>>>>>>>>>>>>>> >> >>> >>> However, the test result shows some differences:
>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>>>>>>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>>>>>>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>>>>>>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>>>>>>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>>>>>>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>>>>>>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>>>>>>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>> >> >>> >>> After a fast analysis, I think the direct cause
>>>>>>>>>>>>>>>>> might be that Yarn
>>>>>>>>>>>>>>>>> >> >>> >>> is
>>>>>>>>>>>>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much
>>>>>>>>>>>>>>>>> worse on Reduce
>>>>>>>>>>>>>>>>> >> >>> >>> phase.
>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>>>>>>>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1
>>>>>>>>>>>>>>>>> for terasort?
>>>>>>>>>>>>>>>>> >> >>> >>> - What's the strategy for tuning Yarn
>>>>>>>>>>>>>>>>> performance? Are there any
>>>>>>>>>>>>>>>>> >> >>> >>> materials?
>>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>>> >> >>> >>> Thanks!
>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>> >> >>> >> --
>>>>>>>>>>>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>>>>>>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>>>>>>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>>> >> >>> --
>>>>>>>>>>>>>>>>> >> >>> Harsh J
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >> --
>>>>>>>>>>>>>>>>> >> Harsh J
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> Harsh J
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Jian Fang <ji...@gmail.com>.
Ok, I think I know where the problem is now. I used snappy to compress map
output.

Calculate the compression ratio as "Map output materialized bytes" / "Map
output bytes":

MR2
440519309909/1020000000000 = 0.431881676

MR1
240948272514/1000000000000 = 0.240948273


It seems something is wrong with snappy in my Hadoop 2 setup and, as a result,
MR2 needs to fetch much more data during the shuffle phase. Is there any way to
check snappy in MR2?
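
As a quick sanity check, a minimal sketch along the lines below can confirm
whether the Snappy codec (and its native library) actually loads and
compresses on a given node; the Hadoop class names are the standard ones, but
the sample data and the SnappyCheck class name are only illustrative. On
recent 2.x builds, running 'hadoop checknative -a' should also report whether
native snappy was found.

import java.io.ByteArrayOutputStream;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.util.ReflectionUtils;

public class SnappyCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Instantiate the codec the same way the MapReduce framework does.
    CompressionCodec codec =
        ReflectionUtils.newInstance(SnappyCodec.class, conf);

    // Compress 1 MB of repetitive sample data; this throws if the native
    // snappy library is not available on the node.
    byte[] input = new byte[1 << 20];
    for (int i = 0; i < input.length; i++) {
      input[i] = (byte) (i % 16);
    }
    ByteArrayOutputStream compressed = new ByteArrayOutputStream();
    OutputStream out = codec.createOutputStream(compressed);
    out.write(input);
    out.close();

    System.out.println("input bytes      = " + input.length);
    System.out.println("compressed bytes = " + compressed.size());
    System.out.println("ratio            = "
        + (double) compressed.size() / input.length);
  }
}

Compiling this against the cluster's Hadoop jars and running it on one of the
task nodes (with the cluster's classpath and native library path) should make
it obvious whether snappy itself is usable there.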




On Wed, Oct 23, 2013 at 4:26 PM, Jian Fang <ji...@gmail.com>wrote:

> Looked at the results for MR1 and MR2. Reduce input groups for MR1 is
> 4,294,967,296, but 10,000,000,000 for MR2.
> That is to say the number of unique keys fed into the reducers is
> 10000000000 in MR2, which is really weird. Any problem in the terasort code?
>
>
> On Wed, Oct 23, 2013 at 2:16 PM, Jian Fang <ji...@gmail.com>wrote:
>
>> Reducing the number of map containers to 8 slightly improved the total
>> time to 84 minutes. Here is the output. Also, from the log, there is no clear
>> reason why the containers were killed other than messages such as
>> "Container killed by the
>> ApplicationMaster../container_1382237301855_0001_01_000001/syslog:Container
>> killed on request. Exit code is 143"
>>
>>
>> 2013-10-23 21:09:34,325 INFO org.apache.hadoop.mapreduce.Job (main):
>> Counters: 46
>>         File System Counters
>>                 FILE: Number of bytes read=455484809066
>>                 FILE: Number of bytes written=896642344343
>>
>>                 FILE: Number of read operations=0
>>                 FILE: Number of large read operations=0
>>                 FILE: Number of write operations=0
>>                 HDFS: Number of bytes read=1000000841624
>>
>>                 HDFS: Number of bytes written=1000000000000
>>                 HDFS: Number of read operations=25531
>>
>>                 HDFS: Number of large read operations=0
>>                 HDFS: Number of write operations=150
>>
>>         Job Counters
>>                 Killed map tasks=1
>>                 Killed reduce tasks=11
>>                 Launched map tasks=7449
>>                 Launched reduce tasks=86
>>                 Data-local map tasks=7434
>>                 Rack-local map tasks=15
>>                 Total time spent by all maps in occupied slots
>> (ms)=1030941232
>>                 Total time spent by all reduces in occupied slots
>> (ms)=1574732272
>>
>>         Map-Reduce Framework
>>                 Map input records=10000000000
>>                 Map output records=10000000000
>>                 Map output bytes=1020000000000
>>                 Map output materialized bytes=440519309909
>>                 Input split bytes=841624
>>
>>                 Combine input records=0
>>                 Combine output records=0
>>                 Reduce input groups=10000000000
>>                 Reduce shuffle bytes=440519309909
>>
>>                 Reduce input records=10000000000
>>                 Reduce output records=10000000000
>>                 Spilled Records=20000000000
>>                 Shuffled Maps =558600
>>                 Failed Shuffles=0
>>                 Merged Map outputs=558600
>>                 GC time elapsed (ms)=2855534
>>                 CPU time spent (ms)=170381420
>>                 Physical memory (bytes) snapshot=3704938954752
>>                 Virtual memory (bytes) snapshot=11455208308736
>>                 Total committed heap usage (bytes)=4523248582656
>>         Shuffle Errors
>>                 BAD_ID=0
>>                 CONNECTION=0
>>                 IO_ERROR=0
>>
>>                 WRONG_LENGTH=0
>>                 WRONG_MAP=0
>>                 WRONG_REDUCE=0
>>         File Input Format Counters
>>                 Bytes Read=1000000000000
>>         File Output Format Counters
>>                 Bytes Written=1000000000000
>>
>>
>>
>> On Wed, Oct 23, 2013 at 1:46 PM, Jian Fang <jian.fang.subscribe@gmail.com
>> > wrote:
>>
>>> Already started the tests with only 8 containers for map in MR2, still
>>> running.
>>>
>>> But shouldn't YARN make better use of the memory? If I map YARN
>>> containers in MR2 to exactly 8 map and 3 reduce slots as in MR1, what is the
>>> real advantage of YARN then? I remember one goal of YARN is to solve the
>>> issue that map slots cannot be used for reduce and reduce slots cannot be
>>> used for map. Shouldn't YARN be smart enough to handle the concurrent tasks?
>>>
>>>
>>>
>>> On Wed, Oct 23, 2013 at 1:17 PM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>
>>>> Increasing the slowstart is not meant to increase performance, but
>>>> should make for a fairer comparison.  Have you tried making sure that in
>>>> MR2 only 8 map tasks are running concurrently, or boosting MR1 up to 16?
>>>>
>>>> -Sandy
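
For reference, with the yarn.nodemanager.resource.memory-mb of 12288 quoted
earlier in this thread, one rough way to cap MR2 at 8 concurrent map
containers per node is to double the per-map container size in
mapred-site.xml, along these lines (the -Xmx value is an assumption, chosen
only to leave headroom under the container limit):

  <property>
    <name>mapreduce.map.memory.mb</name>
    <!-- 12288 / 1536 = 8 map containers per node -->
    <value>1536</value>
  </property>
  <property>
    <name>mapreduce.map.java.opts</name>
    <!-- assumed heap size, leaving headroom below the 1536 MB container -->
    <value>-Xmx1200m</value>
  </property>
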
>>>>
>>>>
>>>> On Wed, Oct 23, 2013 at 12:55 PM, Jian Fang <
>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>
>>>>> Changing mapreduce.job.reduce.slowstart.completedmaps to 0.99 does not
>>>>> look good. The map phase alone took 48 minutes and the total time seems to
>>>>> be even longer. Any way to let the map phase run faster?
>>>>>
>>>>> Thanks.
>>>>>
>>>>>
>>>>> On Wed, Oct 23, 2013 at 10:05 AM, Jian Fang <
>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>
>>>>>> Thanks Sandy.
>>>>>>
>>>>>> io.sort.record.percent is at its default value of 0.05 for both MR1 and
>>>>>> MR2.  mapreduce.job.reduce.slowstart.completedmaps in MR2 and mapred.reduce.slowstart.completed.maps
>>>>>> in MR1 both use the default value 0.05.
>>>>>>
>>>>>> I tried allocating 1536MB and 1024MB to the map container some time ago,
>>>>>> but the changes did not give me a better result, so I changed it back to
>>>>>> 768MB.
>>>>>>
>>>>>> Will try mapred.reduce.slowstart.completed.maps=.99 to see what
>>>>>> happens. BTW, I should use
>>>>>> mapreduce.job.reduce.slowstart.completedmaps in MR2, right?
>>>>>>
>>>>>> Also, in MR1 I can specify tasktracker.http.threads, but I could not
>>>>>> find the counterpart for MR2. Which one should I tune for the HTTP threads?
>>>>>>
>>>>>> Thanks again.
>>>>>>
>>>>>>
>>>>>> On Wed, Oct 23, 2013 at 9:40 AM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>>>>
>>>>>>> Based on SLOTS_MILLIS_MAPS, it looks like your map tasks are taking
>>>>>>> about three times as long in MR2 as they are in MR1.  This is probably
>>>>>>> because you allow twice as many map tasks to run at a time in MR2 (12288/768
>>>>>>> = 16).  Being able to use all the containers isn't necessarily a good thing
>>>>>>> if you are oversubscribing your node's resources.  Because of the different
>>>>>>> way that MR1 and MR2 view resources, I think it's better to test with
>>>>>>> mapred.reduce.slowstart.completed.maps=.99 so that the map and reduce
>>>>>>> phases will run separately.
>>>>>>>
>>>>>>> On the other hand, it looks like your MR1 has more spilled records
>>>>>>> than MR2.  For a fairer comparison, you should set io.sort.record.percent
>>>>>>> in MR1 to .13, which should improve MR1 performance (MR2 does this tuning
>>>>>>> automatically for you).
>>>>>>>
>>>>>>> -Sandy
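
For reference, the two settings Sandy mentions would look roughly like this in
MR1's mapred-site.xml (the MR2 counterpart of the slowstart property is
mapreduce.job.reduce.slowstart.completedmaps, as noted elsewhere in the
thread):

  <property>
    <name>mapred.reduce.slowstart.completed.maps</name>
    <value>0.99</value>
  </property>
  <property>
    <name>io.sort.record.percent</name>
    <value>0.13</value>
  </property>
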
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Oct 23, 2013 at 9:22 AM, Jian Fang <
>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>
>>>>>>>> The number of map slots and reduce slots on each data node for MR1
>>>>>>>> are 8 and 3, respectively. Since MR2 could use all containers for either
>>>>>>>> map or reduce, I would expect that MR2 is faster.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Oct 23, 2013 at 8:17 AM, Sandy Ryza <
>>>>>>>> sandy.ryza@cloudera.com> wrote:
>>>>>>>>
>>>>>>>>> How many map and reduce slots are you using per tasktracker in
>>>>>>>>> MR1?  How do the average map times compare? (MR2 reports this directly on
>>>>>>>>> the web UI, but you can also get a sense in MR1 by scrolling through the
>>>>>>>>> map tasks page).  Can you share the counters for MR1?
>>>>>>>>>
>>>>>>>>> -Sandy
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Oct 23, 2013 at 12:23 AM, Jian Fang <
>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Unfortunately, turning off JVM reuse still gave the same result,
>>>>>>>>>> i.e., about 90 minutes for MR2. I don't think the killed reduces could
>>>>>>>>>> account for a 2x slowdown. There must be something very wrong either
>>>>>>>>>> in the configuration or the code. Any hints?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <
>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks Sandy. I will try to turn JVM reuse off and see what
>>>>>>>>>>> happens.
>>>>>>>>>>>
>>>>>>>>>>> Yes, I saw quite some exceptions in the task attempts. For
>>>>>>>>>>> instance.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 2013-10-20 03:13:58,751 ERROR [main]
>>>>>>>>>>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>>>>>>>>>>> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
>>>>>>>>>>> 2013-10-20 03:13:58,752 ERROR [Thread-6]
>>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient: Failed to close file
>>>>>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
>>>>>>>>>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>>>>>>>>>>> No lease on
>>>>>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
>>>>>>>>>>> File does not exist. Holder
>>>>>>>>>>> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
>>>>>>>>>>> any open files.
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>>>>>>>>>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>>>>>>>>> --
>>>>>>>>>>>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>>>>>>>>>>> 2013-10-20 03:13:58,753 WARN [main]
>>>>>>>>>>> org.apache.hadoop.mapred.YarnChild: Exception running child :
>>>>>>>>>>> java.nio.channels.ClosedChannelException
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>>>>>>>>>>>         at
>>>>>>>>>>> java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <
>>>>>>>>>>> sandy.ryza@cloudera.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> It looks like many of your reduce tasks were killed.  Do you
>>>>>>>>>>>> know why?  Also, MR2 doesn't have JVM reuse, so it might make sense to
>>>>>>>>>>>> compare it to MR1 with JVM reuse turned off.
>>>>>>>>>>>>
>>>>>>>>>>>> -Sandy
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <
>>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> The Terasort output for MR2 is as follows.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job
>>>>>>>>>>>>> (main): Counters: 46
>>>>>>>>>>>>>         File System Counters
>>>>>>>>>>>>>                 FILE: Number of bytes read=456102049355
>>>>>>>>>>>>>                 FILE: Number of bytes written=897246250517
>>>>>>>>>>>>>                 FILE: Number of read operations=0
>>>>>>>>>>>>>                 FILE: Number of large read operations=0
>>>>>>>>>>>>>                 FILE: Number of write operations=0
>>>>>>>>>>>>>                 HDFS: Number of bytes read=1000000851200
>>>>>>>>>>>>>                 HDFS: Number of bytes written=1000000000000
>>>>>>>>>>>>>                 HDFS: Number of read operations=32131
>>>>>>>>>>>>>                 HDFS: Number of large read operations=0
>>>>>>>>>>>>>                 HDFS: Number of write operations=224
>>>>>>>>>>>>>         Job Counters
>>>>>>>>>>>>>                 Killed map tasks=1
>>>>>>>>>>>>>                 Killed reduce tasks=20
>>>>>>>>>>>>>                 Launched map tasks=7601
>>>>>>>>>>>>>                 Launched reduce tasks=132
>>>>>>>>>>>>>                 Data-local map tasks=7591
>>>>>>>>>>>>>                 Rack-local map tasks=10
>>>>>>>>>>>>>                 Total time spent by all maps in occupied slots
>>>>>>>>>>>>> (ms)=1696141311
>>>>>>>>>>>>>                 Total time spent by all reduces in occupied
>>>>>>>>>>>>> slots (ms)=2664045096
>>>>>>>>>>>>>         Map-Reduce Framework
>>>>>>>>>>>>>                 Map input records=10000000000
>>>>>>>>>>>>>                 Map output records=10000000000
>>>>>>>>>>>>>                 Map output bytes=1020000000000
>>>>>>>>>>>>>                 Map output materialized bytes=440486356802
>>>>>>>>>>>>>                 Input split bytes=851200
>>>>>>>>>>>>>                 Combine input records=0
>>>>>>>>>>>>>                 Combine output records=0
>>>>>>>>>>>>>                 Reduce input groups=10000000000
>>>>>>>>>>>>>                 Reduce shuffle bytes=440486356802
>>>>>>>>>>>>>                 Reduce input records=10000000000
>>>>>>>>>>>>>                 Reduce output records=10000000000
>>>>>>>>>>>>>                 Spilled Records=20000000000
>>>>>>>>>>>>>                 Shuffled Maps =851200
>>>>>>>>>>>>>                 Failed Shuffles=61
>>>>>>>>>>>>>                 Merged Map outputs=851200
>>>>>>>>>>>>>                 GC time elapsed (ms)=4215666
>>>>>>>>>>>>>                 CPU time spent (ms)=192433000
>>>>>>>>>>>>>                 Physical memory (bytes) snapshot=3349356380160
>>>>>>>>>>>>>                 Virtual memory (bytes) snapshot=9665208745984
>>>>>>>>>>>>>                 Total committed heap usage
>>>>>>>>>>>>> (bytes)=3636854259712
>>>>>>>>>>>>>         Shuffle Errors
>>>>>>>>>>>>>                 BAD_ID=0
>>>>>>>>>>>>>                 CONNECTION=0
>>>>>>>>>>>>>                 IO_ERROR=4
>>>>>>>>>>>>>                 WRONG_LENGTH=0
>>>>>>>>>>>>>                 WRONG_MAP=0
>>>>>>>>>>>>>                 WRONG_REDUCE=0
>>>>>>>>>>>>>         File Input Format Counters
>>>>>>>>>>>>>                 Bytes Read=1000000000000
>>>>>>>>>>>>>         File Output Format Counters
>>>>>>>>>>>>>                 Bytes Written=1000000000000
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> John
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop
>>>>>>>>>>>>>> 1.0.3 and it turned out that the terasort for MR2 is 2 times slower than
>>>>>>>>>>>>>> that in MR1. I cannot really believe it.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0
>>>>>>>>>>>>>> cluster configurations are as follows.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>>>>>>>>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>>>>>>>>>>>         mapreduce.map.memory.mb = "768";
>>>>>>>>>>>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>>>>>>>>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>>>>>>>>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>>>>>>>>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>>>>>>>>>>>         mapreduce.task.io.sort.factor = "48";
>>>>>>>>>>>>>>         mapreduce.task.io.sort.mb = "200";
>>>>>>>>>>>>>>         mapreduce.map.speculative = "true";
>>>>>>>>>>>>>>         mapreduce.reduce.speculative = "true";
>>>>>>>>>>>>>>         mapreduce.framework.name = "yarn";
>>>>>>>>>>>>>>         yarn.app.mapreduce.am.job.task.listener.thread-count
>>>>>>>>>>>>>> = "60";
>>>>>>>>>>>>>>         mapreduce.map.cpu.vcores = "1";
>>>>>>>>>>>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>>>>>>>>>>>         mapreduce.map.output.compress = "true";
>>>>>>>>>>>>>>         mapreduce.map.output.compress.codec =
>>>>>>>>>>>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>>>>>>>>>>>         yarn.resourcemanager.scheduler.client.thread-count =
>>>>>>>>>>>>>> "64";
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> yarn.resourcemanager.resource-tracker.client.thread-count = "64";
>>>>>>>>>>>>>>         yarn.resourcemanager.scheduler.class =
>>>>>>>>>>>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>>>>>>>>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>>>>>>>>>>>         yarn.nodemanager.aux-services.mapreduce.shuffle.class
>>>>>>>>>>>>>> = "org.apache.hadoop.mapred.ShuffleHandler";
>>>>>>>>>>>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>>>>>>>>>>>         yarn.nodemanager.container-executor.class =
>>>>>>>>>>>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>>>>>>>>>>>         yarn.nodemanager.container-manager.thread-count =
>>>>>>>>>>>>>> "64";
>>>>>>>>>>>>>>         yarn.nodemanager.localizer.client.thread-count = "20";
>>>>>>>>>>>>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost
>>>>>>>>>>>>>> the same other configurations. In MR1, the 1TB terasort took about 45
>>>>>>>>>>>>>> minutes, but it took around 90 minutes in MR2.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Does anyone know what is wrong here? Or do I need some
>>>>>>>>>>>>>> special configurations for terasort to work better in MR2?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> John
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>>>>>>>>>>>> michael_segel@hotmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Sam,
>>>>>>>>>>>>>>> I think your cluster is too small for any meaningful
>>>>>>>>>>>>>>> conclusions to be made.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Mike Segel
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Harsh,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for your detailed response! Now, the efficiency of my
>>>>>>>>>>>>>>> Yarn cluster improved a lot after increasing the reducer
>>>>>>>>>>>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>>>>>>>>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and
>>>>>>>>>>>>>>> reduce task together, with a typical process(map > shuffle > reduce). In
>>>>>>>>>>>>>>> Yarn, as I know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>>>>>>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but,
>>>>>>>>>>>>>>> MRv1 job has special execution process(map > shuffle > reduce) in Hadoop
>>>>>>>>>>>>>>> 1.x, and how Yarn execute a MRv1 job? still include some special MR steps
>>>>>>>>>>>>>>> in Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>>>>>>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>>>>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>>>>>>>>>> - What's the general process for ApplicationMaster of Yarn
>>>>>>>>>>>>>>> to execute a job?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>>>>>>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>>>>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>>>>>>>>>> - For Yarn, the above two parameters do not work any more, as
>>>>>>>>>>>>>>> yarn uses container instead, right?
>>>>>>>>>>>>>>> - For Yarn, we can set the whole physical mem for a
>>>>>>>>>>>>>>> NodeManager using 'yarn.nodemanager.resource.memory-mb'.
>>>>>>>>>>>>>>> But how to set the default size of physical mem of a container?
>>>>>>>>>>>>>>> - How to set the maximum size of physical mem of a
>>>>>>>>>>>>>>> container? By the parameter of 'mapred.child.java.opts'?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks as always!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Sam,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> > - How to know the container number? Why you say it will
>>>>>>>>>>>>>>>> be 22 containers due to a 22 GB memory?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The MR2's default configuration requests 1 GB resource each
>>>>>>>>>>>>>>>> for Map
>>>>>>>>>>>>>>>> and Reduce containers. It requests 1.5 GB for the AM
>>>>>>>>>>>>>>>> container that
>>>>>>>>>>>>>>>> runs the job, additionally. This is tunable using the
>>>>>>>>>>>>>>>> properties
>>>>>>>>>>>>>>>> Sandy's mentioned in his post.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> > - My machine has 32 GB memory, how much memory is proper
>>>>>>>>>>>>>>>> to be assigned to containers?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This is a general question. You may use the same process
>>>>>>>>>>>>>>>> you took to
>>>>>>>>>>>>>>>> decide optimal number of slots in MR1 to decide this here.
>>>>>>>>>>>>>>>> Every
>>>>>>>>>>>>>>>> container is a new JVM, and you're limited by the CPUs you
>>>>>>>>>>>>>>>> have there
>>>>>>>>>>>>>>>> (if not the memory). Either increase memory requests from
>>>>>>>>>>>>>>>> jobs, to
>>>>>>>>>>>>>>>> lower # of concurrent containers at a given time (runtime
>>>>>>>>>>>>>>>> change), or
>>>>>>>>>>>>>>>> lower NM's published memory resources to control the same
>>>>>>>>>>>>>>>> (config
>>>>>>>>>>>>>>>> change).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name'
>>>>>>>>>>>>>>>> to be 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>>>>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yes, all of these properties will still work. Old
>>>>>>>>>>>>>>>> properties specific
>>>>>>>>>>>>>>>> to JobTracker or TaskTracker (usually found as a keyword in
>>>>>>>>>>>>>>>> the config
>>>>>>>>>>>>>>>> name) will not apply anymore.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <
>>>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>>>> > Hi Harsh,
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > According to above suggestions, I removed the duplication
>>>>>>>>>>>>>>>> of setting, and
>>>>>>>>>>>>>>>> > reduce the value of 'yarn.nodemanager.resource.cpu-cores',
>>>>>>>>>>>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>>>>>>>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000.
>>>>>>>>>>>>>>>> And then, the
>>>>>>>>>>>>>>>> > efficiency improved about 18%.  I have questions:
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > - How to know the container number? Why you say it will
>>>>>>>>>>>>>>>> be 22 containers due
>>>>>>>>>>>>>>>> > to a 22 GB memory?
>>>>>>>>>>>>>>>> > - My machine has 32 GB memory, how much memory is proper
>>>>>>>>>>>>>>>> to be assigned to
>>>>>>>>>>>>>>>> > containers?
>>>>>>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name'
>>>>>>>>>>>>>>>> to be 'yarn', will
>>>>>>>>>>>>>>>> > other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>>>>>>> framework? Like
>>>>>>>>>>>>>>>> > 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > Thanks!
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >> Hey Sam,
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >> Did you get a chance to retry with Sandy's suggestions?
>>>>>>>>>>>>>>>> The config
>>>>>>>>>>>>>>>> >> appears to be asking NMs to use roughly 22 total
>>>>>>>>>>>>>>>> containers (as
>>>>>>>>>>>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB
>>>>>>>>>>>>>>>> memory
>>>>>>>>>>>>>>>> >> resource. This could impact much, given the CPU is still
>>>>>>>>>>>>>>>> the same for
>>>>>>>>>>>>>>>> >> both test runs.
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>>>>>>>>>>>> sandy.ryza@cloudera.com>
>>>>>>>>>>>>>>>> >> wrote:
>>>>>>>>>>>>>>>> >> > Hey Sam,
>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>> >> > Thanks for sharing your results.  I'm definitely
>>>>>>>>>>>>>>>> curious about what's
>>>>>>>>>>>>>>>> >> > causing the difference.
>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>> >> > A couple observations:
>>>>>>>>>>>>>>>> >> > It looks like you've got
>>>>>>>>>>>>>>>> yarn.nodemanager.resource.memory-mb in there
>>>>>>>>>>>>>>>> >> > twice
>>>>>>>>>>>>>>>> >> > with two different values.
>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close
>>>>>>>>>>>>>>>> to the default
>>>>>>>>>>>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of
>>>>>>>>>>>>>>>> your tasks getting
>>>>>>>>>>>>>>>> >> > killed for running over resource limits?
>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>> >> > -Sandy
>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <
>>>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >> The terasort execution log shows that reduce spent
>>>>>>>>>>>>>>>> about 5.5 mins from
>>>>>>>>>>>>>>>> >> >> 33%
>>>>>>>>>>>>>>>> >> >> to 35% as below.
>>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>> reduce 31%
>>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>> reduce 32%
>>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>> reduce 33%
>>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>> reduce 35%
>>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>> reduce 40%
>>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>> reduce 43%
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >> Any way, below are my configurations for your
>>>>>>>>>>>>>>>> reference. Thanks!
>>>>>>>>>>>>>>>> >> >> (A) core-site.xml
>>>>>>>>>>>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >> (B) hdfs-site.xml
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>dfs.replication</name>
>>>>>>>>>>>>>>>> >> >>     <value>1</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>>>>>>>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>>>>>>>>>>>> >> >>     <value>10</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >> (C) mapred-site.xml
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>>>>>>>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>>>>>>>>>>>> >> >> </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>>>>>>>>>>>> >> >>     <value>yarn</value>
>>>>>>>>>>>>>>>> >> >>    </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>>>>>>>>>>>> >> >>     <value>8</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>>>>>>>>>>>> >> >>     <value>4</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>>>>>>>>>>>> >> >>     <value>true</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >> (D) yarn-site.xml
>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>>>>>>>>>>>> >> >>     <value>node1:18025</value>
>>>>>>>>>>>>>>>> >> >>     <description>host is the hostname of the resource
>>>>>>>>>>>>>>>> manager and
>>>>>>>>>>>>>>>> >> >>     port is the port on which the NodeManagers
>>>>>>>>>>>>>>>> contact the Resource
>>>>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <description>The address of the RM web
>>>>>>>>>>>>>>>> application.</description>
>>>>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>>>>>>>>>>>> >> >>     <value>node1:18088</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> <name>yarn.resourcemanager.scheduler.address</name>
>>>>>>>>>>>>>>>> >> >>     <value>node1:18030</value>
>>>>>>>>>>>>>>>> >> >>     <description>host is the hostname of the
>>>>>>>>>>>>>>>> resourcemanager and port
>>>>>>>>>>>>>>>> >> >> is
>>>>>>>>>>>>>>>> >> >> the port
>>>>>>>>>>>>>>>> >> >>     on which the Applications in the cluster talk to
>>>>>>>>>>>>>>>> the Resource
>>>>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>>>>>>>>>>>> >> >>     <value>node1:18040</value>
>>>>>>>>>>>>>>>> >> >>     <description>the host is the hostname of the
>>>>>>>>>>>>>>>> ResourceManager and
>>>>>>>>>>>>>>>> >> >> the
>>>>>>>>>>>>>>>> >> >> port is the port on
>>>>>>>>>>>>>>>> >> >>     which the clients can talk to the Resource
>>>>>>>>>>>>>>>> Manager. </description>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>>>>>>>>>>>> >> >>     <description>the local directories used by the
>>>>>>>>>>>>>>>> >> >> nodemanager</description>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>>>>>>>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>>>>>>>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>>>>>>>>>>>> port</description>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>>>>> >> >>     <value>10240</value>
>>>>>>>>>>>>>>>> >> >>     <description>the amount of memory on the
>>>>>>>>>>>>>>>> NodeManager in
>>>>>>>>>>>>>>>> >> >> GB</description>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>>>>>>>>>>> >> >>     <description>directory on hdfs where the
>>>>>>>>>>>>>>>> application logs are moved
>>>>>>>>>>>>>>>> >> >> to
>>>>>>>>>>>>>>>> >> >> </description>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>    <property>
>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>>>>>>>>>>> >> >>     <description>the directories used by Nodemanagers
>>>>>>>>>>>>>>>> as log
>>>>>>>>>>>>>>>> >> >> directories</description>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>>>>>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>>>>>>>>>>> >> >>     <description>shuffle service that needs to be set
>>>>>>>>>>>>>>>> for Map Reduce to
>>>>>>>>>>>>>>>> >> >> run </description>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>>>>>>>>>>> >> >>     <value>24</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>>>>>>>>>>> >> >>     <value>3</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>>>>> >> >>     <value>22000</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>>>>>>>>>>> >> >>     <value>2.1</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>> >> >>> Not tuning configurations at all is wrong. YARN uses
>>>>>>>>>>>>>>>> memory resource
>>>>>>>>>>>>>>>> >> >>> based scheduling and hence MR2 would be requesting 1
>>>>>>>>>>>>>>>> GB minimum by
>>>>>>>>>>>>>>>> >> >>> default, causing, on base configs, to max out at 8
>>>>>>>>>>>>>>>> (due to 8 GB NM
>>>>>>>>>>>>>>>> >> >>> memory resource config) total containers. Do share
>>>>>>>>>>>>>>>> your configs as at
>>>>>>>>>>>>>>>> >> >>> this point none of us can tell what it is.
>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for
>>>>>>>>>>>>>>>> users and to not
>>>>>>>>>>>>>>>> >> >>> care about such things :)
>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>>>>>>>>>>> samliuhadoop@gmail.com>
>>>>>>>>>>>>>>>> >> >>> wrote:
>>>>>>>>>>>>>>>> >> >>> > At the begining, I just want to do a fast
>>>>>>>>>>>>>>>> comparision of MRv1 and
>>>>>>>>>>>>>>>> >> >>> > Yarn.
>>>>>>>>>>>>>>>> >> >>> > But
>>>>>>>>>>>>>>>> >> >>> > they have many differences, and to be fair for
>>>>>>>>>>>>>>>> comparison I did not
>>>>>>>>>>>>>>>> >> >>> > tune
>>>>>>>>>>>>>>>> >> >>> > their configurations at all.  So I got above test
>>>>>>>>>>>>>>>> results. After
>>>>>>>>>>>>>>>> >> >>> > analyzing
>>>>>>>>>>>>>>>> >> >>> > the test result, no doubt, I will configure them
>>>>>>>>>>>>>>>> and do comparison
>>>>>>>>>>>>>>>> >> >>> > again.
>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>> >> >>> > Do you have any idea on current test result? I
>>>>>>>>>>>>>>>> think, to compare
>>>>>>>>>>>>>>>> >> >>> > with
>>>>>>>>>>>>>>>> >> >>> > MRv1,
>>>>>>>>>>>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but
>>>>>>>>>>>>>>>> worse on Reduce
>>>>>>>>>>>>>>>> >> >>> > phase(terasort test).
>>>>>>>>>>>>>>>> >> >>> > And any detailed suggestions/comments/materials on
>>>>>>>>>>>>>>>> Yarn performance
>>>>>>>>>>>>>>>> >> >>> > tunning?
>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>> >> >>> > Thanks!
>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>>>>>>>>>>> marcosluis2186@gmail.com>
>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>> >> >>> >> Why not to tune the configurations?
>>>>>>>>>>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>>>>>>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>> >> >>> >>> Hi Experts,
>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>> >> >>> >>> We are thinking about whether to use Yarn or not
>>>>>>>>>>>>>>>> in the near
>>>>>>>>>>>>>>>> >> >>> >>> future,
>>>>>>>>>>>>>>>> >> >>> >>> and
>>>>>>>>>>>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for
>>>>>>>>>>>>>>>> comprison.
>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>> >> >>> >>> My env is three nodes cluster, and each node has
>>>>>>>>>>>>>>>> similar hardware:
>>>>>>>>>>>>>>>> >> >>> >>> 2
>>>>>>>>>>>>>>>> >> >>> >>> cpu(4 core), 32 mem. Both Yarn and MRv1 cluster
>>>>>>>>>>>>>>>> are set on the
>>>>>>>>>>>>>>>> >> >>> >>> same
>>>>>>>>>>>>>>>> >> >>> >>> env. To
>>>>>>>>>>>>>>>> >> >>> >>> be fair, I did not make any performance tuning
>>>>>>>>>>>>>>>> on their
>>>>>>>>>>>>>>>> >> >>> >>> configurations, but
>>>>>>>>>>>>>>>> >> >>> >>> use the default configuration values.
>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>> >> >>> >>> Before testing, I think Yarn will be much better
>>>>>>>>>>>>>>>> than MRv1, if
>>>>>>>>>>>>>>>> >> >>> >>> they
>>>>>>>>>>>>>>>> >> >>> >>> all
>>>>>>>>>>>>>>>> >> >>> >>> use default configuration, because Yarn is a
>>>>>>>>>>>>>>>> better framework than
>>>>>>>>>>>>>>>> >> >>> >>> MRv1.
>>>>>>>>>>>>>>>> >> >>> >>> However, the test result shows some differences:
>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>>>>>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>>>>>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>>>>>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>>>>>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>>>>>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>>>>>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>>>>>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>> >> >>> >>> After a fast analysis, I think the direct cause
>>>>>>>>>>>>>>>> might be that Yarn
>>>>>>>>>>>>>>>> >> >>> >>> is
>>>>>>>>>>>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much
>>>>>>>>>>>>>>>> worse on Reduce
>>>>>>>>>>>>>>>> >> >>> >>> phase.
>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>>>>>>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1 for
>>>>>>>>>>>>>>>> terasort?
>>>>>>>>>>>>>>>> >> >>> >>> - What's the stratage for tuning Yarn
>>>>>>>>>>>>>>>> performance? Is any
>>>>>>>>>>>>>>>> >> >>> >>> materials?
>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>> >> >>> >>> Thanks!
>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>> >> >>> >> --
>>>>>>>>>>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>>>>>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>>>>>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>> >> >>> --
>>>>>>>>>>>>>>>> >> >>> Harsh J
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >> --
>>>>>>>>>>>>>>>> >> Harsh J
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> Harsh J
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Jian Fang <ji...@gmail.com>.
OK, I think I know where the problem is now: I used Snappy to compress the map
output.

Calculating the compression ratio as "Map output materialized bytes" / "Map
output bytes":

MR2
440519309909/1020000000000 = 0.431881676

MR1
240948272514/1000000000000 = 0.240948273


It seems something is wrong with Snappy in my Hadoop 2 setup, and as a result
MR2 needs to fetch much more data during the shuffle phase. Is there any way to
check Snappy in MR2?
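
One quick way to check this outside of the job (a minimal sketch, not something
from this thread; it assumes the stock Hadoop 2.2.0 jars and uses a hypothetical
helper class that you would put on HADOOP_CLASSPATH on a slave node) is to ask
NativeCodeLoader whether libhadoop was found and whether it was built with
Snappy support:

import org.apache.hadoop.util.NativeCodeLoader;

// Hypothetical one-off check; run it on a slave node (e.g. 'hadoop CheckSnappy')
// so that it sees the same java.library.path as the NodeManager.
public class CheckSnappy {
  public static void main(String[] args) {
    boolean nativeLoaded = NativeCodeLoader.isNativeCodeLoaded();
    System.out.println("libhadoop loaded: " + nativeLoaded);
    if (nativeLoaded) {
      // buildSupportsSnappy() is a native call, so only ask once libhadoop is loaded.
      System.out.println("snappy supported: " + NativeCodeLoader.buildSupportsSnappy());
    } else {
      System.out.println("snappy supported: unknown (native hadoop library not loaded)");
    }
  }
}

If 'snappy supported' comes back false on the Hadoop 2.2.0 nodes while the 1.0.3
cluster compresses fine, the difference in materialized bytes points at the
native library setup rather than at the terasort code. Newer 2.x builds also
ship a 'hadoop checknative -a' command that reports the same information, if
your build includes it.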




On Wed, Oct 23, 2013 at 4:26 PM, Jian Fang <ji...@gmail.com>wrote:

> Looked at the results for MR1 and MR2. Reduce input groups for MR1 is
> 4,294,967,296 (exactly 2^32), but 10,000,000,000 for MR2.
> That is to say, the number of unique keys fed into the reducers is
> 10,000,000,000 in MR2, which is really weird. Is there a problem in the terasort code?
>
>
> On Wed, Oct 23, 2013 at 2:16 PM, Jian Fang <ji...@gmail.com>wrote:
>
>> Reducing the number of map containers to 8 slightly improved the total
>> time, to 84 minutes. Here is the output. Also, from the logs, there is no clear
>> reason why the containers were killed other than messages such as
>> "Container killed by the
>> ApplicationMaster../container_1382237301855_0001_01_000001/syslog:Container
>> killed on request. Exit code is 143"
>>
>>
>> 2013-10-23 21:09:34,325 INFO org.apache.hadoop.mapreduce.Job (main):
>> Counters: 46
>>         File System Counters
>>                 FILE: Number of bytes read=455484809066
>>                 FILE: Number of bytes written=896642344343
>>
>>                 FILE: Number of read operations=0
>>                 FILE: Number of large read operations=0
>>                 FILE: Number of write operations=0
>>                 HDFS: Number of bytes read=1000000841624
>>
>>                 HDFS: Number of bytes written=1000000000000
>>                 HDFS: Number of read operations=25531
>>
>>                 HDFS: Number of large read operations=0
>>                 HDFS: Number of write operations=150
>>
>>         Job Counters
>>                 Killed map tasks=1
>>                 Killed reduce tasks=11
>>                 Launched map tasks=7449
>>                 Launched reduce tasks=86
>>                 Data-local map tasks=7434
>>                 Rack-local map tasks=15
>>                 Total time spent by all maps in occupied slots
>> (ms)=1030941232
>>                 Total time spent by all reduces in occupied slots
>> (ms)=1574732272
>>
>>         Map-Reduce Framework
>>                 Map input records=10000000000
>>                 Map output records=10000000000
>>                 Map output bytes=1020000000000
>>                 Map output materialized bytes=440519309909
>>                 Input split bytes=841624
>>
>>                 Combine input records=0
>>                 Combine output records=0
>>                 Reduce input groups=10000000000
>>                 Reduce shuffle bytes=440519309909
>>
>>                 Reduce input records=10000000000
>>                 Reduce output records=10000000000
>>                 Spilled Records=20000000000
>>                 Shuffled Maps =558600
>>                 Failed Shuffles=0
>>                 Merged Map outputs=558600
>>                 GC time elapsed (ms)=2855534
>>                 CPU time spent (ms)=170381420
>>                 Physical memory (bytes) snapshot=3704938954752
>>                 Virtual memory (bytes) snapshot=11455208308736
>>                 Total committed heap usage (bytes)=4523248582656
>>         Shuffle Errors
>>                 BAD_ID=0
>>                 CONNECTION=0
>>                 IO_ERROR=0
>>
>>                 WRONG_LENGTH=0
>>                 WRONG_MAP=0
>>                 WRONG_REDUCE=0
>>         File Input Format Counters
>>                 Bytes Read=1000000000000
>>         File Output Format Counters
>>                 Bytes Written=1000000000000
>>
>>
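As a rough cross-check on the two MR2 counter dumps in this thread
(back-of-the-envelope arithmetic from the numbers above and from the earlier run
quoted further down; these figures are not in the original posts), dividing
"Total time spent by all maps in occupied slots" by the number of launched maps
gives the average slot-time charged per map task:

    1,030,941,232 ms / 7,449 maps  ~ 138 s per map  (this run, 8 concurrent maps)
    1,696,141,311 ms / 7,601 maps  ~ 223 s per map  (earlier run, 16 concurrent maps)

Whatever scaling MR2 applies to the slot counter is the same in both runs, so
the roughly 40% drop per map after halving the concurrency lines up with Sandy's
oversubscription observation quoted below.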
>>
>> On Wed, Oct 23, 2013 at 1:46 PM, Jian Fang <jian.fang.subscribe@gmail.com
>> > wrote:
>>
>>> I have already started the tests with only 8 containers for map in MR2;
>>> they are still running.
>>>
>>> But shouldn't YARN make better use of the memory? If I map YARN
>>> containers in MR2 to exactly 8 map and 3 reduce slots in MR1, what is the
>>> real advantage of YARN then? I remember that one goal of YARN was to solve
>>> the issue that map slots cannot be used for reduce and reduce slots cannot
>>> be used for map. Shouldn't YARN be smart enough to handle the concurrent
>>> tasks?
>>>
>>>
>>>
>>> On Wed, Oct 23, 2013 at 1:17 PM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>
>>>> Increasing the slowstart is not meant to increase performance, but
>>>> should make for a fairer comparison.  Have you tried making sure that in
>>>> MR2 only 8 map tasks are running concurrently, or boosting MR1 up to 16?
>>>>
>>>> -Sandy
>>>>
>>>>
>>>> On Wed, Oct 23, 2013 at 12:55 PM, Jian Fang <
>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>
>>>>> Changing mapreduce.job.reduce.slowstart.completedmaps to 0.99 does not
>>>>> look good. The map phase alone took 48 minutes and the total time seems
>>>>> to be even longer. Is there any way to make the map phase run faster?
>>>>>
>>>>> Thanks.
>>>>>
>>>>>
>>>>> On Wed, Oct 23, 2013 at 10:05 AM, Jian Fang <
>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>
>>>>>> Thanks Sandy.
>>>>>>
>>>>>> io.sort.record.percent is at the default value of 0.05 for both MR1 and
>>>>>> MR2. mapreduce.job.reduce.slowstart.completedmaps in MR2 and
>>>>>> mapred.reduce.slowstart.completed.maps in MR1 both use the default value of 0.05.
>>>>>>
>>>>>> I tried allocating 1536 MB and 1024 MB to the map container some time ago,
>>>>>> but the changes did not give me a better result, so I changed it back to
>>>>>> 768 MB.
>>>>>>
>>>>>> I will try mapred.reduce.slowstart.completed.maps=.99 to see what
>>>>>> happens. BTW, I should use
>>>>>> mapreduce.job.reduce.slowstart.completedmaps in MR2, right?
>>>>>>
>>>>>> Also, in MR1 I can specify tasktracker.http.threads, but I could not
>>>>>> find the counterpart in MR2. Which one should I tune for the HTTP threads?
>>>>>>
>>>>>> Thanks again.
>>>>>>
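A note on the two open questions just above (hedged; based on stock Hadoop 2.2.0
property names, not on anything answered later in this thread):
mapreduce.job.reduce.slowstart.completedmaps is indeed the MR2 name and can be
set per job (for example with -D on the command line), and the closest MR2
counterpart of MR1's 'tasktracker.http.threads' is the thread pool of the
NodeManager-side ShuffleHandler, e.g. in the NodeManagers' mapred-site.xml:

        mapreduce.job.reduce.slowstart.completedmaps = "0.99";
        mapreduce.shuffle.max.threads = "0";

where a value of 0 for mapreduce.shuffle.max.threads is taken here to mean twice
the number of available cores (an assumption about the 2.2.0 default, worth
verifying against mapred-default.xml).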
>>>>>>
>>>>>> On Wed, Oct 23, 2013 at 9:40 AM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>>>>
>>>>>>> Based on SLOTS_MILLIS_MAPS, it looks like your map tasks are taking
>>>>>>> about three times as long in MR2 as they are in MR1.  This is probably
>>>>>>> because you allow twice as many map tasks to run at a time in MR2 (12288/768
>>>>>>> = 16).  Being able to use all the containers isn't necessarily a good thing
>>>>>>> if you are oversubscribing your node's resources.  Because of the different
>>>>>>> way that MR1 and MR2 view resources, I think it's better to test with
>>>>>>> mapred.reduce.slowstart.completed.maps=.99 so that the map and reduce
>>>>>>> phases will run separately.
>>>>>>>
>>>>>>> On the other side, it looks like your MR1 has more spilled records
>>>>>>> than MR2.  For a fairer comparison, you should set io.sort.record.percent
>>>>>>> in MR1 to .13, which should improve MR1 performance, but will provide a
>>>>>>> fairer comparison (MR2 automatically does this tuning for you).
>>>>>>>
>>>>>>> -Sandy
>>>>>>>
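For concreteness, the MR1-side overrides suggested here would amount to
something like the following in the Hadoop 1.x mapred-site.xml (the values are
just the test settings Sandy names above, not general tuning advice):

        io.sort.record.percent = "0.13";
        mapred.reduce.slowstart.completed.maps = "0.99";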
>>>>>>>
>>>>>>> On Wed, Oct 23, 2013 at 9:22 AM, Jian Fang <
>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>
>>>>>>>> The number of map slots and reduce slots on each data node for MR1
>>>>>>>> are 8 and 3, respectively. Since MR2 could use all containers for either
>>>>>>>> map or reduce, I would expect that MR2 is faster.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Oct 23, 2013 at 8:17 AM, Sandy Ryza <
>>>>>>>> sandy.ryza@cloudera.com> wrote:
>>>>>>>>
>>>>>>>>> How many map and reduce slots are you using per tasktracker in
>>>>>>>>> MR1?  How do the average map times compare? (MR2 reports this directly on
>>>>>>>>> the web UI, but you can also get a sense in MR1 by scrolling through the
>>>>>>>>> map tasks page).  Can you share the counters for MR1?
>>>>>>>>>
>>>>>>>>> -Sandy
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Oct 23, 2013 at 12:23 AM, Jian Fang <
>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Unfortunately, turning off JVM reuse still gave the same result,
>>>>>>>>>> i.e., about 90 minutes for MR2. I don't think the killed reduces alone
>>>>>>>>>> could account for a 2x slowdown. There must be something very wrong in
>>>>>>>>>> either the configuration or the code. Any hints?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <
>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks Sandy. I will try turning JVM reuse off and see what
>>>>>>>>>>> happens.
>>>>>>>>>>>
>>>>>>>>>>> Yes, I saw quite a few exceptions in the task attempts. For
>>>>>>>>>>> instance:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 2013-10-20 03:13:58,751 ERROR [main]
>>>>>>>>>>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>>>>>>>>>>> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
>>>>>>>>>>> 2013-10-20 03:13:58,752 ERROR [Thread-6]
>>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient: Failed to close file
>>>>>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
>>>>>>>>>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>>>>>>>>>>> No lease on
>>>>>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
>>>>>>>>>>> File does not exist. Holder
>>>>>>>>>>> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
>>>>>>>>>>> any open files.
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>>>>>>>>>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>>>>>>>>> --
>>>>>>>>>>>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>>>>>>>>>>> 2013-10-20 03:13:58,753 WARN [main]
>>>>>>>>>>> org.apache.hadoop.mapred.YarnChild: Exception running child :
>>>>>>>>>>> java.nio.channels.ClosedChannelException
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>>>>>>>>>>>         at
>>>>>>>>>>> java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <
>>>>>>>>>>> sandy.ryza@cloudera.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> It looks like many of your reduce tasks were killed.  Do you
>>>>>>>>>>>> know why?  Also, MR2 doesn't have JVM reuse, so it might make sense to
>>>>>>>>>>>> compare it to MR1 with JVM reuse turned off.
>>>>>>>>>>>>
>>>>>>>>>>>> -Sandy
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <
>>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> The Terasort output for MR2 is as follows.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job
>>>>>>>>>>>>> (main): Counters: 46
>>>>>>>>>>>>>         File System Counters
>>>>>>>>>>>>>                 FILE: Number of bytes read=456102049355
>>>>>>>>>>>>>                 FILE: Number of bytes written=897246250517
>>>>>>>>>>>>>                 FILE: Number of read operations=0
>>>>>>>>>>>>>                 FILE: Number of large read operations=0
>>>>>>>>>>>>>                 FILE: Number of write operations=0
>>>>>>>>>>>>>                 HDFS: Number of bytes read=1000000851200
>>>>>>>>>>>>>                 HDFS: Number of bytes written=1000000000000
>>>>>>>>>>>>>                 HDFS: Number of read operations=32131
>>>>>>>>>>>>>                 HDFS: Number of large read operations=0
>>>>>>>>>>>>>                 HDFS: Number of write operations=224
>>>>>>>>>>>>>         Job Counters
>>>>>>>>>>>>>                 Killed map tasks=1
>>>>>>>>>>>>>                 Killed reduce tasks=20
>>>>>>>>>>>>>                 Launched map tasks=7601
>>>>>>>>>>>>>                 Launched reduce tasks=132
>>>>>>>>>>>>>                 Data-local map tasks=7591
>>>>>>>>>>>>>                 Rack-local map tasks=10
>>>>>>>>>>>>>                 Total time spent by all maps in occupied slots
>>>>>>>>>>>>> (ms)=1696141311
>>>>>>>>>>>>>                 Total time spent by all reduces in occupied
>>>>>>>>>>>>> slots (ms)=2664045096
>>>>>>>>>>>>>         Map-Reduce Framework
>>>>>>>>>>>>>                 Map input records=10000000000
>>>>>>>>>>>>>                 Map output records=10000000000
>>>>>>>>>>>>>                 Map output bytes=1020000000000
>>>>>>>>>>>>>                 Map output materialized bytes=440486356802
>>>>>>>>>>>>>                 Input split bytes=851200
>>>>>>>>>>>>>                 Combine input records=0
>>>>>>>>>>>>>                 Combine output records=0
>>>>>>>>>>>>>                 Reduce input groups=10000000000
>>>>>>>>>>>>>                 Reduce shuffle bytes=440486356802
>>>>>>>>>>>>>                 Reduce input records=10000000000
>>>>>>>>>>>>>                 Reduce output records=10000000000
>>>>>>>>>>>>>                 Spilled Records=20000000000
>>>>>>>>>>>>>                 Shuffled Maps =851200
>>>>>>>>>>>>>                 Failed Shuffles=61
>>>>>>>>>>>>>                 Merged Map outputs=851200
>>>>>>>>>>>>>                 GC time elapsed (ms)=4215666
>>>>>>>>>>>>>                 CPU time spent (ms)=192433000
>>>>>>>>>>>>>                 Physical memory (bytes) snapshot=3349356380160
>>>>>>>>>>>>>                 Virtual memory (bytes) snapshot=9665208745984
>>>>>>>>>>>>>                 Total committed heap usage
>>>>>>>>>>>>> (bytes)=3636854259712
>>>>>>>>>>>>>         Shuffle Errors
>>>>>>>>>>>>>                 BAD_ID=0
>>>>>>>>>>>>>                 CONNECTION=0
>>>>>>>>>>>>>                 IO_ERROR=4
>>>>>>>>>>>>>                 WRONG_LENGTH=0
>>>>>>>>>>>>>                 WRONG_MAP=0
>>>>>>>>>>>>>                 WRONG_REDUCE=0
>>>>>>>>>>>>>         File Input Format Counters
>>>>>>>>>>>>>                 Bytes Read=1000000000000
>>>>>>>>>>>>>         File Output Format Counters
>>>>>>>>>>>>>                 Bytes Written=1000000000000
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> John
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop
>>>>>>>>>>>>>> 1.0.3 and it turned out that the terasort for MR2 is 2 times slower than
>>>>>>>>>>>>>> that in MR1. I cannot really believe it.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0
>>>>>>>>>>>>>> cluster configurations are as follows.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>>>>>>>>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>>>>>>>>>>>         mapreduce.map.memory.mb = "768";
>>>>>>>>>>>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>>>>>>>>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>>>>>>>>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>>>>>>>>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>>>>>>>>>>>         mapreduce.task.io.sort.factor = "48";
>>>>>>>>>>>>>>         mapreduce.task.io.sort.mb = "200";
>>>>>>>>>>>>>>         mapreduce.map.speculative = "true";
>>>>>>>>>>>>>>         mapreduce.reduce.speculative = "true";
>>>>>>>>>>>>>>         mapreduce.framework.name = "yarn";
>>>>>>>>>>>>>>         yarn.app.mapreduce.am.job.task.listener.thread-count
>>>>>>>>>>>>>> = "60";
>>>>>>>>>>>>>>         mapreduce.map.cpu.vcores = "1";
>>>>>>>>>>>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>>>>>>>>>>>         mapreduce.map.output.compress = "true";
>>>>>>>>>>>>>>         mapreduce.map.output.compress.codec =
>>>>>>>>>>>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>>>>>>>>>>>         yarn.resourcemanager.scheduler.client.thread-count =
>>>>>>>>>>>>>> "64";
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> yarn.resourcemanager.resource-tracker.client.thread-count = "64";
>>>>>>>>>>>>>>         yarn.resourcemanager.scheduler.class =
>>>>>>>>>>>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>>>>>>>>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>>>>>>>>>>>         yarn.nodemanager.aux-services.mapreduce.shuffle.class
>>>>>>>>>>>>>> = "org.apache.hadoop.mapred.ShuffleHandler";
>>>>>>>>>>>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>>>>>>>>>>>         yarn.nodemanager.container-executor.class =
>>>>>>>>>>>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>>>>>>>>>>>         yarn.nodemanager.container-manager.thread-count =
>>>>>>>>>>>>>> "64";
>>>>>>>>>>>>>>         yarn.nodemanager.localizer.client.thread-count = "20";
>>>>>>>>>>>>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost
>>>>>>>>>>>>>> the same other configurations. In MR1, the 1TB terasort took about 45
>>>>>>>>>>>>>> minutes, but it took around 90 minutes in MR2.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Does anyone know what is wrong here? Or do I need some
>>>>>>>>>>>>>> special configurations for terasort to work better in MR2?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> John
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>>>>>>>>>>>> michael_segel@hotmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Sam,
>>>>>>>>>>>>>>> I think your cluster is too small for any meaningful
>>>>>>>>>>>>>>> conclusions to be made.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Mike Segel
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Harsh,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for your detailed response! Now, the efficiency of my
>>>>>>>>>>>>>>> Yarn cluster improved a lot after increasing the reducer
>>>>>>>>>>>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>>>>>>>>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and
>>>>>>>>>>>>>>> reduce task together, with a typical process(map > shuffle > reduce). In
>>>>>>>>>>>>>>> Yarn, as I know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>>>>>>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but,
>>>>>>>>>>>>>>> MRv1 job has special execution process(map > shuffle > reduce) in Hadoop
>>>>>>>>>>>>>>> 1.x, and how Yarn execute a MRv1 job? still include some special MR steps
>>>>>>>>>>>>>>> in Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>>>>>>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>>>>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>>>>>>>>>> - What's the general process for ApplicationMaster of Yarn
>>>>>>>>>>>>>>> to execute a job?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>>>>>>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>>>>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>>>>>>>>>> - For Yarn, the above two parameters do not work any more, as
>>>>>>>>>>>>>>> yarn uses containers instead, right?
>>>>>>>>>>>>>>> - For Yarn, we can set the whole physical mem for a
>>>>>>>>>>>>>>> NodeManager using 'yarn.nodemanager.resource.memory-mb'.
>>>>>>>>>>>>>>> But how to set the default size of physical mem of a container?
>>>>>>>>>>>>>>> - How to set the maximum size of physical mem of a
>>>>>>>>>>>>>>> container? By the parameter of 'mapred.child.java.opts'?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks as always!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Sam,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> > - How to know the container number? Why you say it will
>>>>>>>>>>>>>>>> be 22 containers due to a 22 GB memory?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The MR2's default configuration requests 1 GB resource each
>>>>>>>>>>>>>>>> for Map
>>>>>>>>>>>>>>>> and Reduce containers. It requests 1.5 GB for the AM
>>>>>>>>>>>>>>>> container that
>>>>>>>>>>>>>>>> runs the job, additionally. This is tunable using the
>>>>>>>>>>>>>>>> properties
>>>>>>>>>>>>>>>> Sandy's mentioned in his post.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> > - My machine has 32 GB memory, how many memory is proper
>>>>>>>>>>>>>>>> to be assigned to containers?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This is a general question. You may use the same process
>>>>>>>>>>>>>>>> you took to
>>>>>>>>>>>>>>>> decide optimal number of slots in MR1 to decide this here.
>>>>>>>>>>>>>>>> Every
>>>>>>>>>>>>>>>> container is a new JVM, and you're limited by the CPUs you
>>>>>>>>>>>>>>>> have there
>>>>>>>>>>>>>>>> (if not the memory). Either increase memory requests from
>>>>>>>>>>>>>>>> jobs, to
>>>>>>>>>>>>>>>> lower # of concurrent containers at a given time (runtime
>>>>>>>>>>>>>>>> change), or
>>>>>>>>>>>>>>>> lower NM's published memory resources to control the same
>>>>>>>>>>>>>>>> (config
>>>>>>>>>>>>>>>> change).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name'
>>>>>>>>>>>>>>>> to be 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>>>>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yes, all of these properties will still work. Old
>>>>>>>>>>>>>>>> properties specific
>>>>>>>>>>>>>>>> to JobTracker or TaskTracker (usually found as a keyword in
>>>>>>>>>>>>>>>> the config
>>>>>>>>>>>>>>>> name) will not apply anymore.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <
>>>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>>>> > Hi Harsh,
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > According to above suggestions, I removed the duplication
>>>>>>>>>>>>>>>> of setting, and
>>>>>>>>>>>>>>>> > reduce the value of 'yarn.nodemanager.resource.cpu-cores',
>>>>>>>>>>>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>>>>>>>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000.
>>>>>>>>>>>>>>>> And then, the
>>>>>>>>>>>>>>>> > efficiency improved about 18%.  I have questions:
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > - How to know the container number? Why you say it will
>>>>>>>>>>>>>>>> be 22 containers due
>>>>>>>>>>>>>>>> > to a 22 GB memory?
>>>>>>>>>>>>>>>> > - My machine has 32 GB memory, how many memory is proper
>>>>>>>>>>>>>>>> to be assigned to
>>>>>>>>>>>>>>>> > containers?
>>>>>>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name'
>>>>>>>>>>>>>>>> to be 'yarn', will
>>>>>>>>>>>>>>>> > other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>>>>>>> framework? Like
>>>>>>>>>>>>>>>> > 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > Thanks!
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >> Hey Sam,
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >> Did you get a chance to retry with Sandy's suggestions?
>>>>>>>>>>>>>>>> The config
>>>>>>>>>>>>>>>> >> appears to be asking NMs to use roughly 22 total
>>>>>>>>>>>>>>>> containers (as
>>>>>>>>>>>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB
>>>>>>>>>>>>>>>> memory
>>>>>>>>>>>>>>>> >> resource. This could impact much, given the CPU is still
>>>>>>>>>>>>>>>> the same for
>>>>>>>>>>>>>>>> >> both test runs.
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>>>>>>>>>>>> sandy.ryza@cloudera.com>
>>>>>>>>>>>>>>>> >> wrote:
>>>>>>>>>>>>>>>> >> > Hey Sam,
>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>> >> > Thanks for sharing your results.  I'm definitely
>>>>>>>>>>>>>>>> curious about what's
>>>>>>>>>>>>>>>> >> > causing the difference.
>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>> >> > A couple observations:
>>>>>>>>>>>>>>>> >> > It looks like you've got
>>>>>>>>>>>>>>>> yarn.nodemanager.resource.memory-mb in there
>>>>>>>>>>>>>>>> >> > twice
>>>>>>>>>>>>>>>> >> > with two different values.
>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close
>>>>>>>>>>>>>>>> to the default
>>>>>>>>>>>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of
>>>>>>>>>>>>>>>> your tasks getting
>>>>>>>>>>>>>>>> >> > killed for running over resource limits?
>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>> >> > -Sandy
>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <
>>>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >> The terasort execution log shows that reduce spent
>>>>>>>>>>>>>>>> about 5.5 mins from
>>>>>>>>>>>>>>>> >> >> 33%
>>>>>>>>>>>>>>>> >> >> to 35% as below.
>>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>> reduce 31%
>>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>> reduce 32%
>>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>> reduce 33%
>>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>> reduce 35%
>>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>> reduce 40%
>>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>> reduce 43%
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >> Any way, below are my configurations for your
>>>>>>>>>>>>>>>> reference. Thanks!
>>>>>>>>>>>>>>>> >> >> (A) core-site.xml
>>>>>>>>>>>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >> (B) hdfs-site.xml
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>dfs.replication</name>
>>>>>>>>>>>>>>>> >> >>     <value>1</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>>>>>>>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>>>>>>>>>>>> >> >>     <value>10</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >> (C) mapred-site.xml
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>>>>>>>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>>>>>>>>>>>> >> >> </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>>>>>>>>>>>> >> >>     <value>yarn</value>
>>>>>>>>>>>>>>>> >> >>    </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>>>>>>>>>>>> >> >>     <value>8</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>>>>>>>>>>>> >> >>     <value>4</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>>>>>>>>>>>> >> >>     <value>true</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >> (D) yarn-site.xml
>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>>>>>>>>>>>> >> >>     <value>node1:18025</value>
>>>>>>>>>>>>>>>> >> >>     <description>host is the hostname of the resource
>>>>>>>>>>>>>>>> manager and
>>>>>>>>>>>>>>>> >> >>     port is the port on which the NodeManagers
>>>>>>>>>>>>>>>> contact the Resource
>>>>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <description>The address of the RM web
>>>>>>>>>>>>>>>> application.</description>
>>>>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>>>>>>>>>>>> >> >>     <value>node1:18088</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> <name>yarn.resourcemanager.scheduler.address</name>
>>>>>>>>>>>>>>>> >> >>     <value>node1:18030</value>
>>>>>>>>>>>>>>>> >> >>     <description>host is the hostname of the
>>>>>>>>>>>>>>>> resourcemanager and port
>>>>>>>>>>>>>>>> >> >> is
>>>>>>>>>>>>>>>> >> >> the port
>>>>>>>>>>>>>>>> >> >>     on which the Applications in the cluster talk to
>>>>>>>>>>>>>>>> the Resource
>>>>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>>>>>>>>>>>> >> >>     <value>node1:18040</value>
>>>>>>>>>>>>>>>> >> >>     <description>the host is the hostname of the
>>>>>>>>>>>>>>>> ResourceManager and
>>>>>>>>>>>>>>>> >> >> the
>>>>>>>>>>>>>>>> >> >> port is the port on
>>>>>>>>>>>>>>>> >> >>     which the clients can talk to the Resource
>>>>>>>>>>>>>>>> Manager. </description>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>>>>>>>>>>>> >> >>     <description>the local directories used by the
>>>>>>>>>>>>>>>> >> >> nodemanager</description>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>>>>>>>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>>>>>>>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>>>>>>>>>>>> port</description>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>>>>> >> >>     <value>10240</value>
>>>>>>>>>>>>>>>> >> >>     <description>the amount of memory on the
>>>>>>>>>>>>>>>> NodeManager in
>>>>>>>>>>>>>>>> >> >> GB</description>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>>>>>>>>>>> >> >>     <description>directory on hdfs where the
>>>>>>>>>>>>>>>> application logs are moved
>>>>>>>>>>>>>>>> >> >> to
>>>>>>>>>>>>>>>> >> >> </description>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>    <property>
>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>>>>>>>>>>> >> >>     <description>the directories used by Nodemanagers
>>>>>>>>>>>>>>>> as log
>>>>>>>>>>>>>>>> >> >> directories</description>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>>>>>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>>>>>>>>>>> >> >>     <description>shuffle service that needs to be set
>>>>>>>>>>>>>>>> for Map Reduce to
>>>>>>>>>>>>>>>> >> >> run </description>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>>>>>>>>>>> >> >>     <value>24</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>>>>>>>>>>> >> >>     <value>3</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>>>>> >> >>     <value>22000</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>>>>>>>>>>> >> >>     <value>2.1</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>> >> >>> Not tuning configurations at all is wrong. YARN uses
>>>>>>>>>>>>>>>> memory resource
>>>>>>>>>>>>>>>> >> >>> based scheduling and hence MR2 would be requesting 1
>>>>>>>>>>>>>>>> GB minimum by
>>>>>>>>>>>>>>>> >> >>> default, causing, on base configs, to max out at 8
>>>>>>>>>>>>>>>> (due to 8 GB NM
>>>>>>>>>>>>>>>> >> >>> memory resource config) total containers. Do share
>>>>>>>>>>>>>>>> your configs as at
>>>>>>>>>>>>>>>> >> >>> this point none of us can tell what it is.
>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for
>>>>>>>>>>>>>>>> users and to not
>>>>>>>>>>>>>>>> >> >>> care about such things :)
>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>>>>>>>>>>> samliuhadoop@gmail.com>
>>>>>>>>>>>>>>>> >> >>> wrote:
>>>>>>>>>>>>>>>> >> >>> > At the begining, I just want to do a fast
>>>>>>>>>>>>>>>> comparision of MRv1 and
>>>>>>>>>>>>>>>> >> >>> > Yarn.
>>>>>>>>>>>>>>>> >> >>> > But
>>>>>>>>>>>>>>>> >> >>> > they have many differences, and to be fair for
>>>>>>>>>>>>>>>> comparison I did not
>>>>>>>>>>>>>>>> >> >>> > tune
>>>>>>>>>>>>>>>> >> >>> > their configurations at all.  So I got above test
>>>>>>>>>>>>>>>> results. After
>>>>>>>>>>>>>>>> >> >>> > analyzing
>>>>>>>>>>>>>>>> >> >>> > the test result, no doubt, I will configure them
>>>>>>>>>>>>>>>> and do comparison
>>>>>>>>>>>>>>>> >> >>> > again.
>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>> >> >>> > Do you have any idea on current test result? I
>>>>>>>>>>>>>>>> think, to compare
>>>>>>>>>>>>>>>> >> >>> > with
>>>>>>>>>>>>>>>> >> >>> > MRv1,
>>>>>>>>>>>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but
>>>>>>>>>>>>>>>> worse on Reduce
>>>>>>>>>>>>>>>> >> >>> > phase(terasort test).
>>>>>>>>>>>>>>>> >> >>> > And any detailed suggestions/comments/materials on
>>>>>>>>>>>>>>>> Yarn performance
>>>>>>>>>>>>>>>> >> >>> > tunning?
>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>> >> >>> > Thanks!
>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>>>>>>>>>>> marcosluis2186@gmail.com>
>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>> >> >>> >> Why not to tune the configurations?
>>>>>>>>>>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>>>>>>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>> >> >>> >>> Hi Experts,
>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>> >> >>> >>> We are thinking about whether to use Yarn or not
>>>>>>>>>>>>>>>> in the near
>>>>>>>>>>>>>>>> >> >>> >>> future,
>>>>>>>>>>>>>>>> >> >>> >>> and
>>>>>>>>>>>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for
>>>>>>>>>>>>>>>> comprison.
>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>> >> >>> >>> My env is three nodes cluster, and each node has
>>>>>>>>>>>>>>>> similar hardware:
>>>>>>>>>>>>>>>> >> >>> >>> 2
>>>>>>>>>>>>>>>> >> >>> >>> cpu(4 core), 32 mem. Both Yarn and MRv1 cluster
>>>>>>>>>>>>>>>> are set on the
>>>>>>>>>>>>>>>> >> >>> >>> same
>>>>>>>>>>>>>>>> >> >>> >>> env. To
>>>>>>>>>>>>>>>> >> >>> >>> be fair, I did not make any performance tuning
>>>>>>>>>>>>>>>> on their
>>>>>>>>>>>>>>>> >> >>> >>> configurations, but
>>>>>>>>>>>>>>>> >> >>> >>> use the default configuration values.
>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>> >> >>> >>> Before testing, I think Yarn will be much better
>>>>>>>>>>>>>>>> than MRv1, if
>>>>>>>>>>>>>>>> >> >>> >>> they
>>>>>>>>>>>>>>>> >> >>> >>> all
>>>>>>>>>>>>>>>> >> >>> >>> use default configuration, because Yarn is a
>>>>>>>>>>>>>>>> better framework than
>>>>>>>>>>>>>>>> >> >>> >>> MRv1.
>>>>>>>>>>>>>>>> >> >>> >>> However, the test result shows some differences:
>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>>>>>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>>>>>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>>>>>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>>>>>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>>>>>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>>>>>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>>>>>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>> >> >>> >>> After a fast analysis, I think the direct cause
>>>>>>>>>>>>>>>> might be that Yarn
>>>>>>>>>>>>>>>> >> >>> >>> is
>>>>>>>>>>>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much
>>>>>>>>>>>>>>>> worse on Reduce
>>>>>>>>>>>>>>>> >> >>> >>> phase.
>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>>>>>>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1 for
>>>>>>>>>>>>>>>> terasort?
>>>>>>>>>>>>>>>> >> >>> >>> - What's the stratage for tuning Yarn
>>>>>>>>>>>>>>>> performance? Is any
>>>>>>>>>>>>>>>> >> >>> >>> materials?
>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>> >> >>> >>> Thanks!
>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>> >> >>> >> --
>>>>>>>>>>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>>>>>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>>>>>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>> >> >>> --
>>>>>>>>>>>>>>>> >> >>> Harsh J
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >> --
>>>>>>>>>>>>>>>> >> Harsh J
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> Harsh J
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Jian Fang <ji...@gmail.com>.
Ok, I think I know where the problem is now. I used snappy to compress map
output.

Calculate the compression ratio = "Map output materialized bytes"/"Map
output bytes"

MR2
440519309909/1020000000000 = 0.431881676

MR1
240948272514/1000000000000 = 0.240948273


Seems something is wrong with the snappy in my Hadoop 2 and as a result,
MR2 needs to fetch much more data during shuffle phase. Any way to check
snappy in MR2?
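
(Side note, not from the original messages: one quick check is whether the
native snappy library is actually being picked up by the 2.2.0 task JVMs. Many
2.x builds ship a "hadoop checknative -a" command that reports this directly.
Failing that, the standalone sketch below, which is only an illustration (the
class name, record shape, and sizes are invented here), loads SnappyCodec the
way a map task would and prints the ratio it achieves on terasort-like
100-byte records:

    import java.io.ByteArrayOutputStream;
    import java.util.Arrays;
    import java.util.Random;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.compress.CompressionOutputStream;
    import org.apache.hadoop.io.compress.SnappyCodec;
    import org.apache.hadoop.util.NativeCodeLoader;

    public class SnappyCheck {
      public static void main(String[] args) throws Exception {
        // Report whether the native libraries the codec depends on are present.
        System.out.println("native hadoop loaded: " + NativeCodeLoader.isNativeCodeLoaded());
        System.out.println("native snappy loaded: " + SnappyCodec.isNativeCodeLoaded());

        SnappyCodec codec = new SnappyCodec();
        codec.setConf(new Configuration());

        // Roughly terasort-shaped input: 10 random key bytes + 90 filler bytes per record.
        Random rnd = new Random(42);
        byte[] record = new byte[100];
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        for (int i = 0; i < 100000; i++) {
          rnd.nextBytes(record);
          Arrays.fill(record, 10, 100, (byte) 'A');
          raw.write(record);
        }

        // Compress through the same codec class the job configuration points at.
        ByteArrayOutputStream packed = new ByteArrayOutputStream();
        CompressionOutputStream out = codec.createOutputStream(packed);
        out.write(raw.toByteArray());
        out.close();

        System.out.printf("compression ratio = %.3f%n", (double) packed.size() / raw.size());
      }
    }

Run it with the cluster's Hadoop jars on the classpath and -Djava.library.path
pointing at a worker node's native library directory. If the codec fails to
load, or the ratio it reports looks unreasonable, that points at the snappy
install on the MR2 nodes rather than at the terasort code.)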




On Wed, Oct 23, 2013 at 4:26 PM, Jian Fang <ji...@gmail.com>wrote:

> Looked at the results for MR1 and MR2. Reduce input groups for MR1 is
> 4,294,967,296, but 10,000,000,000 for MR2.
> That is to say, the number of unique keys fed into the reducers is
> 10,000,000,000 in MR2, which is really weird. Any problem in the terasort code?
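
(Side note, not from the original messages: 4,294,967,296 is exactly 2^32, so
the MR1 figure looks more like a 32-bit counter artifact than evidence that MR1
saw fewer unique keys. A 1 TB terasort input is 10,000,000,000 rows with
10-byte random keys, so close to 10,000,000,000 distinct reduce groups, as MR2
reports, is what you would expect.)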
>
>
> On Wed, Oct 23, 2013 at 2:16 PM, Jian Fang <ji...@gmail.com>wrote:
>
>> Reducing the number of map containers to 8 slightly improved the total
>> time, to 84 minutes. Here is the output. Also, from the log, there is no clear
>> reason why the containers are killed, other than messages such as
>> "Container killed by the
>> ApplicationMaster../container_1382237301855_0001_01_000001/syslog:Container
>> killed on request. Exit code is 143"
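
(Side note, not from the original messages: exit code 143 is 128 + 15, i.e. the
container JVM received SIGTERM. "Container killed on request" from the
ApplicationMaster is usually it reaping a speculative duplicate or an attempt
whose work is no longer needed, so on its own this message does not indicate a
task failure.)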
>>
>>
>> 2013-10-23 21:09:34,325 INFO org.apache.hadoop.mapreduce.Job (main):
>> Counters: 46
>>         File System Counters
>>                 FILE: Number of bytes read=455484809066
>>                 FILE: Number of bytes written=896642344343
>>
>>                 FILE: Number of read operations=0
>>                 FILE: Number of large read operations=0
>>                 FILE: Number of write operations=0
>>                 HDFS: Number of bytes read=1000000841624
>>
>>                 HDFS: Number of bytes written=1000000000000
>>                 HDFS: Number of read operations=25531
>>
>>                 HDFS: Number of large read operations=0
>>                 HDFS: Number of write operations=150
>>
>>         Job Counters
>>                 Killed map tasks=1
>>                 Killed reduce tasks=11
>>                 Launched map tasks=7449
>>                 Launched reduce tasks=86
>>                 Data-local map tasks=7434
>>                 Rack-local map tasks=15
>>                 Total time spent by all maps in occupied slots
>> (ms)=1030941232
>>                 Total time spent by all reduces in occupied slots
>> (ms)=1574732272
>>
>>         Map-Reduce Framework
>>                 Map input records=10000000000
>>                 Map output records=10000000000
>>                 Map output bytes=1020000000000
>>                 Map output materialized bytes=440519309909
>>                 Input split bytes=841624
>>
>>                 Combine input records=0
>>                 Combine output records=0
>>                 Reduce input groups=10000000000
>>                 Reduce shuffle bytes=440519309909
>>
>>                 Reduce input records=10000000000
>>                 Reduce output records=10000000000
>>                 Spilled Records=20000000000
>>                 Shuffled Maps =558600
>>                 Failed Shuffles=0
>>                 Merged Map outputs=558600
>>                 GC time elapsed (ms)=2855534
>>                 CPU time spent (ms)=170381420
>>                 Physical memory (bytes) snapshot=3704938954752
>>                 Virtual memory (bytes) snapshot=11455208308736
>>                 Total committed heap usage (bytes)=4523248582656
>>         Shuffle Errors
>>                 BAD_ID=0
>>                 CONNECTION=0
>>                 IO_ERROR=0
>>
>>                 WRONG_LENGTH=0
>>                 WRONG_MAP=0
>>                 WRONG_REDUCE=0
>>         File Input Format Counters
>>                 Bytes Read=1000000000000
>>         File Output Format Counters
>>                 Bytes Written=1000000000000
>>
>>
>>
>> On Wed, Oct 23, 2013 at 1:46 PM, Jian Fang <jian.fang.subscribe@gmail.com
>> > wrote:
>>
>>> I already started the tests with only 8 containers for map in MR2; still
>>> running.
>>>
>>> But shouldn't YARN make better use of the memory? If I map the YARN
>>> containers in MR2 to exactly the 8 map and 3 reduce slots used in MR1, what is the
>>> real advantage of YARN then? I remember that one goal of YARN is to solve the
>>> issue that map slots cannot be used for reduce and reduce slots cannot be
>>> used for map. Shouldn't YARN be smart enough to handle the concurrent tasks?
>>>
>>>
>>>
>>> On Wed, Oct 23, 2013 at 1:17 PM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>
>>>> Increasing the slowstart is not meant to increase performance, but
>>>> should make for a fairer comparison.  Have you tried making sure that in
>>>> MR2 only 8 map tasks are running concurrently, or boosting MR1 up to 16?
>>>>
>>>> -Sandy
>>>>
>>>>
>>>> On Wed, Oct 23, 2013 at 12:55 PM, Jian Fang <
>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>
>>>>> Changing mapreduce.job.reduce.slowstart.completedmaps to 0.99 does not
>>>>> look good. The map phase alone took 48 minutes and the total time seems
>>>>> to be even longer. Is there any way to make the map phase run faster?
>>>>>
>>>>> Thanks.
>>>>>
>>>>>
>>>>> On Wed, Oct 23, 2013 at 10:05 AM, Jian Fang <
>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>
>>>>>> Thanks Sandy.
>>>>>>
>>>>>> io.sort.record.percent is the default value 0.05 for both MR1 and
>>>>>> MR2.  mapreduce.job.reduce.slowstart.completedmaps in MR2 and mapred.reduce.slowstart.completed.maps
>>>>>> in MR1 both use the default value 0.05.
>>>>>>
>>>>>> I tried to allocate 1536MB and 1024MB to map container some time ago,
>>>>>> but the changes did not give me a better result, thus, I changed it back to
>>>>>> 768MB.
>>>>>>
>>>>>> Will try mapred.reduce.slowstart.completed.maps=.99 to see what
>>>>>> happens. BTW, I should use
>>>>>> mapreduce.job.reduce.slowstart.completedmaps in MR2, right?
>>>>>>
>>>>>> Also, in MR1 I can specify tasktracker.http.threads, but I could not
>>>>>> find the counterpart for MR2. Which one should I tune for the HTTP threads?
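
(Side note, not from the original messages: tasktracker.http.threads has no
direct equivalent in MR2 because shuffle is served by the Netty-based
ShuffleHandler auxiliary service inside each NodeManager. The closest knob is
usually mapreduce.shuffle.max.threads, for example

        mapreduce.shuffle.max.threads = "128";

where the default of 0 lets Netty size the pool at roughly two threads per
available core. Check the mapred-default.xml shipped with your 2.2.0 build to
confirm the property name before relying on it.)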
>>>>>>
>>>>>> Thanks again.
>>>>>>
>>>>>>
>>>>>> On Wed, Oct 23, 2013 at 9:40 AM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>>>>
>>>>>>> Based on SLOTS_MILLIS_MAPS, it looks like your map tasks are taking
>>>>>>> about three times as long in MR2 as they are in MR1.  This is probably
>>>>>>> because you allow twice as many map tasks to run at a time in MR2 (12288/768
>>>>>>> = 16).  Being able to use all the containers isn't necessarily a good thing
>>>>>>> if you are oversubscribing your node's resources.  Because of the different
>>>>>>> way that MR1 and MR2 view resources, I think it's better to test with
>>>>>>> mapred.reduce.slowstart.completed.maps=.99 so that the map and reduce
>>>>>>> phases will run separately.
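
(Side note, not from the original messages: since the TeraSort example
implements Tool, the slowstart setting can be passed per job on the command
line instead of editing mapred-site.xml, along the lines of

        hadoop jar hadoop-mapreduce-examples-2.2.0.jar terasort \
            -Dmapreduce.job.reduce.slowstart.completedmaps=0.99 \
            /terasort/input /terasort/output

where the jar path and the HDFS directories are placeholders for your own
layout.)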
>>>>>>>
>>>>>>> On the other hand, it looks like your MR1 run has more spilled records
>>>>>>> than MR2.  You should set io.sort.record.percent
>>>>>>> in MR1 to .13, which should improve MR1 performance and provide a
>>>>>>> fairer comparison (MR2 automatically does this tuning for you).
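
(Side note, not from the original messages: the .13 can be derived from the
record shape. Each terasort map output record carries about 102 bytes of key
plus value data (Map output bytes of 1,020,000,000,000 over 10,000,000,000
records), and MR1's sort buffer reserves 16 bytes of accounting metadata per
record, so the metadata share is about 16 / (16 + 102) = 0.135, hence .13
instead of the 0.05 default.)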
>>>>>>>
>>>>>>> -Sandy
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Oct 23, 2013 at 9:22 AM, Jian Fang <
>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>
>>>>>>>> The numbers of map slots and reduce slots on each data node for MR1
>>>>>>>> are 8 and 3, respectively. Since MR2 can use all containers for either
>>>>>>>> map or reduce, I would expect MR2 to be faster.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Oct 23, 2013 at 8:17 AM, Sandy Ryza <
>>>>>>>> sandy.ryza@cloudera.com> wrote:
>>>>>>>>
>>>>>>>>> How many map and reduce slots are you using per tasktracker in
>>>>>>>>> MR1?  How do the average map times compare? (MR2 reports this directly on
>>>>>>>>> the web UI, but you can also get a sense in MR1 by scrolling through the
>>>>>>>>> map tasks page).  Can you share the counters for MR1?
>>>>>>>>>
>>>>>>>>> -Sandy
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Oct 23, 2013 at 12:23 AM, Jian Fang <
>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Unfortunately, turning off JVM reuse still gave the same result,
>>>>>>>>>> i.e., about 90 minutes for MR2. I don't think the killed reduces could
>>>>>>>>>> account for a 2x slowdown. There must be something very wrong either
>>>>>>>>>> in the configuration or in the code. Any hints?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <
>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks Sandy. I will try to turn JVM reuse off and see what
>>>>>>>>>>> happens.
>>>>>>>>>>>
>>>>>>>>>>> Yes, I saw quite some exceptions in the task attempts. For
>>>>>>>>>>> instance.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 2013-10-20 03:13:58,751 ERROR [main]
>>>>>>>>>>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>>>>>>>>>>> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
>>>>>>>>>>> 2013-10-20 03:13:58,752 ERROR [Thread-6]
>>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient: Failed to close file
>>>>>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
>>>>>>>>>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>>>>>>>>>>> No lease on
>>>>>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
>>>>>>>>>>> File does not exist. Holder
>>>>>>>>>>> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
>>>>>>>>>>> any open files.
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>>>>>>>>>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>>>>>>>>> --
>>>>>>>>>>>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>>>>>>>>>>> 2013-10-20 03:13:58,753 WARN [main]
>>>>>>>>>>> org.apache.hadoop.mapred.YarnChild: Exception running child :
>>>>>>>>>>> java.nio.channels.ClosedChannelException
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>>>>>>>>>>>         at
>>>>>>>>>>> java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
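
(Side note, not from the original messages: both traces come from
attempt_1382237301855_0001_m_000200_1, i.e. a second, speculative attempt of a
TeraGen map task. When the other attempt finishes first, the AM kills the
duplicate and its _temporary output file is cleaned up, and the dying writer
then typically logs exactly this LeaseExpiredException / ClosedChannelException
pair. With mapreduce.map.speculative set to true this is usually noise rather
than the cause of the slowdown.)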
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <
>>>>>>>>>>> sandy.ryza@cloudera.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> It looks like many of your reduce tasks were killed.  Do you
>>>>>>>>>>>> know why?  Also, MR2 doesn't have JVM reuse, so it might make sense to
>>>>>>>>>>>> compare it to MR1 with JVM reuse turned off.
>>>>>>>>>>>>
>>>>>>>>>>>> -Sandy
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <
>>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> The Terasort output for MR2 is as follows.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job
>>>>>>>>>>>>> (main): Counters: 46
>>>>>>>>>>>>>         File System Counters
>>>>>>>>>>>>>                 FILE: Number of bytes read=456102049355
>>>>>>>>>>>>>                 FILE: Number of bytes written=897246250517
>>>>>>>>>>>>>                 FILE: Number of read operations=0
>>>>>>>>>>>>>                 FILE: Number of large read operations=0
>>>>>>>>>>>>>                 FILE: Number of write operations=0
>>>>>>>>>>>>>                 HDFS: Number of bytes read=1000000851200
>>>>>>>>>>>>>                 HDFS: Number of bytes written=1000000000000
>>>>>>>>>>>>>                 HDFS: Number of read operations=32131
>>>>>>>>>>>>>                 HDFS: Number of large read operations=0
>>>>>>>>>>>>>                 HDFS: Number of write operations=224
>>>>>>>>>>>>>         Job Counters
>>>>>>>>>>>>>                 Killed map tasks=1
>>>>>>>>>>>>>                 Killed reduce tasks=20
>>>>>>>>>>>>>                 Launched map tasks=7601
>>>>>>>>>>>>>                 Launched reduce tasks=132
>>>>>>>>>>>>>                 Data-local map tasks=7591
>>>>>>>>>>>>>                 Rack-local map tasks=10
>>>>>>>>>>>>>                 Total time spent by all maps in occupied slots
>>>>>>>>>>>>> (ms)=1696141311
>>>>>>>>>>>>>                 Total time spent by all reduces in occupied
>>>>>>>>>>>>> slots (ms)=2664045096
>>>>>>>>>>>>>         Map-Reduce Framework
>>>>>>>>>>>>>                 Map input records=10000000000
>>>>>>>>>>>>>                 Map output records=10000000000
>>>>>>>>>>>>>                 Map output bytes=1020000000000
>>>>>>>>>>>>>                 Map output materialized bytes=440486356802
>>>>>>>>>>>>>                 Input split bytes=851200
>>>>>>>>>>>>>                 Combine input records=0
>>>>>>>>>>>>>                 Combine output records=0
>>>>>>>>>>>>>                 Reduce input groups=10000000000
>>>>>>>>>>>>>                 Reduce shuffle bytes=440486356802
>>>>>>>>>>>>>                 Reduce input records=10000000000
>>>>>>>>>>>>>                 Reduce output records=10000000000
>>>>>>>>>>>>>                 Spilled Records=20000000000
>>>>>>>>>>>>>                 Shuffled Maps =851200
>>>>>>>>>>>>>                 Failed Shuffles=61
>>>>>>>>>>>>>                 Merged Map outputs=851200
>>>>>>>>>>>>>                 GC time elapsed (ms)=4215666
>>>>>>>>>>>>>                 CPU time spent (ms)=192433000
>>>>>>>>>>>>>                 Physical memory (bytes) snapshot=3349356380160
>>>>>>>>>>>>>                 Virtual memory (bytes) snapshot=9665208745984
>>>>>>>>>>>>>                 Total committed heap usage
>>>>>>>>>>>>> (bytes)=3636854259712
>>>>>>>>>>>>>         Shuffle Errors
>>>>>>>>>>>>>                 BAD_ID=0
>>>>>>>>>>>>>                 CONNECTION=0
>>>>>>>>>>>>>                 IO_ERROR=4
>>>>>>>>>>>>>                 WRONG_LENGTH=0
>>>>>>>>>>>>>                 WRONG_MAP=0
>>>>>>>>>>>>>                 WRONG_REDUCE=0
>>>>>>>>>>>>>         File Input Format Counters
>>>>>>>>>>>>>                 Bytes Read=1000000000000
>>>>>>>>>>>>>         File Output Format Counters
>>>>>>>>>>>>>                 Bytes Written=1000000000000
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> John
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop
>>>>>>>>>>>>>> 1.0.3 and it turned out that the terasort for MR2 is 2 times slower than
>>>>>>>>>>>>>> that in MR1. I cannot really believe it.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0
>>>>>>>>>>>>>> cluster configurations are as follows.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>>>>>>>>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>>>>>>>>>>>         mapreduce.map.memory.mb = "768";
>>>>>>>>>>>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>>>>>>>>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>>>>>>>>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>>>>>>>>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>>>>>>>>>>>         mapreduce.task.io.sort.factor = "48";
>>>>>>>>>>>>>>         mapreduce.task.io.sort.mb = "200";
>>>>>>>>>>>>>>         mapreduce.map.speculative = "true";
>>>>>>>>>>>>>>         mapreduce.reduce.speculative = "true";
>>>>>>>>>>>>>>         mapreduce.framework.name = "yarn";
>>>>>>>>>>>>>>         yarn.app.mapreduce.am.job.task.listener.thread-count
>>>>>>>>>>>>>> = "60";
>>>>>>>>>>>>>>         mapreduce.map.cpu.vcores = "1";
>>>>>>>>>>>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>>>>>>>>>>>         mapreduce.map.output.compress = "true";
>>>>>>>>>>>>>>         mapreduce.map.output.compress.codec =
>>>>>>>>>>>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>>>>>>>>>>>         yarn.resourcemanager.scheduler.client.thread-count =
>>>>>>>>>>>>>> "64";
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> yarn.resourcemanager.resource-tracker.client.thread-count = "64";
>>>>>>>>>>>>>>         yarn.resourcemanager.scheduler.class =
>>>>>>>>>>>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>>>>>>>>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>>>>>>>>>>>         yarn.nodemanager.aux-services.mapreduce.shuffle.class
>>>>>>>>>>>>>> = "org.apache.hadoop.mapred.ShuffleHandler";
>>>>>>>>>>>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>>>>>>>>>>>         yarn.nodemanager.container-executor.class =
>>>>>>>>>>>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>>>>>>>>>>>         yarn.nodemanager.container-manager.thread-count =
>>>>>>>>>>>>>> "64";
>>>>>>>>>>>>>>         yarn.nodemanager.localizer.client.thread-count = "20";
>>>>>>>>>>>>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost
>>>>>>>>>>>>>> the same other configurations. In MR1, the 1TB terasort took about 45
>>>>>>>>>>>>>> minutes, but it took around 90 minutes in MR2.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Does anyone know what is wrong here? Or do I need some
>>>>>>>>>>>>>> special configurations for terasort to work better in MR2?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> John
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>>>>>>>>>>>> michael_segel@hotmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Sam,
>>>>>>>>>>>>>>> I think your cluster is too small for any meaningful
>>>>>>>>>>>>>>> conclusions to be made.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Mike Segel
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Harsh,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for your detailed response! Now, the efficiency of my
>>>>>>>>>>>>>>> Yarn cluster improved a lot after increasing the reducer
>>>>>>>>>>>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>>>>>>>>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and
>>>>>>>>>>>>>>> reduce task together, with a typical process(map > shuffle > reduce). In
>>>>>>>>>>>>>>> Yarn, as I know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>>>>>>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but,
>>>>>>>>>>>>>>> MRv1 job has special execution process(map > shuffle > reduce) in Hadoop
>>>>>>>>>>>>>>> 1.x, and how Yarn execute a MRv1 job? still include some special MR steps
>>>>>>>>>>>>>>> in Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>>>>>>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>>>>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>>>>>>>>>> - What's the general process for ApplicationMaster of Yarn
>>>>>>>>>>>>>>> to execute a job?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>>>>>>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>>>>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>>>>>>>>>> - For Yarn, above tow parameter do not work any more, as
>>>>>>>>>>>>>>> yarn uses container instead, right?
>>>>>>>>>>>>>>> - For Yarn, we can set the whole physical mem for a
>>>>>>>>>>>>>>> NodeManager using 'yarn.nodemanager.resource.memory-mb'.
>>>>>>>>>>>>>>> But how to set the default size of physical mem of a container?
>>>>>>>>>>>>>>> - How to set the maximum size of physical mem of a
>>>>>>>>>>>>>>> container? By the parameter of 'mapred.child.java.opts'?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks as always!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Sam,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> > - How to know the container number? Why you say it will
>>>>>>>>>>>>>>>> be 22 containers due to a 22 GB memory?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The MR2's default configuration requests 1 GB resource each
>>>>>>>>>>>>>>>> for Map
>>>>>>>>>>>>>>>> and Reduce containers. It requests 1.5 GB for the AM
>>>>>>>>>>>>>>>> container that
>>>>>>>>>>>>>>>> runs the job, additionally. This is tunable using the
>>>>>>>>>>>>>>>> properties
>>>>>>>>>>>>>>>> Sandy's mentioned in his post.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> > - My machine has 32 GB memory, how many memory is proper
>>>>>>>>>>>>>>>> to be assigned to containers?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This is a general question. You may use the same process
>>>>>>>>>>>>>>>> you took to
>>>>>>>>>>>>>>>> decide optimal number of slots in MR1 to decide this here.
>>>>>>>>>>>>>>>> Every
>>>>>>>>>>>>>>>> container is a new JVM, and you're limited by the CPUs you
>>>>>>>>>>>>>>>> have there
>>>>>>>>>>>>>>>> (if not the memory). Either increase memory requests from
>>>>>>>>>>>>>>>> jobs, to
>>>>>>>>>>>>>>>> lower # of concurrent containers at a given time (runtime
>>>>>>>>>>>>>>>> change), or
>>>>>>>>>>>>>>>> lower NM's published memory resources to control the same
>>>>>>>>>>>>>>>> (config
>>>>>>>>>>>>>>>> change).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name'
>>>>>>>>>>>>>>>> to be 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>>>>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yes, all of these properties will still work. Old
>>>>>>>>>>>>>>>> properties specific
>>>>>>>>>>>>>>>> to JobTracker or TaskTracker (usually found as a keyword in
>>>>>>>>>>>>>>>> the config
>>>>>>>>>>>>>>>> name) will not apply anymore.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <
>>>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>>>> > Hi Harsh,
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > According to above suggestions, I removed the duplication
>>>>>>>>>>>>>>>> of setting, and
>>>>>>>>>>>>>>>> > reduce the value of 'yarn.nodemanager.resource.cpu-cores',
>>>>>>>>>>>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>>>>>>>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000.
>>>>>>>>>>>>>>>> Ant then, the
>>>>>>>>>>>>>>>> > efficiency improved about 18%.  I have questions:
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > - How to know the container number? Why you say it will
>>>>>>>>>>>>>>>> be 22 containers due
>>>>>>>>>>>>>>>> > to a 22 GB memory?
>>>>>>>>>>>>>>>> > - My machine has 32 GB memory, how many memory is proper
>>>>>>>>>>>>>>>> to be assigned to
>>>>>>>>>>>>>>>> > containers?
>>>>>>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name'
>>>>>>>>>>>>>>>> to be 'yarn', will
>>>>>>>>>>>>>>>> > other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>>>>>>> framework? Like
>>>>>>>>>>>>>>>> > 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > Thanks!
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >> Hey Sam,
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >> Did you get a chance to retry with Sandy's suggestions?
>>>>>>>>>>>>>>>> The config
>>>>>>>>>>>>>>>> >> appears to be asking NMs to use roughly 22 total
>>>>>>>>>>>>>>>> containers (as
>>>>>>>>>>>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB
>>>>>>>>>>>>>>>> memory
>>>>>>>>>>>>>>>> >> resource. This could impact much, given the CPU is still
>>>>>>>>>>>>>>>> the same for
>>>>>>>>>>>>>>>> >> both test runs.
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>>>>>>>>>>>> sandy.ryza@cloudera.com>
>>>>>>>>>>>>>>>> >> wrote:
>>>>>>>>>>>>>>>> >> > Hey Sam,
>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>> >> > Thanks for sharing your results.  I'm definitely
>>>>>>>>>>>>>>>> curious about what's
>>>>>>>>>>>>>>>> >> > causing the difference.
>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>> >> > A couple observations:
>>>>>>>>>>>>>>>> >> > It looks like you've got
>>>>>>>>>>>>>>>> yarn.nodemanager.resource.memory-mb in there
>>>>>>>>>>>>>>>> >> > twice
>>>>>>>>>>>>>>>> >> > with two different values.
>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close
>>>>>>>>>>>>>>>> to the default
>>>>>>>>>>>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of
>>>>>>>>>>>>>>>> your tasks getting
>>>>>>>>>>>>>>>> >> > killed for running over resource limits?
>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>> >> > -Sandy
>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <
>>>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >> The terasort execution log shows that reduce spent
>>>>>>>>>>>>>>>> about 5.5 mins from
>>>>>>>>>>>>>>>> >> >> 33%
>>>>>>>>>>>>>>>> >> >> to 35% as below.
>>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>> reduce 31%
>>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>> reduce 32%
>>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>> reduce 33%
>>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>> reduce 35%
>>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>> reduce 40%
>>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>> reduce 43%
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >> Any way, below are my configurations for your
>>>>>>>>>>>>>>>> reference. Thanks!
>>>>>>>>>>>>>>>> >> >> (A) core-site.xml
>>>>>>>>>>>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >> (B) hdfs-site.xml
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>dfs.replication</name>
>>>>>>>>>>>>>>>> >> >>     <value>1</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>>>>>>>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>>>>>>>>>>>> >> >>     <value>10</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >> (C) mapred-site.xml
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>>>>>>>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>>>>>>>>>>>> >> >> </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>>>>>>>>>>>> >> >>     <value>yarn</value>
>>>>>>>>>>>>>>>> >> >>    </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>>>>>>>>>>>> >> >>     <value>8</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>>>>>>>>>>>> >> >>     <value>4</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>>>>>>>>>>>> >> >>     <value>true</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >> (D) yarn-site.xml
>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>>>>>>>>>>>> >> >>     <value>node1:18025</value>
>>>>>>>>>>>>>>>> >> >>     <description>host is the hostname of the resource
>>>>>>>>>>>>>>>> manager and
>>>>>>>>>>>>>>>> >> >>     port is the port on which the NodeManagers
>>>>>>>>>>>>>>>> contact the Resource
>>>>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <description>The address of the RM web
>>>>>>>>>>>>>>>> application.</description>
>>>>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>>>>>>>>>>>> >> >>     <value>node1:18088</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> <name>yarn.resourcemanager.scheduler.address</name>
>>>>>>>>>>>>>>>> >> >>     <value>node1:18030</value>
>>>>>>>>>>>>>>>> >> >>     <description>host is the hostname of the
>>>>>>>>>>>>>>>> resourcemanager and port
>>>>>>>>>>>>>>>> >> >> is
>>>>>>>>>>>>>>>> >> >> the port
>>>>>>>>>>>>>>>> >> >>     on which the Applications in the cluster talk to
>>>>>>>>>>>>>>>> the Resource
>>>>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>>>>>>>>>>>> >> >>     <value>node1:18040</value>
>>>>>>>>>>>>>>>> >> >>     <description>the host is the hostname of the
>>>>>>>>>>>>>>>> ResourceManager and
>>>>>>>>>>>>>>>> >> >> the
>>>>>>>>>>>>>>>> >> >> port is the port on
>>>>>>>>>>>>>>>> >> >>     which the clients can talk to the Resource
>>>>>>>>>>>>>>>> Manager. </description>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>>>>>>>>>>>> >> >>     <description>the local directories used by the
>>>>>>>>>>>>>>>> >> >> nodemanager</description>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>>>>>>>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>>>>>>>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>>>>>>>>>>>> port</description>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>>>>> >> >>     <value>10240</value>
>>>>>>>>>>>>>>>> >> >>     <description>the amount of memory on the
>>>>>>>>>>>>>>>> NodeManager in
>>>>>>>>>>>>>>>> >> >> GB</description>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>>>>>>>>>>> >> >>     <description>directory on hdfs where the
>>>>>>>>>>>>>>>> application logs are moved
>>>>>>>>>>>>>>>> >> >> to
>>>>>>>>>>>>>>>> >> >> </description>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>    <property>
>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>>>>>>>>>>> >> >>     <description>the directories used by Nodemanagers
>>>>>>>>>>>>>>>> as log
>>>>>>>>>>>>>>>> >> >> directories</description>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>>>>>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>>>>>>>>>>> >> >>     <description>shuffle service that needs to be set
>>>>>>>>>>>>>>>> for Map Reduce to
>>>>>>>>>>>>>>>> >> >> run </description>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>>>>>>>>>>> >> >>     <value>24</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>>>>>>>>>>> >> >>     <value>3</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>>>>> >> >>     <value>22000</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>>>>>>>>>>> >> >>     <value>2.1</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>> >> >>> Not tuning configurations at all is wrong. YARN uses
>>>>>>>>>>>>>>>> memory resource
>>>>>>>>>>>>>>>> >> >>> based scheduling and hence MR2 would be requesting 1
>>>>>>>>>>>>>>>> GB minimum by
>>>>>>>>>>>>>>>> >> >>> default, causing, on base configs, to max out at 8
>>>>>>>>>>>>>>>> (due to 8 GB NM
>>>>>>>>>>>>>>>> >> >>> memory resource config) total containers. Do share
>>>>>>>>>>>>>>>> your configs as at
>>>>>>>>>>>>>>>> >> >>> this point none of us can tell what it is.
>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for
>>>>>>>>>>>>>>>> users and to not
>>>>>>>>>>>>>>>> >> >>> care about such things :)
>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>>>>>>>>>>> samliuhadoop@gmail.com>
>>>>>>>>>>>>>>>> >> >>> wrote:
>>>>>>>>>>>>>>>> >> >>> > At the begining, I just want to do a fast
>>>>>>>>>>>>>>>> comparision of MRv1 and
>>>>>>>>>>>>>>>> >> >>> > Yarn.
>>>>>>>>>>>>>>>> >> >>> > But
>>>>>>>>>>>>>>>> >> >>> > they have many differences, and to be fair for
>>>>>>>>>>>>>>>> comparison I did not
>>>>>>>>>>>>>>>> >> >>> > tune
>>>>>>>>>>>>>>>> >> >>> > their configurations at all.  So I got above test
>>>>>>>>>>>>>>>> results. After
>>>>>>>>>>>>>>>> >> >>> > analyzing
>>>>>>>>>>>>>>>> >> >>> > the test result, no doubt, I will configure them
>>>>>>>>>>>>>>>> and do comparison
>>>>>>>>>>>>>>>> >> >>> > again.
>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>> >> >>> > Do you have any idea on current test result? I
>>>>>>>>>>>>>>>> think, to compare
>>>>>>>>>>>>>>>> >> >>> > with
>>>>>>>>>>>>>>>> >> >>> > MRv1,
>>>>>>>>>>>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but
>>>>>>>>>>>>>>>> worse on Reduce
>>>>>>>>>>>>>>>> >> >>> > phase(terasort test).
>>>>>>>>>>>>>>>> >> >>> > And any detailed suggestions/comments/materials on
>>>>>>>>>>>>>>>> Yarn performance
>>>>>>>>>>>>>>>> >> >>> > tunning?
>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>> >> >>> > Thanks!
>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>>>>>>>>>>> marcosluis2186@gmail.com>
>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>> >> >>> >> Why not to tune the configurations?
>>>>>>>>>>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>>>>>>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>> >> >>> >>> Hi Experts,
>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>> >> >>> >>> We are thinking about whether to use Yarn or not
>>>>>>>>>>>>>>>> in the near
>>>>>>>>>>>>>>>> >> >>> >>> future,
>>>>>>>>>>>>>>>> >> >>> >>> and
>>>>>>>>>>>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for
>>>>>>>>>>>>>>>> comprison.
>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>> >> >>> >>> My env is three nodes cluster, and each node has
>>>>>>>>>>>>>>>> similar hardware:
>>>>>>>>>>>>>>>> >> >>> >>> 2
>>>>>>>>>>>>>>>> >> >>> >>> cpu(4 core), 32 mem. Both Yarn and MRv1 cluster
>>>>>>>>>>>>>>>> are set on the
>>>>>>>>>>>>>>>> >> >>> >>> same
>>>>>>>>>>>>>>>> >> >>> >>> env. To
>>>>>>>>>>>>>>>> >> >>> >>> be fair, I did not make any performance tuning
>>>>>>>>>>>>>>>> on their
>>>>>>>>>>>>>>>> >> >>> >>> configurations, but
>>>>>>>>>>>>>>>> >> >>> >>> use the default configuration values.
>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>> >> >>> >>> Before testing, I think Yarn will be much better
>>>>>>>>>>>>>>>> than MRv1, if
>>>>>>>>>>>>>>>> >> >>> >>> they
>>>>>>>>>>>>>>>> >> >>> >>> all
>>>>>>>>>>>>>>>> >> >>> >>> use default configuration, because Yarn is a
>>>>>>>>>>>>>>>> better framework than
>>>>>>>>>>>>>>>> >> >>> >>> MRv1.
>>>>>>>>>>>>>>>> >> >>> >>> However, the test result shows some differences:
>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>>>>>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>>>>>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>>>>>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>>>>>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>>>>>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>>>>>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>>>>>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>> >> >>> >>> After a fast analysis, I think the direct cause
>>>>>>>>>>>>>>>> might be that Yarn
>>>>>>>>>>>>>>>> >> >>> >>> is
>>>>>>>>>>>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much
>>>>>>>>>>>>>>>> worse on Reduce
>>>>>>>>>>>>>>>> >> >>> >>> phase.
>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>>>>>>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1 for
>>>>>>>>>>>>>>>> terasort?
>>>>>>>>>>>>>>>> >> >>> >>> - What's the stratage for tuning Yarn
>>>>>>>>>>>>>>>> performance? Is any
>>>>>>>>>>>>>>>> >> >>> >>> materials?
>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>> >> >>> >>> Thanks!
>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>> >> >>> >> --
>>>>>>>>>>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>>>>>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>>>>>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>> >> >>> --
>>>>>>>>>>>>>>>> >> >>> Harsh J
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >> --
>>>>>>>>>>>>>>>> >> Harsh J
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> Harsh J
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>>>>>>>>>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>>>>>>>>> --
>>>>>>>>>>>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>>>>>>>>>>> 2013-10-20 03:13:58,753 WARN [main]
>>>>>>>>>>> org.apache.hadoop.mapred.YarnChild: Exception running child :
>>>>>>>>>>> java.nio.channels.ClosedChannelException
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>>>>>>>>>>>         at
>>>>>>>>>>> java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <
>>>>>>>>>>> sandy.ryza@cloudera.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> It looks like many of your reduce tasks were killed.  Do you
>>>>>>>>>>>> know why?  Also, MR2 doesn't have JVM reuse, so it might make sense to
>>>>>>>>>>>> compare it to MR1 with JVM reuse turned off.
>>>>>>>>>>>>
>>>>>>>>>>>> -Sandy
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <
>>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> The Terasort output for MR2 is as follows.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job
>>>>>>>>>>>>> (main): Counters: 46
>>>>>>>>>>>>>         File System Counters
>>>>>>>>>>>>>                 FILE: Number of bytes read=456102049355
>>>>>>>>>>>>>                 FILE: Number of bytes written=897246250517
>>>>>>>>>>>>>                 FILE: Number of read operations=0
>>>>>>>>>>>>>                 FILE: Number of large read operations=0
>>>>>>>>>>>>>                 FILE: Number of write operations=0
>>>>>>>>>>>>>                 HDFS: Number of bytes read=1000000851200
>>>>>>>>>>>>>                 HDFS: Number of bytes written=1000000000000
>>>>>>>>>>>>>                 HDFS: Number of read operations=32131
>>>>>>>>>>>>>                 HDFS: Number of large read operations=0
>>>>>>>>>>>>>                 HDFS: Number of write operations=224
>>>>>>>>>>>>>         Job Counters
>>>>>>>>>>>>>                 Killed map tasks=1
>>>>>>>>>>>>>                 Killed reduce tasks=20
>>>>>>>>>>>>>                 Launched map tasks=7601
>>>>>>>>>>>>>                 Launched reduce tasks=132
>>>>>>>>>>>>>                 Data-local map tasks=7591
>>>>>>>>>>>>>                 Rack-local map tasks=10
>>>>>>>>>>>>>                 Total time spent by all maps in occupied slots
>>>>>>>>>>>>> (ms)=1696141311
>>>>>>>>>>>>>                 Total time spent by all reduces in occupied
>>>>>>>>>>>>> slots (ms)=2664045096
>>>>>>>>>>>>>         Map-Reduce Framework
>>>>>>>>>>>>>                 Map input records=10000000000
>>>>>>>>>>>>>                 Map output records=10000000000
>>>>>>>>>>>>>                 Map output bytes=1020000000000
>>>>>>>>>>>>>                 Map output materialized bytes=440486356802
>>>>>>>>>>>>>                 Input split bytes=851200
>>>>>>>>>>>>>                 Combine input records=0
>>>>>>>>>>>>>                 Combine output records=0
>>>>>>>>>>>>>                 Reduce input groups=10000000000
>>>>>>>>>>>>>                 Reduce shuffle bytes=440486356802
>>>>>>>>>>>>>                 Reduce input records=10000000000
>>>>>>>>>>>>>                 Reduce output records=10000000000
>>>>>>>>>>>>>                 Spilled Records=20000000000
>>>>>>>>>>>>>                 Shuffled Maps =851200
>>>>>>>>>>>>>                 Failed Shuffles=61
>>>>>>>>>>>>>                 Merged Map outputs=851200
>>>>>>>>>>>>>                 GC time elapsed (ms)=4215666
>>>>>>>>>>>>>                 CPU time spent (ms)=192433000
>>>>>>>>>>>>>                 Physical memory (bytes) snapshot=3349356380160
>>>>>>>>>>>>>                 Virtual memory (bytes) snapshot=9665208745984
>>>>>>>>>>>>>                 Total committed heap usage
>>>>>>>>>>>>> (bytes)=3636854259712
>>>>>>>>>>>>>         Shuffle Errors
>>>>>>>>>>>>>                 BAD_ID=0
>>>>>>>>>>>>>                 CONNECTION=0
>>>>>>>>>>>>>                 IO_ERROR=4
>>>>>>>>>>>>>                 WRONG_LENGTH=0
>>>>>>>>>>>>>                 WRONG_MAP=0
>>>>>>>>>>>>>                 WRONG_REDUCE=0
>>>>>>>>>>>>>         File Input Format Counters
>>>>>>>>>>>>>                 Bytes Read=1000000000000
>>>>>>>>>>>>>         File Output Format Counters
>>>>>>>>>>>>>                 Bytes Written=1000000000000
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> John
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop
>>>>>>>>>>>>>> 1.0.3 and it turned out that the terasort for MR2 is 2 times slower than
>>>>>>>>>>>>>> that in MR1. I cannot really believe it.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0
>>>>>>>>>>>>>> cluster configurations are as follows.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>>>>>>>>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>>>>>>>>>>>         mapreduce.map.memory.mb = "768";
>>>>>>>>>>>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>>>>>>>>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>>>>>>>>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>>>>>>>>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>>>>>>>>>>>         mapreduce.task.io.sort.factor = "48";
>>>>>>>>>>>>>>         mapreduce.task.io.sort.mb = "200";
>>>>>>>>>>>>>>         mapreduce.map.speculative = "true";
>>>>>>>>>>>>>>         mapreduce.reduce.speculative = "true";
>>>>>>>>>>>>>>         mapreduce.framework.name = "yarn";
>>>>>>>>>>>>>>         yarn.app.mapreduce.am.job.task.listener.thread-count
>>>>>>>>>>>>>> = "60";
>>>>>>>>>>>>>>         mapreduce.map.cpu.vcores = "1";
>>>>>>>>>>>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>>>>>>>>>>>         mapreduce.map.output.compress = "true";
>>>>>>>>>>>>>>         mapreduce.map.output.compress.codec =
>>>>>>>>>>>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>>>>>>>>>>>         yarn.resourcemanager.scheduler.client.thread-count =
>>>>>>>>>>>>>> "64";
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> yarn.resourcemanager.resource-tracker.client.thread-count = "64";
>>>>>>>>>>>>>>         yarn.resourcemanager.scheduler.class =
>>>>>>>>>>>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>>>>>>>>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>>>>>>>>>>>         yarn.nodemanager.aux-services.mapreduce.shuffle.class
>>>>>>>>>>>>>> = "org.apache.hadoop.mapred.ShuffleHandler";
>>>>>>>>>>>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>>>>>>>>>>>         yarn.nodemanager.container-executor.class =
>>>>>>>>>>>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>>>>>>>>>>>         yarn.nodemanager.container-manager.thread-count =
>>>>>>>>>>>>>> "64";
>>>>>>>>>>>>>>         yarn.nodemanager.localizer.client.thread-count = "20";
>>>>>>>>>>>>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost
>>>>>>>>>>>>>> the same other configurations. In MR1, the 1TB terasort took about 45
>>>>>>>>>>>>>> minutes, but it took around 90 minutes in MR2.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Does anyone know what is wrong here? Or do I need some
>>>>>>>>>>>>>> special configurations for terasort to work better in MR2?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> John
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>>>>>>>>>>>> michael_segel@hotmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Sam,
>>>>>>>>>>>>>>> I think your cluster is too small for any meaningful
>>>>>>>>>>>>>>> conclusions to be made.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Mike Segel
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Harsh,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for your detailed response! Now, the efficiency of my
>>>>>>>>>>>>>>> Yarn cluster improved a lot after increasing the reducer
>>>>>>>>>>>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>>>>>>>>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and
>>>>>>>>>>>>>>> reduce task together, with a typical process(map > shuffle > reduce). In
>>>>>>>>>>>>>>> Yarn, as I know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>>>>>>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but,
>>>>>>>>>>>>>>> MRv1 job has special execution process(map > shuffle > reduce) in Hadoop
>>>>>>>>>>>>>>> 1.x, and how Yarn execute a MRv1 job? still include some special MR steps
>>>>>>>>>>>>>>> in Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>>>>>>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>>>>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>>>>>>>>>> - What's the general process for ApplicationMaster of Yarn
>>>>>>>>>>>>>>> to execute a job?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>>>>>>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>>>>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>>>>>>>>>> - For Yarn, above tow parameter do not work any more, as
>>>>>>>>>>>>>>> yarn uses container instead, right?
>>>>>>>>>>>>>>> - For Yarn, we can set the whole physical mem for a
>>>>>>>>>>>>>>> NodeManager using 'yarn.nodemanager.resource.memory-mb'.
>>>>>>>>>>>>>>> But how to set the default size of physical mem of a container?
>>>>>>>>>>>>>>> - How to set the maximum size of physical mem of a
>>>>>>>>>>>>>>> container? By the parameter of 'mapred.child.java.opts'?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks as always!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Sam,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> > - How to know the container number? Why you say it will
>>>>>>>>>>>>>>>> be 22 containers due to a 22 GB memory?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The MR2's default configuration requests 1 GB resource each
>>>>>>>>>>>>>>>> for Map
>>>>>>>>>>>>>>>> and Reduce containers. It requests 1.5 GB for the AM
>>>>>>>>>>>>>>>> container that
>>>>>>>>>>>>>>>> runs the job, additionally. This is tunable using the
>>>>>>>>>>>>>>>> properties
>>>>>>>>>>>>>>>> Sandy's mentioned in his post.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> > - My machine has 32 GB memory, how many memory is proper
>>>>>>>>>>>>>>>> to be assigned to containers?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This is a general question. You may use the same process
>>>>>>>>>>>>>>>> you took to
>>>>>>>>>>>>>>>> decide optimal number of slots in MR1 to decide this here.
>>>>>>>>>>>>>>>> Every
>>>>>>>>>>>>>>>> container is a new JVM, and you're limited by the CPUs you
>>>>>>>>>>>>>>>> have there
>>>>>>>>>>>>>>>> (if not the memory). Either increase memory requests from
>>>>>>>>>>>>>>>> jobs, to
>>>>>>>>>>>>>>>> lower # of concurrent containers at a given time (runtime
>>>>>>>>>>>>>>>> change), or
>>>>>>>>>>>>>>>> lower NM's published memory resources to control the same
>>>>>>>>>>>>>>>> (config
>>>>>>>>>>>>>>>> change).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name'
>>>>>>>>>>>>>>>> to be 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>>>>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yes, all of these properties will still work. Old
>>>>>>>>>>>>>>>> properties specific
>>>>>>>>>>>>>>>> to JobTracker or TaskTracker (usually found as a keyword in
>>>>>>>>>>>>>>>> the config
>>>>>>>>>>>>>>>> name) will not apply anymore.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <
>>>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>>>> > Hi Harsh,
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > According to above suggestions, I removed the duplication
>>>>>>>>>>>>>>>> of setting, and
>>>>>>>>>>>>>>>> > reduce the value of 'yarn.nodemanager.resource.cpu-cores',
>>>>>>>>>>>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>>>>>>>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000.
>>>>>>>>>>>>>>>> Ant then, the
>>>>>>>>>>>>>>>> > efficiency improved about 18%.  I have questions:
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > - How to know the container number? Why you say it will
>>>>>>>>>>>>>>>> be 22 containers due
>>>>>>>>>>>>>>>> > to a 22 GB memory?
>>>>>>>>>>>>>>>> > - My machine has 32 GB memory, how many memory is proper
>>>>>>>>>>>>>>>> to be assigned to
>>>>>>>>>>>>>>>> > containers?
>>>>>>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name'
>>>>>>>>>>>>>>>> to be 'yarn', will
>>>>>>>>>>>>>>>> > other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>>>>>>> framework? Like
>>>>>>>>>>>>>>>> > 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > Thanks!
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >> Hey Sam,
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >> Did you get a chance to retry with Sandy's suggestions?
>>>>>>>>>>>>>>>> The config
>>>>>>>>>>>>>>>> >> appears to be asking NMs to use roughly 22 total
>>>>>>>>>>>>>>>> containers (as
>>>>>>>>>>>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB
>>>>>>>>>>>>>>>> memory
>>>>>>>>>>>>>>>> >> resource. This could impact much, given the CPU is still
>>>>>>>>>>>>>>>> the same for
>>>>>>>>>>>>>>>> >> both test runs.
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>>>>>>>>>>>> sandy.ryza@cloudera.com>
>>>>>>>>>>>>>>>> >> wrote:
>>>>>>>>>>>>>>>> >> > Hey Sam,
>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>> >> > Thanks for sharing your results.  I'm definitely
>>>>>>>>>>>>>>>> curious about what's
>>>>>>>>>>>>>>>> >> > causing the difference.
>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>> >> > A couple observations:
>>>>>>>>>>>>>>>> >> > It looks like you've got
>>>>>>>>>>>>>>>> yarn.nodemanager.resource.memory-mb in there
>>>>>>>>>>>>>>>> >> > twice
>>>>>>>>>>>>>>>> >> > with two different values.
>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close
>>>>>>>>>>>>>>>> to the default
>>>>>>>>>>>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of
>>>>>>>>>>>>>>>> your tasks getting
>>>>>>>>>>>>>>>> >> > killed for running over resource limits?
>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>> >> > -Sandy
>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <
>>>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >> The terasort execution log shows that reduce spent
>>>>>>>>>>>>>>>> about 5.5 mins from
>>>>>>>>>>>>>>>> >> >> 33%
>>>>>>>>>>>>>>>> >> >> to 35% as below.
>>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>> reduce 31%
>>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>> reduce 32%
>>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>> reduce 33%
>>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>> reduce 35%
>>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>> reduce 40%
>>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100%
>>>>>>>>>>>>>>>> reduce 43%
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >> Any way, below are my configurations for your
>>>>>>>>>>>>>>>> reference. Thanks!
>>>>>>>>>>>>>>>> >> >> (A) core-site.xml
>>>>>>>>>>>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >> (B) hdfs-site.xml
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>dfs.replication</name>
>>>>>>>>>>>>>>>> >> >>     <value>1</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>>>>>>>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>>>>>>>>>>>> >> >>     <value>10</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >> (C) mapred-site.xml
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>>>>>>>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>>>>>>>>>>>> >> >> </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>>>>>>>>>>>> >> >>     <value>yarn</value>
>>>>>>>>>>>>>>>> >> >>    </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>>>>>>>>>>>> >> >>     <value>8</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>>>>>>>>>>>> >> >>     <value>4</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>>>>>>>>>>>> >> >>     <value>true</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >> (D) yarn-site.xml
>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>>>>>>>>>>>> >> >>     <value>node1:18025</value>
>>>>>>>>>>>>>>>> >> >>     <description>host is the hostname of the resource
>>>>>>>>>>>>>>>> manager and
>>>>>>>>>>>>>>>> >> >>     port is the port on which the NodeManagers
>>>>>>>>>>>>>>>> contact the Resource
>>>>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <description>The address of the RM web
>>>>>>>>>>>>>>>> application.</description>
>>>>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>>>>>>>>>>>> >> >>     <value>node1:18088</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> <name>yarn.resourcemanager.scheduler.address</name>
>>>>>>>>>>>>>>>> >> >>     <value>node1:18030</value>
>>>>>>>>>>>>>>>> >> >>     <description>host is the hostname of the
>>>>>>>>>>>>>>>> resourcemanager and port
>>>>>>>>>>>>>>>> >> >> is
>>>>>>>>>>>>>>>> >> >> the port
>>>>>>>>>>>>>>>> >> >>     on which the Applications in the cluster talk to
>>>>>>>>>>>>>>>> the Resource
>>>>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>>>>>>>>>>>> >> >>     <value>node1:18040</value>
>>>>>>>>>>>>>>>> >> >>     <description>the host is the hostname of the
>>>>>>>>>>>>>>>> ResourceManager and
>>>>>>>>>>>>>>>> >> >> the
>>>>>>>>>>>>>>>> >> >> port is the port on
>>>>>>>>>>>>>>>> >> >>     which the clients can talk to the Resource
>>>>>>>>>>>>>>>> Manager. </description>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>>>>>>>>>>>> >> >>     <description>the local directories used by the
>>>>>>>>>>>>>>>> >> >> nodemanager</description>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>>>>>>>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>>>>>>>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>>>>>>>>>>>> port</description>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>>>>> >> >>     <value>10240</value>
>>>>>>>>>>>>>>>> >> >>     <description>the amount of memory on the
>>>>>>>>>>>>>>>> NodeManager in
>>>>>>>>>>>>>>>> >> >> GB</description>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>>>>>>>>>>> >> >>     <description>directory on hdfs where the
>>>>>>>>>>>>>>>> application logs are moved
>>>>>>>>>>>>>>>> >> >> to
>>>>>>>>>>>>>>>> >> >> </description>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>    <property>
>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>>>>>>>>>>> >> >>     <description>the directories used by Nodemanagers
>>>>>>>>>>>>>>>> as log
>>>>>>>>>>>>>>>> >> >> directories</description>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>>>>>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>>>>>>>>>>> >> >>     <description>shuffle service that needs to be set
>>>>>>>>>>>>>>>> for Map Reduce to
>>>>>>>>>>>>>>>> >> >> run </description>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>>>>>>>>>>> >> >>     <value>24</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>>>>>>>>>>> >> >>     <value>3</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>>>>> >> >>     <value>22000</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>>>>>>>>>>> >> >>     <value>2.1</value>
>>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>> >> >>> Not tuning configurations at all is wrong. YARN uses
>>>>>>>>>>>>>>>> memory resource
>>>>>>>>>>>>>>>> >> >>> based scheduling and hence MR2 would be requesting 1
>>>>>>>>>>>>>>>> GB minimum by
>>>>>>>>>>>>>>>> >> >>> default, causing, on base configs, to max out at 8
>>>>>>>>>>>>>>>> (due to 8 GB NM
>>>>>>>>>>>>>>>> >> >>> memory resource config) total containers. Do share
>>>>>>>>>>>>>>>> your configs as at
>>>>>>>>>>>>>>>> >> >>> this point none of us can tell what it is.
>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for
>>>>>>>>>>>>>>>> users and to not
>>>>>>>>>>>>>>>> >> >>> care about such things :)
>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>>>>>>>>>>> samliuhadoop@gmail.com>
>>>>>>>>>>>>>>>> >> >>> wrote:
>>>>>>>>>>>>>>>> >> >>> > At the begining, I just want to do a fast
>>>>>>>>>>>>>>>> comparision of MRv1 and
>>>>>>>>>>>>>>>> >> >>> > Yarn.
>>>>>>>>>>>>>>>> >> >>> > But
>>>>>>>>>>>>>>>> >> >>> > they have many differences, and to be fair for
>>>>>>>>>>>>>>>> comparison I did not
>>>>>>>>>>>>>>>> >> >>> > tune
>>>>>>>>>>>>>>>> >> >>> > their configurations at all.  So I got above test
>>>>>>>>>>>>>>>> results. After
>>>>>>>>>>>>>>>> >> >>> > analyzing
>>>>>>>>>>>>>>>> >> >>> > the test result, no doubt, I will configure them
>>>>>>>>>>>>>>>> and do comparison
>>>>>>>>>>>>>>>> >> >>> > again.
>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>> >> >>> > Do you have any idea on current test result? I
>>>>>>>>>>>>>>>> think, to compare
>>>>>>>>>>>>>>>> >> >>> > with
>>>>>>>>>>>>>>>> >> >>> > MRv1,
>>>>>>>>>>>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but
>>>>>>>>>>>>>>>> worse on Reduce
>>>>>>>>>>>>>>>> >> >>> > phase(terasort test).
>>>>>>>>>>>>>>>> >> >>> > And any detailed suggestions/comments/materials on
>>>>>>>>>>>>>>>> Yarn performance
>>>>>>>>>>>>>>>> >> >>> > tunning?
>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>> >> >>> > Thanks!
>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>>>>>>>>>>> marcosluis2186@gmail.com>
>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>> >> >>> >> Why not to tune the configurations?
>>>>>>>>>>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>>>>>>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>> >> >>> >>> Hi Experts,
>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>> >> >>> >>> We are thinking about whether to use Yarn or not
>>>>>>>>>>>>>>>> in the near
>>>>>>>>>>>>>>>> >> >>> >>> future,
>>>>>>>>>>>>>>>> >> >>> >>> and
>>>>>>>>>>>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for
>>>>>>>>>>>>>>>> comprison.
>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>> >> >>> >>> My env is three nodes cluster, and each node has
>>>>>>>>>>>>>>>> similar hardware:
>>>>>>>>>>>>>>>> >> >>> >>> 2
>>>>>>>>>>>>>>>> >> >>> >>> cpu(4 core), 32 mem. Both Yarn and MRv1 cluster
>>>>>>>>>>>>>>>> are set on the
>>>>>>>>>>>>>>>> >> >>> >>> same
>>>>>>>>>>>>>>>> >> >>> >>> env. To
>>>>>>>>>>>>>>>> >> >>> >>> be fair, I did not make any performance tuning
>>>>>>>>>>>>>>>> on their
>>>>>>>>>>>>>>>> >> >>> >>> configurations, but
>>>>>>>>>>>>>>>> >> >>> >>> use the default configuration values.
>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>> >> >>> >>> Before testing, I think Yarn will be much better
>>>>>>>>>>>>>>>> than MRv1, if
>>>>>>>>>>>>>>>> >> >>> >>> they
>>>>>>>>>>>>>>>> >> >>> >>> all
>>>>>>>>>>>>>>>> >> >>> >>> use default configuration, because Yarn is a
>>>>>>>>>>>>>>>> better framework than
>>>>>>>>>>>>>>>> >> >>> >>> MRv1.
>>>>>>>>>>>>>>>> >> >>> >>> However, the test result shows some differences:
>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>>>>>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>>>>>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>>>>>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>>>>>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>>>>>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>>>>>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>>>>>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>> >> >>> >>> After a fast analysis, I think the direct cause
>>>>>>>>>>>>>>>> might be that Yarn
>>>>>>>>>>>>>>>> >> >>> >>> is
>>>>>>>>>>>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much
>>>>>>>>>>>>>>>> worse on Reduce
>>>>>>>>>>>>>>>> >> >>> >>> phase.
>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>>>>>>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1 for
>>>>>>>>>>>>>>>> terasort?
>>>>>>>>>>>>>>>> >> >>> >>> - What's the stratage for tuning Yarn
>>>>>>>>>>>>>>>> performance? Is any
>>>>>>>>>>>>>>>> >> >>> >>> materials?
>>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>>> >> >>> >>> Thanks!
>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>> >> >>> >> --
>>>>>>>>>>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>>>>>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>>>>>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>>> >> >>> --
>>>>>>>>>>>>>>>> >> >>> Harsh J
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >> --
>>>>>>>>>>>>>>>> >> Harsh J
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> Harsh J
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Jian Fang <ji...@gmail.com>.
Looked at the results for MR1 and MR2. Reduce input groups for MR1 is
4,294,967,296, but 10,000,000,000 for MR2.
That is to say, the number of unique keys fed into the reducers is
10,000,000,000 in MR2, which is really weird. Any problem in the terasort code?
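
A side observation (my own check, not verified against the MR1 source):
4,294,967,296 is exactly 2^32, so the MR1 figure may be a 32-bit counter
artifact rather than a genuine difference in the data. Terasort keys are
effectively unique, so about one group per record, i.e. 10,000,000,000, is
what MR2 reports. A throwaway snippet showing the arithmetic, using the two
values quoted above:

    public class ReduceGroupCheck {
        public static void main(String[] args) {
            long mr1Groups = 4294967296L;    // MR1 "Reduce input groups"
            long mr2Groups = 10000000000L;   // MR2 "Reduce input groups"
            // prints true: the MR1 value is exactly 2^32
            System.out.println(mr1Groups == (1L << 32));
            // prints true: the MR2 value equals the 10 billion input records
            System.out.println(mr2Groups == 10000000000L);
        }
    }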


On Wed, Oct 23, 2013 at 2:16 PM, Jian Fang <ji...@gmail.com>wrote:

> Reducing the number of map containers to 8 slightly improved the total time
> to 84 minutes. Here is the output. Also, from the log, there is no clear
> reason why the containers are killed other than messages such as "Container
> killed by the
> ApplicationMaster../container_1382237301855_0001_01_000001/syslog:Container
> killed on request. Exit code is 143"
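
As a side note (my assumption about one way to reach the 8-container cap, not
necessarily what was done for this run): with yarn.nodemanager.resource.memory-mb
= "12288", raising the per-map request from 768 MB to 1536 MB limits each node
to at most 12288 / 1536 = 8 concurrent map containers, e.g.

        mapreduce.map.memory.mb = "1536";
        mapreduce.map.java.opts = "-Xmx1152m";

(the -Xmx value is only a guess at roughly 75% of the container size). Also,
exit code 143 by itself usually just means the container received SIGTERM
(128 + 15); with speculative execution enabled, the ApplicationMaster killing
redundant attempts "on request" is expected rather than a sign of task failure.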
>
>
> 2013-10-23 21:09:34,325 INFO org.apache.hadoop.mapreduce.Job (main):
> Counters: 46
>         File System Counters
>                 FILE: Number of bytes read=455484809066
>                 FILE: Number of bytes written=896642344343
>
>                 FILE: Number of read operations=0
>                 FILE: Number of large read operations=0
>                 FILE: Number of write operations=0
>                 HDFS: Number of bytes read=1000000841624
>
>                 HDFS: Number of bytes written=1000000000000
>                 HDFS: Number of read operations=25531
>
>                 HDFS: Number of large read operations=0
>                 HDFS: Number of write operations=150
>
>         Job Counters
>                 Killed map tasks=1
>                 Killed reduce tasks=11
>                 Launched map tasks=7449
>                 Launched reduce tasks=86
>                 Data-local map tasks=7434
>                 Rack-local map tasks=15
>                 Total time spent by all maps in occupied slots
> (ms)=1030941232
>                 Total time spent by all reduces in occupied slots
> (ms)=1574732272
>
>         Map-Reduce Framework
>                 Map input records=10000000000
>                 Map output records=10000000000
>                 Map output bytes=1020000000000
>                 Map output materialized bytes=440519309909
>                 Input split bytes=841624
>
>                 Combine input records=0
>                 Combine output records=0
>                 Reduce input groups=10000000000
>                 Reduce shuffle bytes=440519309909
>
>                 Reduce input records=10000000000
>                 Reduce output records=10000000000
>                 Spilled Records=20000000000
>                 Shuffled Maps =558600
>                 Failed Shuffles=0
>                 Merged Map outputs=558600
>                 GC time elapsed (ms)=2855534
>                 CPU time spent (ms)=170381420
>                 Physical memory (bytes) snapshot=3704938954752
>                 Virtual memory (bytes) snapshot=11455208308736
>                 Total committed heap usage (bytes)=4523248582656
>         Shuffle Errors
>                 BAD_ID=0
>                 CONNECTION=0
>                 IO_ERROR=0
>
>                 WRONG_LENGTH=0
>                 WRONG_MAP=0
>                 WRONG_REDUCE=0
>         File Input Format Counters
>                 Bytes Read=1000000000000
>         File Output Format Counters
>                 Bytes Written=1000000000000
>
>
>
> On Wed, Oct 23, 2013 at 1:46 PM, Jian Fang <ji...@gmail.com>wrote:
>
>> Already started the tests with only 8 containers for map in MR2, still
>> running.
>>
>> But shouldn't YARN make better use of the memory? If I map YARN
>> containers in MR2 to exact 8 map and 3 reduce slots in MR1, what is the
>> real advantage of YARN then? I remember one goal of YARN is to solve the
>> issue that map slots cannot be used for reduce and reduce slots cannot be
>> used for map. Shouldn't YARN be smart enough to handle the concurrent tasks?
>>
>>
>>
>> On Wed, Oct 23, 2013 at 1:17 PM, Sandy Ryza <sa...@cloudera.com>wrote:
>>
>>> Increasing the slowstart is not meant to increase performance, but
>>> should make for a fairer comparison.  Have you tried making sure that in
>>> MR2 only 8 map tasks are running concurrently, or boosting MR1 up to 16?
>>>
>>> -Sandy
>>>
>>>
>>> On Wed, Oct 23, 2013 at 12:55 PM, Jian Fang <
>>> jian.fang.subscribe@gmail.com> wrote:
>>>
>>>> Changing mapreduce.job.reduce.
>>>> slowstart.completedmaps to 0.99 does not look good. The map phase alone
>>>> took 48 minutes and total time seems to be even longer. Any way to let map
>>>> phase run faster?
>>>>
>>>> Thanks.
>>>>
>>>>
>>>> On Wed, Oct 23, 2013 at 10:05 AM, Jian Fang <
>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>
>>>>> Thanks Sandy.
>>>>>
>>>>> io.sort.record.percent is the default value 0.05 for both MR1 and
>>>>> MR2.  mapreduce.job.reduce.slowstart.completedmaps in MR2 and mapred.reduce.slowstart.completed.maps
>>>>> in MR1 both use the default value 0.05.
>>>>>
>>>>> I tried to allocate 1536MB and 1024MB to map container some time ago,
>>>>> but the changes did not give me a better result, thus, I changed it back to
>>>>> 768MB.
>>>>>
>>>>> Will try mapred.reduce.slowstart.completed.maps=.99 to see what
>>>>> happens. BTW, I should use
>>>>> mapreduce.job.reduce.slowstart.completedmaps in MR2, right?
>>>>>
>>>>> Also, in MR1 I can specify tasktracker.http.threads, but I could not
>>>>> find the counterpart for MR2. Which one I should tune for the http thread?
>>>>>
>>>>> Thanks again.
>>>>>
>>>>>
>>>>> On Wed, Oct 23, 2013 at 9:40 AM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>>>
>>>>>> Based on SLOTS_MILLIS_MAPS, it looks like your map tasks are taking
>>>>>> about three times as long in MR2 as they are in MR1.  This is probably
>>>>>> because you allow twice as many map tasks to run at a time in MR2 (12288/768
>>>>>> = 16).  Being able to use all the containers isn't necessarily a good thing
>>>>>> if you are oversubscribing your node's resources.  Because of the different
>>>>>> way that MR1 and MR2 view resources, I think it's better to test with
>>>>>> mapred.reduce.slowstart.completed.maps=.99 so that the map and reduce
>>>>>> phases will run separately.
>>>>>>
>>>>>> On the other side, it looks like your MR1 has more spilled records
>>>>>> than MR2.  For a fairer comparison, you should set io.sort.record.percent
>>>>>> in MR1 to .13, which should improve MR1 performance, but will provide a
>>>>>> fairer comparison (MR2 automatically does this tuning for you).
>>>>>>
>>>>>> -Sandy
>>>>>>
>>>>>>
>>>>>> On Wed, Oct 23, 2013 at 9:22 AM, Jian Fang <
>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>
>>>>>>> The number of map slots and reduce slots on each data node for MR1
>>>>>>> are 8 and 3, respectively. Since MR2 could use all containers for either
>>>>>>> map or reduce, I would expect that MR2 is faster.
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Oct 23, 2013 at 8:17 AM, Sandy Ryza <sandy.ryza@cloudera.com
>>>>>>> > wrote:
>>>>>>>
>>>>>>>> How many map and reduce slots are you using per tasktracker in MR1?
>>>>>>>>  How do the average map times compare? (MR2 reports this directly on the
>>>>>>>> web UI, but you can also get a sense in MR1 by scrolling through the map
>>>>>>>> tasks page).  Can you share the counters for MR1?
>>>>>>>>
>>>>>>>> -Sandy
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Oct 23, 2013 at 12:23 AM, Jian Fang <
>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Unfortunately, turning off JVM reuse still got the same result,
>>>>>>>>> i.e., about 90 minutes for MR2. I don't think the killed reduces could
>>>>>>>>> contribute to 2 times slowness. There should be something very wrong either
>>>>>>>>> in configuration or code. Any hints?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <
>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks Sandy. I will try to turn JVM resue off and see what
>>>>>>>>>> happens.
>>>>>>>>>>
>>>>>>>>>> Yes, I saw quite some exceptions in the task attempts. For
>>>>>>>>>> instance.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2013-10-20 03:13:58,751 ERROR [main]
>>>>>>>>>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>>>>>>>>>> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
>>>>>>>>>> 2013-10-20 03:13:58,752 ERROR [Thread-6]
>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient: Failed to close file
>>>>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
>>>>>>>>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>>>>>>>>>> No lease on
>>>>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
>>>>>>>>>> File does not exist. Holder
>>>>>>>>>> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
>>>>>>>>>> any open files.
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>>>>>>>>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>>>>>>>> --
>>>>>>>>>>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>>>>>>>>>> 2013-10-20 03:13:58,753 WARN [main]
>>>>>>>>>> org.apache.hadoop.mapred.YarnChild: Exception running child :
>>>>>>>>>> java.nio.channels.ClosedChannelException
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>>>>>>>>>>         at
>>>>>>>>>> java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <
>>>>>>>>>> sandy.ryza@cloudera.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> It looks like many of your reduce tasks were killed.  Do you
>>>>>>>>>>> know why?  Also, MR2 doesn't have JVM reuse, so it might make sense to
>>>>>>>>>>> compare it to MR1 with JVM reuse turned off.
>>>>>>>>>>>
>>>>>>>>>>> -Sandy
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <
>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> The Terasort output for MR2 is as follows.
>>>>>>>>>>>>
>>>>>>>>>>>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job
>>>>>>>>>>>> (main): Counters: 46
>>>>>>>>>>>>         File System Counters
>>>>>>>>>>>>                 FILE: Number of bytes read=456102049355
>>>>>>>>>>>>                 FILE: Number of bytes written=897246250517
>>>>>>>>>>>>                 FILE: Number of read operations=0
>>>>>>>>>>>>                 FILE: Number of large read operations=0
>>>>>>>>>>>>                 FILE: Number of write operations=0
>>>>>>>>>>>>                 HDFS: Number of bytes read=1000000851200
>>>>>>>>>>>>                 HDFS: Number of bytes written=1000000000000
>>>>>>>>>>>>                 HDFS: Number of read operations=32131
>>>>>>>>>>>>                 HDFS: Number of large read operations=0
>>>>>>>>>>>>                 HDFS: Number of write operations=224
>>>>>>>>>>>>         Job Counters
>>>>>>>>>>>>                 Killed map tasks=1
>>>>>>>>>>>>                 Killed reduce tasks=20
>>>>>>>>>>>>                 Launched map tasks=7601
>>>>>>>>>>>>                 Launched reduce tasks=132
>>>>>>>>>>>>                 Data-local map tasks=7591
>>>>>>>>>>>>                 Rack-local map tasks=10
>>>>>>>>>>>>                 Total time spent by all maps in occupied slots
>>>>>>>>>>>> (ms)=1696141311
>>>>>>>>>>>>                 Total time spent by all reduces in occupied
>>>>>>>>>>>> slots (ms)=2664045096
>>>>>>>>>>>>         Map-Reduce Framework
>>>>>>>>>>>>                 Map input records=10000000000
>>>>>>>>>>>>                 Map output records=10000000000
>>>>>>>>>>>>                 Map output bytes=1020000000000
>>>>>>>>>>>>                 Map output materialized bytes=440486356802
>>>>>>>>>>>>                 Input split bytes=851200
>>>>>>>>>>>>                 Combine input records=0
>>>>>>>>>>>>                 Combine output records=0
>>>>>>>>>>>>                 Reduce input groups=10000000000
>>>>>>>>>>>>                 Reduce shuffle bytes=440486356802
>>>>>>>>>>>>                 Reduce input records=10000000000
>>>>>>>>>>>>                 Reduce output records=10000000000
>>>>>>>>>>>>                 Spilled Records=20000000000
>>>>>>>>>>>>                 Shuffled Maps =851200
>>>>>>>>>>>>                 Failed Shuffles=61
>>>>>>>>>>>>                 Merged Map outputs=851200
>>>>>>>>>>>>                 GC time elapsed (ms)=4215666
>>>>>>>>>>>>                 CPU time spent (ms)=192433000
>>>>>>>>>>>>                 Physical memory (bytes) snapshot=3349356380160
>>>>>>>>>>>>                 Virtual memory (bytes) snapshot=9665208745984
>>>>>>>>>>>>                 Total committed heap usage (bytes)=3636854259712
>>>>>>>>>>>>         Shuffle Errors
>>>>>>>>>>>>                 BAD_ID=0
>>>>>>>>>>>>                 CONNECTION=0
>>>>>>>>>>>>                 IO_ERROR=4
>>>>>>>>>>>>                 WRONG_LENGTH=0
>>>>>>>>>>>>                 WRONG_MAP=0
>>>>>>>>>>>>                 WRONG_REDUCE=0
>>>>>>>>>>>>         File Input Format Counters
>>>>>>>>>>>>                 Bytes Read=1000000000000
>>>>>>>>>>>>         File Output Format Counters
>>>>>>>>>>>>                 Bytes Written=1000000000000
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> John
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop
>>>>>>>>>>>>> 1.0.3 and it turned out that the terasort for MR2 is 2 times slower than
>>>>>>>>>>>>> that in MR1. I cannot really believe it.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0
>>>>>>>>>>>>> cluster configurations are as follows.
>>>>>>>>>>>>>
>>>>>>>>>>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>>>>>>>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>>>>>>>>>>         mapreduce.map.memory.mb = "768";
>>>>>>>>>>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>>>>>>>>>>
>>>>>>>>>>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>>>>>>>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>>>>>>>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>>>>>>>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>>>>>>>>>>
>>>>>>>>>>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>>>>>>>>>>         mapreduce.task.io.sort.factor = "48";
>>>>>>>>>>>>>         mapreduce.task.io.sort.mb = "200";
>>>>>>>>>>>>>         mapreduce.map.speculative = "true";
>>>>>>>>>>>>>         mapreduce.reduce.speculative = "true";
>>>>>>>>>>>>>         mapreduce.framework.name = "yarn";
>>>>>>>>>>>>>         yarn.app.mapreduce.am.job.task.listener.thread-count =
>>>>>>>>>>>>> "60";
>>>>>>>>>>>>>         mapreduce.map.cpu.vcores = "1";
>>>>>>>>>>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>>>>>>>>>>
>>>>>>>>>>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>>>>>>>>>>         mapreduce.map.output.compress = "true";
>>>>>>>>>>>>>         mapreduce.map.output.compress.codec =
>>>>>>>>>>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>>>>>>>>>>
>>>>>>>>>>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>>>>>>>>>>         yarn.resourcemanager.scheduler.client.thread-count =
>>>>>>>>>>>>> "64";
>>>>>>>>>>>>>
>>>>>>>>>>>>> yarn.resourcemanager.resource-tracker.client.thread-count = "64";
>>>>>>>>>>>>>         yarn.resourcemanager.scheduler.class =
>>>>>>>>>>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>>>>>>>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>>>>>>>>>>         yarn.nodemanager.aux-services.mapreduce.shuffle.class
>>>>>>>>>>>>> = "org.apache.hadoop.mapred.ShuffleHandler";
>>>>>>>>>>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>>>>>>>>>>         yarn.nodemanager.container-executor.class =
>>>>>>>>>>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>>>>>>>>>>         yarn.nodemanager.container-manager.thread-count = "64";
>>>>>>>>>>>>>         yarn.nodemanager.localizer.client.thread-count = "20";
>>>>>>>>>>>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>>>>>>>>>>>>
>>>>>>>>>>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost the
>>>>>>>>>>>>> same other configurations. In MR1, the 1TB terasort took about 45 minutes,
>>>>>>>>>>>>> but it took around 90 minutes in MR2.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Does anyone know what is wrong here? Or do I need some special
>>>>>>>>>>>>> configurations for terasort to work better in MR2?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>>
>>>>>>>>>>>>> John
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>>>>>>>>>>> michael_segel@hotmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Sam,
>>>>>>>>>>>>>> I think your cluster is too small for any meaningful
>>>>>>>>>>>>>> conclusions to be made.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Mike Segel
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Harsh,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for your detailed response! Now, the efficiency of my
>>>>>>>>>>>>>> Yarn cluster improved a lot after increasing the reducer
>>>>>>>>>>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>>>>>>>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and
>>>>>>>>>>>>>> reduce task together, with a typical process(map > shuffle > reduce). In
>>>>>>>>>>>>>> Yarn, as I know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>>>>>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but,
>>>>>>>>>>>>>> MRv1 job has special execution process(map > shuffle > reduce) in Hadoop
>>>>>>>>>>>>>> 1.x, and how Yarn execute a MRv1 job? still include some special MR steps
>>>>>>>>>>>>>> in Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>>>>>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>>>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>>>>>>>>> - What's the general process for ApplicationMaster of Yarn to
>>>>>>>>>>>>>> execute a job?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>>>>>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>>>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>>>>>>>>> - For Yarn, above tow parameter do not work any more, as yarn
>>>>>>>>>>>>>> uses container instead, right?
>>>>>>>>>>>>>> - For Yarn, we can set the whole physical mem for a
>>>>>>>>>>>>>> NodeManager using 'yarn.nodemanager.resource.memory-mb'. But
>>>>>>>>>>>>>> how to set the default size of physical mem of a container?
>>>>>>>>>>>>>> - How to set the maximum size of physical mem of a container?
>>>>>>>>>>>>>> By the parameter of 'mapred.child.java.opts'?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks as always!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Sam,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> > - How to know the container number? Why you say it will be
>>>>>>>>>>>>>>> 22 containers due to a 22 GB memory?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The MR2's default configuration requests 1 GB resource each
>>>>>>>>>>>>>>> for Map
>>>>>>>>>>>>>>> and Reduce containers. It requests 1.5 GB for the AM
>>>>>>>>>>>>>>> container that
>>>>>>>>>>>>>>> runs the job, additionally. This is tunable using the
>>>>>>>>>>>>>>> properties
>>>>>>>>>>>>>>> Sandy's mentioned in his post.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> > - My machine has 32 GB memory, how many memory is proper
>>>>>>>>>>>>>>> to be assigned to containers?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This is a general question. You may use the same process you
>>>>>>>>>>>>>>> took to
>>>>>>>>>>>>>>> decide optimal number of slots in MR1 to decide this here.
>>>>>>>>>>>>>>> Every
>>>>>>>>>>>>>>> container is a new JVM, and you're limited by the CPUs you
>>>>>>>>>>>>>>> have there
>>>>>>>>>>>>>>> (if not the memory). Either increase memory requests from
>>>>>>>>>>>>>>> jobs, to
>>>>>>>>>>>>>>> lower # of concurrent containers at a given time (runtime
>>>>>>>>>>>>>>> change), or
>>>>>>>>>>>>>>> lower NM's published memory resources to control the same
>>>>>>>>>>>>>>> (config
>>>>>>>>>>>>>>> change).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name'
>>>>>>>>>>>>>>> to be 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>>>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yes, all of these properties will still work. Old properties
>>>>>>>>>>>>>>> specific
>>>>>>>>>>>>>>> to JobTracker or TaskTracker (usually found as a keyword in
>>>>>>>>>>>>>>> the config
>>>>>>>>>>>>>>> name) will not apply anymore.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <
>>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>>> > Hi Harsh,
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > According to above suggestions, I removed the duplication
>>>>>>>>>>>>>>> of setting, and
>>>>>>>>>>>>>>> > reduce the value of 'yarn.nodemanager.resource.cpu-cores',
>>>>>>>>>>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>>>>>>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000.
>>>>>>>>>>>>>>> Ant then, the
>>>>>>>>>>>>>>> > efficiency improved about 18%.  I have questions:
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > - How to know the container number? Why you say it will be
>>>>>>>>>>>>>>> 22 containers due
>>>>>>>>>>>>>>> > to a 22 GB memory?
>>>>>>>>>>>>>>> > - My machine has 32 GB memory, how many memory is proper
>>>>>>>>>>>>>>> to be assigned to
>>>>>>>>>>>>>>> > containers?
>>>>>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name'
>>>>>>>>>>>>>>> to be 'yarn', will
>>>>>>>>>>>>>>> > other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>>>>>> framework? Like
>>>>>>>>>>>>>>> > 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > Thanks!
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> Hey Sam,
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> Did you get a chance to retry with Sandy's suggestions?
>>>>>>>>>>>>>>> The config
>>>>>>>>>>>>>>> >> appears to be asking NMs to use roughly 22 total
>>>>>>>>>>>>>>> containers (as
>>>>>>>>>>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB
>>>>>>>>>>>>>>> memory
>>>>>>>>>>>>>>> >> resource. This could impact much, given the CPU is still
>>>>>>>>>>>>>>> the same for
>>>>>>>>>>>>>>> >> both test runs.
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>>>>>>>>>>> sandy.ryza@cloudera.com>
>>>>>>>>>>>>>>> >> wrote:
>>>>>>>>>>>>>>> >> > Hey Sam,
>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>> >> > Thanks for sharing your results.  I'm definitely
>>>>>>>>>>>>>>> curious about what's
>>>>>>>>>>>>>>> >> > causing the difference.
>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>> >> > A couple observations:
>>>>>>>>>>>>>>> >> > It looks like you've got
>>>>>>>>>>>>>>> yarn.nodemanager.resource.memory-mb in there
>>>>>>>>>>>>>>> >> > twice
>>>>>>>>>>>>>>> >> > with two different values.
>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close
>>>>>>>>>>>>>>> to the default
>>>>>>>>>>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of
>>>>>>>>>>>>>>> your tasks getting
>>>>>>>>>>>>>>> >> > killed for running over resource limits?
>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>> >> > -Sandy
>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <
>>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >> The terasort execution log shows that reduce spent
>>>>>>>>>>>>>>> about 5.5 mins from
>>>>>>>>>>>>>>> >> >> 33%
>>>>>>>>>>>>>>> >> >> to 35% as below.
>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>>>> 31%
>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>>>> 32%
>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>>>> 33%
>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>>>> 35%
>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>>>> 40%
>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>>>> 43%
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >> Any way, below are my configurations for your
>>>>>>>>>>>>>>> reference. Thanks!
>>>>>>>>>>>>>>> >> >> (A) core-site.xml
>>>>>>>>>>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >> (B) hdfs-site.xml
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>dfs.replication</name>
>>>>>>>>>>>>>>> >> >>     <value>1</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>>>>>>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>>>>>>>>>>> >> >>     <value>10</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >> (C) mapred-site.xml
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>>>>>>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>>>>>>>>>>> >> >> </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>>>>>>>>>>> >> >>     <value>yarn</value>
>>>>>>>>>>>>>>> >> >>    </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>>>>>>>>>>> >> >>     <value>8</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>>>>>>>>>>> >> >>     <value>4</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>>>>>>>>>>> >> >>     <value>true</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >> (D) yarn-site.xml
>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>>>>>>>>>>> >> >>     <value>node1:18025</value>
>>>>>>>>>>>>>>> >> >>     <description>host is the hostname of the resource
>>>>>>>>>>>>>>> manager and
>>>>>>>>>>>>>>> >> >>     port is the port on which the NodeManagers contact
>>>>>>>>>>>>>>> the Resource
>>>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <description>The address of the RM web
>>>>>>>>>>>>>>> application.</description>
>>>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>>>>>>>>>>> >> >>     <value>node1:18088</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>>>>>>>>>>>>>>> >> >>     <value>node1:18030</value>
>>>>>>>>>>>>>>> >> >>     <description>host is the hostname of the
>>>>>>>>>>>>>>> resourcemanager and port
>>>>>>>>>>>>>>> >> >> is
>>>>>>>>>>>>>>> >> >> the port
>>>>>>>>>>>>>>> >> >>     on which the Applications in the cluster talk to
>>>>>>>>>>>>>>> the Resource
>>>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>>>>>>>>>>> >> >>     <value>node1:18040</value>
>>>>>>>>>>>>>>> >> >>     <description>the host is the hostname of the
>>>>>>>>>>>>>>> ResourceManager and
>>>>>>>>>>>>>>> >> >> the
>>>>>>>>>>>>>>> >> >> port is the port on
>>>>>>>>>>>>>>> >> >>     which the clients can talk to the Resource
>>>>>>>>>>>>>>> Manager. </description>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>>>>>>>>>>> >> >>     <description>the local directories used by the
>>>>>>>>>>>>>>> >> >> nodemanager</description>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>>>>>>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>>>>>>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>>>>>>>>>>> port</description>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>>>> >> >>     <value>10240</value>
>>>>>>>>>>>>>>> >> >>     <description>the amount of memory on the
>>>>>>>>>>>>>>> NodeManager in
>>>>>>>>>>>>>>> >> >> GB</description>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>>>>>>>>>> >> >>     <description>directory on hdfs where the
>>>>>>>>>>>>>>> application logs are moved
>>>>>>>>>>>>>>> >> >> to
>>>>>>>>>>>>>>> >> >> </description>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>    <property>
>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>>>>>>>>>> >> >>     <description>the directories used by Nodemanagers
>>>>>>>>>>>>>>> as log
>>>>>>>>>>>>>>> >> >> directories</description>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>>>>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>>>>>>>>>> >> >>     <description>shuffle service that needs to be set
>>>>>>>>>>>>>>> for Map Reduce to
>>>>>>>>>>>>>>> >> >> run </description>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>>>>>>>>>> >> >>     <value>24</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>>>>>>>>>> >> >>     <value>3</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>>>> >> >>     <value>22000</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>>>>>>>>>> >> >>     <value>2.1</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>> >> >>> Not tuning configurations at all is wrong. YARN uses
>>>>>>>>>>>>>>> memory resource
>>>>>>>>>>>>>>> >> >>> based scheduling and hence MR2 would be requesting 1
>>>>>>>>>>>>>>> GB minimum by
>>>>>>>>>>>>>>> >> >>> default, causing, on base configs, to max out at 8
>>>>>>>>>>>>>>> (due to 8 GB NM
>>>>>>>>>>>>>>> >> >>> memory resource config) total containers. Do share
>>>>>>>>>>>>>>> your configs as at
>>>>>>>>>>>>>>> >> >>> this point none of us can tell what it is.
>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for
>>>>>>>>>>>>>>> users and to not
>>>>>>>>>>>>>>> >> >>> care about such things :)
>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>>>>>>>>>> samliuhadoop@gmail.com>
>>>>>>>>>>>>>>> >> >>> wrote:
>>>>>>>>>>>>>>> >> >>> > At the begining, I just want to do a fast
>>>>>>>>>>>>>>> comparision of MRv1 and
>>>>>>>>>>>>>>> >> >>> > Yarn.
>>>>>>>>>>>>>>> >> >>> > But
>>>>>>>>>>>>>>> >> >>> > they have many differences, and to be fair for
>>>>>>>>>>>>>>> comparison I did not
>>>>>>>>>>>>>>> >> >>> > tune
>>>>>>>>>>>>>>> >> >>> > their configurations at all.  So I got above test
>>>>>>>>>>>>>>> results. After
>>>>>>>>>>>>>>> >> >>> > analyzing
>>>>>>>>>>>>>>> >> >>> > the test result, no doubt, I will configure them
>>>>>>>>>>>>>>> and do comparison
>>>>>>>>>>>>>>> >> >>> > again.
>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>> >> >>> > Do you have any idea on current test result? I
>>>>>>>>>>>>>>> think, to compare
>>>>>>>>>>>>>>> >> >>> > with
>>>>>>>>>>>>>>> >> >>> > MRv1,
>>>>>>>>>>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but
>>>>>>>>>>>>>>> worse on Reduce
>>>>>>>>>>>>>>> >> >>> > phase(terasort test).
>>>>>>>>>>>>>>> >> >>> > And any detailed suggestions/comments/materials on
>>>>>>>>>>>>>>> Yarn performance
>>>>>>>>>>>>>>> >> >>> > tunning?
>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>> >> >>> > Thanks!
>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>>>>>>>>>> marcosluis2186@gmail.com>
>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>> >> >>> >> Why not to tune the configurations?
>>>>>>>>>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>>>>>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>> >> >>> >>> Hi Experts,
>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>> >> >>> >>> We are thinking about whether to use Yarn or not
>>>>>>>>>>>>>>> in the near
>>>>>>>>>>>>>>> >> >>> >>> future,
>>>>>>>>>>>>>>> >> >>> >>> and
>>>>>>>>>>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for
>>>>>>>>>>>>>>> comprison.
>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>> >> >>> >>> My env is three nodes cluster, and each node has
>>>>>>>>>>>>>>> similar hardware:
>>>>>>>>>>>>>>> >> >>> >>> 2
>>>>>>>>>>>>>>> >> >>> >>> cpu(4 core), 32 mem. Both Yarn and MRv1 cluster
>>>>>>>>>>>>>>> are set on the
>>>>>>>>>>>>>>> >> >>> >>> same
>>>>>>>>>>>>>>> >> >>> >>> env. To
>>>>>>>>>>>>>>> >> >>> >>> be fair, I did not make any performance tuning on
>>>>>>>>>>>>>>> their
>>>>>>>>>>>>>>> >> >>> >>> configurations, but
>>>>>>>>>>>>>>> >> >>> >>> use the default configuration values.
>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>> >> >>> >>> Before testing, I think Yarn will be much better
>>>>>>>>>>>>>>> than MRv1, if
>>>>>>>>>>>>>>> >> >>> >>> they
>>>>>>>>>>>>>>> >> >>> >>> all
>>>>>>>>>>>>>>> >> >>> >>> use default configuration, because Yarn is a
>>>>>>>>>>>>>>> better framework than
>>>>>>>>>>>>>>> >> >>> >>> MRv1.
>>>>>>>>>>>>>>> >> >>> >>> However, the test result shows some differences:
>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>>>>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>>>>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>>>>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>>>>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>>>>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>>>>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>>>>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>> >> >>> >>> After a fast analysis, I think the direct cause
>>>>>>>>>>>>>>> might be that Yarn
>>>>>>>>>>>>>>> >> >>> >>> is
>>>>>>>>>>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much
>>>>>>>>>>>>>>> worse on Reduce
>>>>>>>>>>>>>>> >> >>> >>> phase.
>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>>>>>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1 for
>>>>>>>>>>>>>>> terasort?
>>>>>>>>>>>>>>> >> >>> >>> - What's the stratage for tuning Yarn
>>>>>>>>>>>>>>> performance? Is any
>>>>>>>>>>>>>>> >> >>> >>> materials?
>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>> >> >>> >>> Thanks!
>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>> >> >>> >> --
>>>>>>>>>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>>>>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>>>>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>> >> >>> --
>>>>>>>>>>>>>>> >> >>> Harsh J
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> --
>>>>>>>>>>>>>>> >> Harsh J
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Harsh J
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Jian Fang <ji...@gmail.com>.
I looked at the results for MR1 and MR2. Reduce input groups for MR1 is
4,294,967,296, but 10,000,000,000 for MR2. That is to say, the number of
unique keys fed into the reducers is 10,000,000,000 in MR2, which is really
weird; 4,294,967,296 is also exactly 2^32, which itself looks suspicious. Is
there any problem in the terasort code?


On Wed, Oct 23, 2013 at 2:16 PM, Jian Fang <ji...@gmail.com>wrote:

> Reducing the number of map containers to 8 slightly improved the total time
> to 84 minutes. Here is the output. Also, from the log, there is no clear
> reason why the containers were killed other than a message such as "Container
> killed by the
> ApplicationMaster../container_1382237301855_0001_01_000001/syslog:Container
> killed on request. Exit code is 143"
>
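Exit code 143 is 128 + 15 (SIGTERM), i.e. the container was terminated on
request rather than crashing, which is consistent with the ApplicationMaster
killing attempts it no longer needs (for example, speculative duplicates).
Since speculative execution is enabled in the configuration quoted below, a
minimal sketch for ruling that out during the benchmark would be:

        <property>
          <name>mapreduce.map.speculative</name>
          <value>false</value>
        </property>
        <property>
          <name>mapreduce.reduce.speculative</name>
          <value>false</value>
        </property>
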
>
> 2013-10-23 21:09:34,325 INFO org.apache.hadoop.mapreduce.Job (main):
> Counters: 46
>         File System Counters
>                 FILE: Number of bytes read=455484809066
>                 FILE: Number of bytes written=896642344343
>
>                 FILE: Number of read operations=0
>                 FILE: Number of large read operations=0
>                 FILE: Number of write operations=0
>                 HDFS: Number of bytes read=1000000841624
>
>                 HDFS: Number of bytes written=1000000000000
>                 HDFS: Number of read operations=25531
>
>                 HDFS: Number of large read operations=0
>                 HDFS: Number of write operations=150
>
>         Job Counters
>                 Killed map tasks=1
>                 Killed reduce tasks=11
>                 Launched map tasks=7449
>                 Launched reduce tasks=86
>                 Data-local map tasks=7434
>                 Rack-local map tasks=15
>                 Total time spent by all maps in occupied slots
> (ms)=1030941232
>                 Total time spent by all reduces in occupied slots
> (ms)=1574732272
>
>         Map-Reduce Framework
>                 Map input records=10000000000
>                 Map output records=10000000000
>                 Map output bytes=1020000000000
>                 Map output materialized bytes=440519309909
>                 Input split bytes=841624
>
>                 Combine input records=0
>                 Combine output records=0
>                 Reduce input groups=10000000000
>                 Reduce shuffle bytes=440519309909
>
>                 Reduce input records=10000000000
>                 Reduce output records=10000000000
>                 Spilled Records=20000000000
>                 Shuffled Maps =558600
>                 Failed Shuffles=0
>                 Merged Map outputs=558600
>                 GC time elapsed (ms)=2855534
>                 CPU time spent (ms)=170381420
>                 Physical memory (bytes) snapshot=3704938954752
>                 Virtual memory (bytes) snapshot=11455208308736
>                 Total committed heap usage (bytes)=4523248582656
>         Shuffle Errors
>                 BAD_ID=0
>                 CONNECTION=0
>                 IO_ERROR=0
>
>                 WRONG_LENGTH=0
>                 WRONG_MAP=0
>                 WRONG_REDUCE=0
>         File Input Format Counters
>                 Bytes Read=1000000000000
>         File Output Format Counters
>                 Bytes Written=1000000000000
>
>
>
> On Wed, Oct 23, 2013 at 1:46 PM, Jian Fang <ji...@gmail.com>wrote:
>
>> I have already started the tests with only 8 containers for map in MR2;
>> they are still running.
>>
>> But shouldn't YARN make better use of the memory? If I map the YARN
>> containers in MR2 to exactly the 8 map and 3 reduce slots of MR1, what is
>> the real advantage of YARN then? I remember that one goal of YARN was to
>> solve the issue that map slots cannot be used for reduce and reduce slots
>> cannot be used for map. Shouldn't YARN be smart enough to handle the
>> concurrent tasks?
>>
>>
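For reference, MR2's per-node map concurrency falls out of memory arithmetic,
as Sandy points out further down: floor(yarn.nodemanager.resource.memory-mb /
mapreduce.map.memory.mb) = floor(12288 / 768) = 16 concurrent map containers,
versus 8 map slots per node in the MR1 setup. A minimal sketch (an
illustrative assumption, not a tested setting) of one way to cap MR2 at 8
concurrent maps per node while keeping the 12288 MB NodeManager budget:

        <property>
          <name>mapreduce.map.memory.mb</name>
          <value>1536</value>   <!-- 12288 / 1536 = 8 map containers per node -->
        </property>
        <property>
          <name>mapreduce.map.java.opts</name>
          <value>-Xmx1024m</value>   <!-- keep the heap well inside the 1536 MB container -->
        </property>
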
>>
>> On Wed, Oct 23, 2013 at 1:17 PM, Sandy Ryza <sa...@cloudera.com>wrote:
>>
>>> Increasing the slowstart is not meant to increase performance, but
>>> should make for a fairer comparison.  Have you tried making sure that in
>>> MR2 only 8 map tasks are running concurrently, or boosting MR1 up to 16?
>>>
>>> -Sandy
>>>
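The MR1 half of the experiment suggested just above is controlled by the
per-TaskTracker slot count; a minimal sketch for boosting MR1 to 16 concurrent
maps per node (mapred.tasktracker.map.tasks.maximum is the classic MR1
property name):

        <property>
          <name>mapred.tasktracker.map.tasks.maximum</name>
          <value>16</value>   <!-- 16 concurrent map tasks per TaskTracker -->
        </property>
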
>>>
>>> On Wed, Oct 23, 2013 at 12:55 PM, Jian Fang <
>>> jian.fang.subscribe@gmail.com> wrote:
>>>
>>>> Changing mapreduce.job.reduce.slowstart.completedmaps to 0.99 does not
>>>> look good. The map phase alone took 48 minutes and the total time seems
>>>> to be even longer. Is there any way to make the map phase run faster?
>>>>
>>>> Thanks.
>>>>
>>>>
>>>> On Wed, Oct 23, 2013 at 10:05 AM, Jian Fang <
>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>
>>>>> Thanks Sandy.
>>>>>
>>>>> io.sort.record.percent is at its default value of 0.05 for both MR1 and
>>>>> MR2. mapreduce.job.reduce.slowstart.completedmaps in MR2 and
>>>>> mapred.reduce.slowstart.completed.maps in MR1 both use the default value
>>>>> 0.05.
>>>>>
>>>>> I tried allocating 1536MB and 1024MB to the map container some time ago,
>>>>> but the changes did not give me a better result, so I changed it back to
>>>>> 768MB.
>>>>>
>>>>> Will try mapred.reduce.slowstart.completed.maps=.99 to see what
>>>>> happens. BTW, I should use
>>>>> mapreduce.job.reduce.slowstart.completedmaps in MR2, right?
>>>>>
>>>>> Also, in MR1 I can specify tasktracker.http.threads, but I could not
>>>>> find the counterpart for MR2. Which one should I tune for the HTTP threads?
>>>>>
>>>>> Thanks again.
>>>>>
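On the MR2 side the slow-start knob is indeed
mapreduce.job.reduce.slowstart.completedmaps; a minimal sketch of the setting
discussed just above:

        <property>
          <name>mapreduce.job.reduce.slowstart.completedmaps</name>
          <value>0.99</value>   <!-- do not start reduces until 99% of maps have finished -->
        </property>

As for tasktracker.http.threads, MR2 has no direct counterpart because map
output is served by the NodeManager's ShuffleHandler rather than by a
TaskTracker HTTP server; the closest knob appears to be
mapreduce.shuffle.max.threads, though that is worth confirming against
mapred-default.xml for the release in use.
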
>>>>>
>>>>> On Wed, Oct 23, 2013 at 9:40 AM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>>>
>>>>>> Based on SLOTS_MILLIS_MAPS, it looks like your map tasks are taking
>>>>>> about three times as long in MR2 as they are in MR1.  This is probably
>>>>>> because you allow twice as many map tasks to run at a time in MR2 (12288/768
>>>>>> = 16).  Being able to use all the containers isn't necessarily a good thing
>>>>>> if you are oversubscribing your node's resources.  Because of the different
>>>>>> way that MR1 and MR2 view resources, I think it's better to test with
>>>>>> mapred.reduce.slowstart.completed.maps=.99 so that the map and reduce
>>>>>> phases will run separately.
>>>>>>
>>>>>> On the other side, it looks like your MR1 run has more spilled records
>>>>>> than MR2. You should set io.sort.record.percent in MR1 to .13, which
>>>>>> should improve MR1 performance and provide a fairer comparison (MR2
>>>>>> automatically does this tuning for you).
>>>>>>
>>>>>> -Sandy
>>>>>>
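A minimal mapred-site.xml sketch of the two MR1-side settings suggested above:

        <property>
          <name>io.sort.record.percent</name>
          <value>0.13</value>   <!-- fraction of io.sort.mb reserved for record accounting; raising it reduces spills -->
        </property>
        <property>
          <name>mapred.reduce.slowstart.completed.maps</name>
          <value>0.99</value>   <!-- keep the map and reduce phases from overlapping -->
        </property>
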
>>>>>>
>>>>>> On Wed, Oct 23, 2013 at 9:22 AM, Jian Fang <
>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>
>>>>>>> The numbers of map slots and reduce slots on each data node for MR1
>>>>>>> are 8 and 3, respectively. Since MR2 can use every container for either
>>>>>>> map or reduce, I would expect MR2 to be faster.
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Oct 23, 2013 at 8:17 AM, Sandy Ryza <sandy.ryza@cloudera.com
>>>>>>> > wrote:
>>>>>>>
>>>>>>>> How many map and reduce slots are you using per tasktracker in MR1?
>>>>>>>>  How do the average map times compare? (MR2 reports this directly on the
>>>>>>>> web UI, but you can also get a sense in MR1 by scrolling through the map
>>>>>>>> tasks page).  Can you share the counters for MR1?
>>>>>>>>
>>>>>>>> -Sandy
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Oct 23, 2013 at 12:23 AM, Jian Fang <
>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Unfortunately, turning off JVM reuse still gave the same result,
>>>>>>>>> i.e., about 90 minutes for MR2. I don't think the killed reduces could
>>>>>>>>> account for a 2x slowdown. There must be something very wrong either in
>>>>>>>>> the configuration or in the code. Any hints?
>>>>>>>>>
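For anyone reproducing this comparison: JVM reuse is an MR1-only feature,
controlled by mapred.job.reuse.jvm.num.tasks in the MR1 mapred-site.xml. As
Sandy notes further down, MR2 does not implement JVM reuse, so the
mapreduce.job.jvm.numtasks value in the MR2 configuration has no effect. A
minimal sketch for disabling reuse on the MR1 side:

        <property>
          <name>mapred.job.reuse.jvm.num.tasks</name>
          <value>1</value>   <!-- each JVM runs exactly one task, i.e. no reuse -->
        </property>
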
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <
>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks Sandy. I will try to turn JVM reuse off and see what
>>>>>>>>>> happens.
>>>>>>>>>>
>>>>>>>>>> Yes, I saw quite a few exceptions in the task attempts. For
>>>>>>>>>> instance:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2013-10-20 03:13:58,751 ERROR [main]
>>>>>>>>>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>>>>>>>>>> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
>>>>>>>>>> 2013-10-20 03:13:58,752 ERROR [Thread-6]
>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient: Failed to close file
>>>>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
>>>>>>>>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>>>>>>>>>> No lease on
>>>>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
>>>>>>>>>> File does not exist. Holder
>>>>>>>>>> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
>>>>>>>>>> any open files.
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>>>>>>>>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>>>>>>>> --
>>>>>>>>>>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>>>>>>>>>> 2013-10-20 03:13:58,753 WARN [main]
>>>>>>>>>> org.apache.hadoop.mapred.YarnChild: Exception running child :
>>>>>>>>>> java.nio.channels.ClosedChannelException
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>>>>>>>>>>         at
>>>>>>>>>> java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <
>>>>>>>>>> sandy.ryza@cloudera.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> It looks like many of your reduce tasks were killed.  Do you
>>>>>>>>>>> know why?  Also, MR2 doesn't have JVM reuse, so it might make sense to
>>>>>>>>>>> compare it to MR1 with JVM reuse turned off.
>>>>>>>>>>>
>>>>>>>>>>> -Sandy
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <
>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> The Terasort output for MR2 is as follows.
>>>>>>>>>>>>
>>>>>>>>>>>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job
>>>>>>>>>>>> (main): Counters: 46
>>>>>>>>>>>>         File System Counters
>>>>>>>>>>>>                 FILE: Number of bytes read=456102049355
>>>>>>>>>>>>                 FILE: Number of bytes written=897246250517
>>>>>>>>>>>>                 FILE: Number of read operations=0
>>>>>>>>>>>>                 FILE: Number of large read operations=0
>>>>>>>>>>>>                 FILE: Number of write operations=0
>>>>>>>>>>>>                 HDFS: Number of bytes read=1000000851200
>>>>>>>>>>>>                 HDFS: Number of bytes written=1000000000000
>>>>>>>>>>>>                 HDFS: Number of read operations=32131
>>>>>>>>>>>>                 HDFS: Number of large read operations=0
>>>>>>>>>>>>                 HDFS: Number of write operations=224
>>>>>>>>>>>>         Job Counters
>>>>>>>>>>>>                 Killed map tasks=1
>>>>>>>>>>>>                 Killed reduce tasks=20
>>>>>>>>>>>>                 Launched map tasks=7601
>>>>>>>>>>>>                 Launched reduce tasks=132
>>>>>>>>>>>>                 Data-local map tasks=7591
>>>>>>>>>>>>                 Rack-local map tasks=10
>>>>>>>>>>>>                 Total time spent by all maps in occupied slots
>>>>>>>>>>>> (ms)=1696141311
>>>>>>>>>>>>                 Total time spent by all reduces in occupied
>>>>>>>>>>>> slots (ms)=2664045096
>>>>>>>>>>>>         Map-Reduce Framework
>>>>>>>>>>>>                 Map input records=10000000000
>>>>>>>>>>>>                 Map output records=10000000000
>>>>>>>>>>>>                 Map output bytes=1020000000000
>>>>>>>>>>>>                 Map output materialized bytes=440486356802
>>>>>>>>>>>>                 Input split bytes=851200
>>>>>>>>>>>>                 Combine input records=0
>>>>>>>>>>>>                 Combine output records=0
>>>>>>>>>>>>                 Reduce input groups=10000000000
>>>>>>>>>>>>                 Reduce shuffle bytes=440486356802
>>>>>>>>>>>>                 Reduce input records=10000000000
>>>>>>>>>>>>                 Reduce output records=10000000000
>>>>>>>>>>>>                 Spilled Records=20000000000
>>>>>>>>>>>>                 Shuffled Maps =851200
>>>>>>>>>>>>                 Failed Shuffles=61
>>>>>>>>>>>>                 Merged Map outputs=851200
>>>>>>>>>>>>                 GC time elapsed (ms)=4215666
>>>>>>>>>>>>                 CPU time spent (ms)=192433000
>>>>>>>>>>>>                 Physical memory (bytes) snapshot=3349356380160
>>>>>>>>>>>>                 Virtual memory (bytes) snapshot=9665208745984
>>>>>>>>>>>>                 Total committed heap usage (bytes)=3636854259712
>>>>>>>>>>>>         Shuffle Errors
>>>>>>>>>>>>                 BAD_ID=0
>>>>>>>>>>>>                 CONNECTION=0
>>>>>>>>>>>>                 IO_ERROR=4
>>>>>>>>>>>>                 WRONG_LENGTH=0
>>>>>>>>>>>>                 WRONG_MAP=0
>>>>>>>>>>>>                 WRONG_REDUCE=0
>>>>>>>>>>>>         File Input Format Counters
>>>>>>>>>>>>                 Bytes Read=1000000000000
>>>>>>>>>>>>         File Output Format Counters
>>>>>>>>>>>>                 Bytes Written=1000000000000
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> John
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop
>>>>>>>>>>>>> 1.0.3 and it turned out that the terasort for MR2 is 2 times slower than
>>>>>>>>>>>>> that in MR1. I cannot really believe it.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0
>>>>>>>>>>>>> cluster configurations are as follows.
>>>>>>>>>>>>>
>>>>>>>>>>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>>>>>>>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>>>>>>>>>>         mapreduce.map.memory.mb = "768";
>>>>>>>>>>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>>>>>>>>>>
>>>>>>>>>>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>>>>>>>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>>>>>>>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>>>>>>>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>>>>>>>>>>
>>>>>>>>>>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>>>>>>>>>>         mapreduce.task.io.sort.factor = "48";
>>>>>>>>>>>>>         mapreduce.task.io.sort.mb = "200";
>>>>>>>>>>>>>         mapreduce.map.speculative = "true";
>>>>>>>>>>>>>         mapreduce.reduce.speculative = "true";
>>>>>>>>>>>>>         mapreduce.framework.name = "yarn";
>>>>>>>>>>>>>         yarn.app.mapreduce.am.job.task.listener.thread-count =
>>>>>>>>>>>>> "60";
>>>>>>>>>>>>>         mapreduce.map.cpu.vcores = "1";
>>>>>>>>>>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>>>>>>>>>>
>>>>>>>>>>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>>>>>>>>>>         mapreduce.map.output.compress = "true";
>>>>>>>>>>>>>         mapreduce.map.output.compress.codec =
>>>>>>>>>>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>>>>>>>>>>
>>>>>>>>>>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>>>>>>>>>>         yarn.resourcemanager.scheduler.client.thread-count =
>>>>>>>>>>>>> "64";
>>>>>>>>>>>>>
>>>>>>>>>>>>> yarn.resourcemanager.resource-tracker.client.thread-count = "64";
>>>>>>>>>>>>>         yarn.resourcemanager.scheduler.class =
>>>>>>>>>>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>>>>>>>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>>>>>>>>>>         yarn.nodemanager.aux-services.mapreduce.shuffle.class
>>>>>>>>>>>>> = "org.apache.hadoop.mapred.ShuffleHandler";
>>>>>>>>>>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>>>>>>>>>>         yarn.nodemanager.container-executor.class =
>>>>>>>>>>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>>>>>>>>>>         yarn.nodemanager.container-manager.thread-count = "64";
>>>>>>>>>>>>>         yarn.nodemanager.localizer.client.thread-count = "20";
>>>>>>>>>>>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>>>>>>>>>>>>
>>>>>>>>>>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost the
>>>>>>>>>>>>> same other configurations. In MR1, the 1TB terasort took about 45 minutes,
>>>>>>>>>>>>> but it took around 90 minutes in MR2.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Does anyone know what is wrong here? Or do I need some special
>>>>>>>>>>>>> configurations for terasort to work better in MR2?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>>
>>>>>>>>>>>>> John
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>>>>>>>>>>> michael_segel@hotmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Sam,
>>>>>>>>>>>>>> I think your cluster is too small for any meaningful
>>>>>>>>>>>>>> conclusions to be made.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Mike Segel
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Harsh,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for your detailed response! Now, the efficiency of my
>>>>>>>>>>>>>> Yarn cluster improved a lot after increasing the reducer
>>>>>>>>>>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>>>>>>>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and
>>>>>>>>>>>>>> reduce task together, with a typical process(map > shuffle > reduce). In
>>>>>>>>>>>>>> Yarn, as I know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>>>>>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but,
>>>>>>>>>>>>>> MRv1 job has special execution process(map > shuffle > reduce) in Hadoop
>>>>>>>>>>>>>> 1.x, and how Yarn execute a MRv1 job? still include some special MR steps
>>>>>>>>>>>>>> in Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>>>>>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>>>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>>>>>>>>> - What's the general process for ApplicationMaster of Yarn to
>>>>>>>>>>>>>> execute a job?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>>>>>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>>>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>>>>>>>>> - For Yarn, above tow parameter do not work any more, as yarn
>>>>>>>>>>>>>> uses container instead, right?
>>>>>>>>>>>>>> - For Yarn, we can set the whole physical mem for a
>>>>>>>>>>>>>> NodeManager using 'yarn.nodemanager.resource.memory-mb'. But
>>>>>>>>>>>>>> how to set the default size of physical mem of a container?
>>>>>>>>>>>>>> - How to set the maximum size of physical mem of a container?
>>>>>>>>>>>>>> By the parameter of 'mapred.child.java.opts'?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks as always!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Sam,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> > - How to know the container number? Why you say it will be
>>>>>>>>>>>>>>> 22 containers due to a 22 GB memory?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The MR2's default configuration requests 1 GB resource each
>>>>>>>>>>>>>>> for Map
>>>>>>>>>>>>>>> and Reduce containers. It requests 1.5 GB for the AM
>>>>>>>>>>>>>>> container that
>>>>>>>>>>>>>>> runs the job, additionally. This is tunable using the
>>>>>>>>>>>>>>> properties
>>>>>>>>>>>>>>> Sandy's mentioned in his post.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> > - My machine has 32 GB memory, how many memory is proper
>>>>>>>>>>>>>>> to be assigned to containers?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This is a general question. You may use the same process you
>>>>>>>>>>>>>>> took to
>>>>>>>>>>>>>>> decide optimal number of slots in MR1 to decide this here.
>>>>>>>>>>>>>>> Every
>>>>>>>>>>>>>>> container is a new JVM, and you're limited by the CPUs you
>>>>>>>>>>>>>>> have there
>>>>>>>>>>>>>>> (if not the memory). Either increase memory requests from
>>>>>>>>>>>>>>> jobs, to
>>>>>>>>>>>>>>> lower # of concurrent containers at a given time (runtime
>>>>>>>>>>>>>>> change), or
>>>>>>>>>>>>>>> lower NM's published memory resources to control the same
>>>>>>>>>>>>>>> (config
>>>>>>>>>>>>>>> change).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name'
>>>>>>>>>>>>>>> to be 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>>>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yes, all of these properties will still work. Old properties
>>>>>>>>>>>>>>> specific
>>>>>>>>>>>>>>> to JobTracker or TaskTracker (usually found as a keyword in
>>>>>>>>>>>>>>> the config
>>>>>>>>>>>>>>> name) will not apply anymore.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <
>>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>>> > Hi Harsh,
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > According to above suggestions, I removed the duplication
>>>>>>>>>>>>>>> of setting, and
>>>>>>>>>>>>>>> > reduce the value of 'yarn.nodemanager.resource.cpu-cores',
>>>>>>>>>>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>>>>>>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000.
>>>>>>>>>>>>>>> Ant then, the
>>>>>>>>>>>>>>> > efficiency improved about 18%.  I have questions:
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > - How to know the container number? Why you say it will be
>>>>>>>>>>>>>>> 22 containers due
>>>>>>>>>>>>>>> > to a 22 GB memory?
>>>>>>>>>>>>>>> > - My machine has 32 GB memory, how many memory is proper
>>>>>>>>>>>>>>> to be assigned to
>>>>>>>>>>>>>>> > containers?
>>>>>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name'
>>>>>>>>>>>>>>> to be 'yarn', will
>>>>>>>>>>>>>>> > other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>>>>>> framework? Like
>>>>>>>>>>>>>>> > 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > Thanks!
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> Hey Sam,
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> Did you get a chance to retry with Sandy's suggestions?
>>>>>>>>>>>>>>> The config
>>>>>>>>>>>>>>> >> appears to be asking NMs to use roughly 22 total
>>>>>>>>>>>>>>> containers (as
>>>>>>>>>>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB
>>>>>>>>>>>>>>> memory
>>>>>>>>>>>>>>> >> resource. This could impact much, given the CPU is still
>>>>>>>>>>>>>>> the same for
>>>>>>>>>>>>>>> >> both test runs.
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>>>>>>>>>>> sandy.ryza@cloudera.com>
>>>>>>>>>>>>>>> >> wrote:
>>>>>>>>>>>>>>> >> > Hey Sam,
>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>> >> > Thanks for sharing your results.  I'm definitely
>>>>>>>>>>>>>>> curious about what's
>>>>>>>>>>>>>>> >> > causing the difference.
>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>> >> > A couple observations:
>>>>>>>>>>>>>>> >> > It looks like you've got
>>>>>>>>>>>>>>> yarn.nodemanager.resource.memory-mb in there
>>>>>>>>>>>>>>> >> > twice
>>>>>>>>>>>>>>> >> > with two different values.
>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close
>>>>>>>>>>>>>>> to the default
>>>>>>>>>>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of
>>>>>>>>>>>>>>> your tasks getting
>>>>>>>>>>>>>>> >> > killed for running over resource limits?
>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>> >> > -Sandy
>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <
>>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >> The terasort execution log shows that reduce spent
>>>>>>>>>>>>>>> about 5.5 mins from
>>>>>>>>>>>>>>> >> >> 33%
>>>>>>>>>>>>>>> >> >> to 35% as below.
>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>>>> 31%
>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>>>> 32%
>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>>>> 33%
>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>>>> 35%
>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>>>> 40%
>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>>>> 43%
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >> Any way, below are my configurations for your
>>>>>>>>>>>>>>> reference. Thanks!
>>>>>>>>>>>>>>> >> >> (A) core-site.xml
>>>>>>>>>>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >> (B) hdfs-site.xml
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>dfs.replication</name>
>>>>>>>>>>>>>>> >> >>     <value>1</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>>>>>>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>>>>>>>>>>> >> >>     <value>10</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >> (C) mapred-site.xml
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>>>>>>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>>>>>>>>>>> >> >> </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>>>>>>>>>>> >> >>     <value>yarn</value>
>>>>>>>>>>>>>>> >> >>    </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>>>>>>>>>>> >> >>     <value>8</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>>>>>>>>>>> >> >>     <value>4</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>>>>>>>>>>> >> >>     <value>true</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >> (D) yarn-site.xml
>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>>>>>>>>>>> >> >>     <value>node1:18025</value>
>>>>>>>>>>>>>>> >> >>     <description>host is the hostname of the resource
>>>>>>>>>>>>>>> manager and
>>>>>>>>>>>>>>> >> >>     port is the port on which the NodeManagers contact
>>>>>>>>>>>>>>> the Resource
>>>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <description>The address of the RM web
>>>>>>>>>>>>>>> application.</description>
>>>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>>>>>>>>>>> >> >>     <value>node1:18088</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>>>>>>>>>>>>>>> >> >>     <value>node1:18030</value>
>>>>>>>>>>>>>>> >> >>     <description>host is the hostname of the
>>>>>>>>>>>>>>> resourcemanager and port
>>>>>>>>>>>>>>> >> >> is
>>>>>>>>>>>>>>> >> >> the port
>>>>>>>>>>>>>>> >> >>     on which the Applications in the cluster talk to
>>>>>>>>>>>>>>> the Resource
>>>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>>>>>>>>>>> >> >>     <value>node1:18040</value>
>>>>>>>>>>>>>>> >> >>     <description>the host is the hostname of the
>>>>>>>>>>>>>>> ResourceManager and
>>>>>>>>>>>>>>> >> >> the
>>>>>>>>>>>>>>> >> >> port is the port on
>>>>>>>>>>>>>>> >> >>     which the clients can talk to the Resource
>>>>>>>>>>>>>>> Manager. </description>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>>>>>>>>>>> >> >>     <description>the local directories used by the
>>>>>>>>>>>>>>> >> >> nodemanager</description>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>>>>>>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>>>>>>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>>>>>>>>>>> port</description>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>>>> >> >>     <value>10240</value>
>>>>>>>>>>>>>>> >> >>     <description>the amount of memory on the
>>>>>>>>>>>>>>> NodeManager in
>>>>>>>>>>>>>>> >> >> GB</description>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>>>>>>>>>> >> >>     <description>directory on hdfs where the
>>>>>>>>>>>>>>> application logs are moved
>>>>>>>>>>>>>>> >> >> to
>>>>>>>>>>>>>>> >> >> </description>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>    <property>
>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>>>>>>>>>> >> >>     <description>the directories used by Nodemanagers
>>>>>>>>>>>>>>> as log
>>>>>>>>>>>>>>> >> >> directories</description>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>>>>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>>>>>>>>>> >> >>     <description>shuffle service that needs to be set
>>>>>>>>>>>>>>> for Map Reduce to
>>>>>>>>>>>>>>> >> >> run </description>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>>>>>>>>>> >> >>     <value>24</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>>>>>>>>>> >> >>     <value>3</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>>>> >> >>     <value>22000</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>>>>>>>>>> >> >>     <value>2.1</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>> >> >>> Not tuning configurations at all is wrong. YARN uses
>>>>>>>>>>>>>>> memory resource
>>>>>>>>>>>>>>> >> >>> based scheduling and hence MR2 would be requesting 1
>>>>>>>>>>>>>>> GB minimum by
>>>>>>>>>>>>>>> >> >>> default, causing, on base configs, to max out at 8
>>>>>>>>>>>>>>> (due to 8 GB NM
>>>>>>>>>>>>>>> >> >>> memory resource config) total containers. Do share
>>>>>>>>>>>>>>> your configs as at
>>>>>>>>>>>>>>> >> >>> this point none of us can tell what it is.
>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for
>>>>>>>>>>>>>>> users and to not
>>>>>>>>>>>>>>> >> >>> care about such things :)
>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>>>>>>>>>> samliuhadoop@gmail.com>
>>>>>>>>>>>>>>> >> >>> wrote:
>>>>>>>>>>>>>>> >> >>> > At the begining, I just want to do a fast
>>>>>>>>>>>>>>> comparision of MRv1 and
>>>>>>>>>>>>>>> >> >>> > Yarn.
>>>>>>>>>>>>>>> >> >>> > But
>>>>>>>>>>>>>>> >> >>> > they have many differences, and to be fair for
>>>>>>>>>>>>>>> comparison I did not
>>>>>>>>>>>>>>> >> >>> > tune
>>>>>>>>>>>>>>> >> >>> > their configurations at all.  So I got above test
>>>>>>>>>>>>>>> results. After
>>>>>>>>>>>>>>> >> >>> > analyzing
>>>>>>>>>>>>>>> >> >>> > the test result, no doubt, I will configure them
>>>>>>>>>>>>>>> and do comparison
>>>>>>>>>>>>>>> >> >>> > again.
>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>> >> >>> > Do you have any idea on current test result? I
>>>>>>>>>>>>>>> think, to compare
>>>>>>>>>>>>>>> >> >>> > with
>>>>>>>>>>>>>>> >> >>> > MRv1,
>>>>>>>>>>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but
>>>>>>>>>>>>>>> worse on Reduce
>>>>>>>>>>>>>>> >> >>> > phase(terasort test).
>>>>>>>>>>>>>>> >> >>> > And any detailed suggestions/comments/materials on
>>>>>>>>>>>>>>> Yarn performance
>>>>>>>>>>>>>>> >> >>> > tunning?
>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>> >> >>> > Thanks!
>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>>>>>>>>>> marcosluis2186@gmail.com>
>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>> >> >>> >> Why not to tune the configurations?
>>>>>>>>>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>>>>>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>> >> >>> >>> Hi Experts,
>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>> >> >>> >>> We are thinking about whether to use Yarn or not
>>>>>>>>>>>>>>> in the near
>>>>>>>>>>>>>>> >> >>> >>> future,
>>>>>>>>>>>>>>> >> >>> >>> and
>>>>>>>>>>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for
>>>>>>>>>>>>>>> comprison.
>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>> >> >>> >>> My env is three nodes cluster, and each node has
>>>>>>>>>>>>>>> similar hardware:
>>>>>>>>>>>>>>> >> >>> >>> 2
>>>>>>>>>>>>>>> >> >>> >>> cpu(4 core), 32 mem. Both Yarn and MRv1 cluster
>>>>>>>>>>>>>>> are set on the
>>>>>>>>>>>>>>> >> >>> >>> same
>>>>>>>>>>>>>>> >> >>> >>> env. To
>>>>>>>>>>>>>>> >> >>> >>> be fair, I did not make any performance tuning on
>>>>>>>>>>>>>>> their
>>>>>>>>>>>>>>> >> >>> >>> configurations, but
>>>>>>>>>>>>>>> >> >>> >>> use the default configuration values.
>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>> >> >>> >>> Before testing, I think Yarn will be much better
>>>>>>>>>>>>>>> than MRv1, if
>>>>>>>>>>>>>>> >> >>> >>> they
>>>>>>>>>>>>>>> >> >>> >>> all
>>>>>>>>>>>>>>> >> >>> >>> use default configuration, because Yarn is a
>>>>>>>>>>>>>>> better framework than
>>>>>>>>>>>>>>> >> >>> >>> MRv1.
>>>>>>>>>>>>>>> >> >>> >>> However, the test result shows some differences:
>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>>>>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>>>>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>>>>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>>>>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>>>>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>>>>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>>>>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>> >> >>> >>> After a fast analysis, I think the direct cause
>>>>>>>>>>>>>>> might be that Yarn
>>>>>>>>>>>>>>> >> >>> >>> is
>>>>>>>>>>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much
>>>>>>>>>>>>>>> worse on Reduce
>>>>>>>>>>>>>>> >> >>> >>> phase.
>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>>>>>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1 for
>>>>>>>>>>>>>>> terasort?
>>>>>>>>>>>>>>> >> >>> >>> - What's the stratage for tuning Yarn
>>>>>>>>>>>>>>> performance? Is any
>>>>>>>>>>>>>>> >> >>> >>> materials?
>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>> >> >>> >>> Thanks!
>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>> >> >>> >> --
>>>>>>>>>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>>>>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>>>>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>> >> >>> --
>>>>>>>>>>>>>>> >> >>> Harsh J
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> --
>>>>>>>>>>>>>>> >> Harsh J
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Harsh J
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Jian Fang <ji...@gmail.com>.
Looked at the results for MR1 and MR2. Reduce input groups for MR1 is
4,294,967,296, but 10,000,000,000 for MR2.
That is to say, the number of unique keys fed into the reducers is
10,000,000,000 in MR2, which is really weird. Any problem in the terasort code?
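
(A quick sanity check on those numbers, under the usual terasort assumptions:
1 TB of teragen input is 10^12 bytes / 100 bytes per row = 10^10 rows, each
carrying its own 10-byte key, so 10,000,000,000 reduce input groups is what
you would expect if essentially every key is distinct. 4,294,967,296 is
exactly 2^32, which looks more like a 32-bit limit somewhere, either in the
key space of the older teragen or in how MR1 reports that counter, than like
genuinely different data. That is only a guess, though; comparing the teragen
implementations shipped with 1.0.3 and 2.2.0 would confirm or rule it out.)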


On Wed, Oct 23, 2013 at 2:16 PM, Jian Fang <ji...@gmail.com>wrote:

> Reducing the number of map containers to 8 slightly improved the total time
> to 84 minutes. Here is the output. Also, from the log, there is no clear reason
> why the containers are killed other than messages such as "Container
> killed by the
> ApplicationMaster../container_1382237301855_0001_01_000001/syslog:Container
> killed on request. Exit code is 143"
>
>
> 2013-10-23 21:09:34,325 INFO org.apache.hadoop.mapreduce.Job (main):
> Counters: 46
>         File System Counters
>                 FILE: Number of bytes read=455484809066
>                 FILE: Number of bytes written=896642344343
>
>                 FILE: Number of read operations=0
>                 FILE: Number of large read operations=0
>                 FILE: Number of write operations=0
>                 HDFS: Number of bytes read=1000000841624
>
>                 HDFS: Number of bytes written=1000000000000
>                 HDFS: Number of read operations=25531
>
>                 HDFS: Number of large read operations=0
>                 HDFS: Number of write operations=150
>
>         Job Counters
>                 Killed map tasks=1
>                 Killed reduce tasks=11
>                 Launched map tasks=7449
>                 Launched reduce tasks=86
>                 Data-local map tasks=7434
>                 Rack-local map tasks=15
>                 Total time spent by all maps in occupied slots
> (ms)=1030941232
>                 Total time spent by all reduces in occupied slots
> (ms)=1574732272
>
>         Map-Reduce Framework
>                 Map input records=10000000000
>                 Map output records=10000000000
>                 Map output bytes=1020000000000
>                 Map output materialized bytes=440519309909
>                 Input split bytes=841624
>
>                 Combine input records=0
>                 Combine output records=0
>                 Reduce input groups=10000000000
>                 Reduce shuffle bytes=440519309909
>
>                 Reduce input records=10000000000
>                 Reduce output records=10000000000
>                 Spilled Records=20000000000
>                 Shuffled Maps =558600
>                 Failed Shuffles=0
>                 Merged Map outputs=558600
>                 GC time elapsed (ms)=2855534
>                 CPU time spent (ms)=170381420
>                 Physical memory (bytes) snapshot=3704938954752
>                 Virtual memory (bytes) snapshot=11455208308736
>                 Total committed heap usage (bytes)=4523248582656
>         Shuffle Errors
>                 BAD_ID=0
>                 CONNECTION=0
>                 IO_ERROR=0
>
>                 WRONG_LENGTH=0
>                 WRONG_MAP=0
>                 WRONG_REDUCE=0
>         File Input Format Counters
>                 Bytes Read=1000000000000
>         File Output Format Counters
>                 Bytes Written=1000000000000
>
>
>
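(On the "exit code 143" messages above: 143 is 128 + 15, i.e. the container
received SIGTERM, so it was deliberately terminated by the framework rather
than crashing or running out of memory. With mapreduce.map.speculative and
mapreduce.reduce.speculative both set to true in the posted configuration,
the killed attempts are most likely speculative duplicates being cleaned up
after their twin finished; that is an assumption, not something the log
snippet above proves.)
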
> On Wed, Oct 23, 2013 at 1:46 PM, Jian Fang <ji...@gmail.com>wrote:
>
>> Already started the tests with only 8 containers for map in MR2, still
>> running.
>>
>> But shouldn't YARN make better use of the memory? If I map YARN
>> containers in MR2 to exactly 8 map and 3 reduce slots, as in MR1, what is
>> the real advantage of YARN then? I remember one goal of YARN is to solve the
>> issue that map slots cannot be used for reduce and reduce slots cannot be
>> used for map. Shouldn't YARN be smart enough to handle the concurrent tasks?
>>
>>
>>
>> On Wed, Oct 23, 2013 at 1:17 PM, Sandy Ryza <sa...@cloudera.com>wrote:
>>
>>> Increasing the slowstart is not meant to increase performance, but
>>> should make for a fairer comparison.  Have you tried making sure that in
>>> MR2 only 8 map tasks are running concurrently, or boosting MR1 up to 16?
>>>
>>> -Sandy
>>>
>>>
>>> On Wed, Oct 23, 2013 at 12:55 PM, Jian Fang <
>>> jian.fang.subscribe@gmail.com> wrote:
>>>
>>>> Changing mapreduce.job.reduce.slowstart.completedmaps to 0.99 does not
>>>> look good. The map phase alone took 48 minutes and total time seems to be
>>>> even longer. Any way to let map phase run faster?
>>>>
>>>> Thanks.
>>>>
>>>>
>>>> On Wed, Oct 23, 2013 at 10:05 AM, Jian Fang <
>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>
>>>>> Thanks Sandy.
>>>>>
>>>>> io.sort.record.percent is the default value 0.05 for both MR1 and
>>>>> MR2.  mapreduce.job.reduce.slowstart.completedmaps in MR2 and mapred.reduce.slowstart.completed.maps
>>>>> in MR1 both use the default value 0.05.
>>>>>
>>>>> I tried to allocate 1536MB and 1024MB to map container some time ago,
>>>>> but the changes did not give me a better result, thus, I changed it back to
>>>>> 768MB.
>>>>>
>>>>> Will try mapred.reduce.slowstart.completed.maps=.99 to see what
>>>>> happens. BTW, I should use
>>>>> mapreduce.job.reduce.slowstart.completedmaps in MR2, right?
>>>>>
>>>>> Also, in MR1 I can specify tasktracker.http.threads, but I could not
>>>>> find the counterpart for MR2. Which one should I tune for the http threads?
>>>>>
>>>>> Thanks again.
>>>>>
>>>>>
>>>>> On Wed, Oct 23, 2013 at 9:40 AM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>>>
>>>>>> Based on SLOTS_MILLIS_MAPS, it looks like your map tasks are taking
>>>>>> about three times as long in MR2 as they are in MR1.  This is probably
>>>>>> because you allow twice as many map tasks to run at a time in MR2 (12288/768
>>>>>> = 16).  Being able to use all the containers isn't necessarily a good thing
>>>>>> if you are oversubscribing your node's resources.  Because of the different
>>>>>> way that MR1 and MR2 view resources, I think it's better to test with
>>>>>> mapred.reduce.slowstart.completed.maps=.99 so that the map and reduce
>>>>>> phases will run separately.
>>>>>>
>>>>>> On the other side, it looks like your MR1 has more spilled records
>>>>>> than MR2.  For a fairer comparison, you should set io.sort.record.percent
>>>>>> in MR1 to .13, which should improve MR1 performance, but will provide a
>>>>>> fairer comparison (MR2 automatically does this tuning for you).
>>>>>>
>>>>>> -Sandy
>>>>>>
>>>>>>
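(For concreteness, a minimal mapred-site.xml sketch of the settings suggested
above. The property names are the standard MR2 and MR1 ones; the values are
just the ones proposed in this thread for a fairer comparison, not tuned
recommendations.)

  <!-- MR2: hold reduces back until (nearly) all maps have finished -->
  <property>
    <name>mapreduce.job.reduce.slowstart.completedmaps</name>
    <value>0.99</value>
  </property>

  <!-- MR1 equivalents for the comparison run -->
  <property>
    <name>mapred.reduce.slowstart.completed.maps</name>
    <value>0.99</value>
  </property>
  <property>
    <name>io.sort.record.percent</name>
    <value>0.13</value>
  </property>
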
>>>>>> On Wed, Oct 23, 2013 at 9:22 AM, Jian Fang <
>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>
>>>>>>> The number of map slots and reduce slots on each data node for MR1
>>>>>>> are 8 and 3, respectively. Since MR2 could use all containers for either
>>>>>>> map or reduce, I would expect that MR2 is faster.
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Oct 23, 2013 at 8:17 AM, Sandy Ryza <sandy.ryza@cloudera.com
>>>>>>> > wrote:
>>>>>>>
>>>>>>>> How many map and reduce slots are you using per tasktracker in MR1?
>>>>>>>>  How do the average map times compare? (MR2 reports this directly on the
>>>>>>>> web UI, but you can also get a sense in MR1 by scrolling through the map
>>>>>>>> tasks page).  Can you share the counters for MR1?
>>>>>>>>
>>>>>>>> -Sandy
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Oct 23, 2013 at 12:23 AM, Jian Fang <
>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Unfortunately, turning off JVM reuse still got the same result,
>>>>>>>>> i.e., about 90 minutes for MR2. I don't think the killed reduces could
>>>>>>>>> contribute to 2 times slowness. There should be something very wrong either
>>>>>>>>> in configuration or code. Any hints?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <
>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks Sandy. I will try to turn JVM resue off and see what
>>>>>>>>>> happens.
>>>>>>>>>>
>>>>>>>>>> Yes, I saw quite some exceptions in the task attempts. For
>>>>>>>>>> instance.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2013-10-20 03:13:58,751 ERROR [main]
>>>>>>>>>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>>>>>>>>>> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
>>>>>>>>>> 2013-10-20 03:13:58,752 ERROR [Thread-6]
>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient: Failed to close file
>>>>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
>>>>>>>>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>>>>>>>>>> No lease on
>>>>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
>>>>>>>>>> File does not exist. Holder
>>>>>>>>>> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
>>>>>>>>>> any open files.
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>>>>>>>>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>>>>>>>> --
>>>>>>>>>>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>>>>>>>>>> 2013-10-20 03:13:58,753 WARN [main]
>>>>>>>>>> org.apache.hadoop.mapred.YarnChild: Exception running child :
>>>>>>>>>> java.nio.channels.ClosedChannelException
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>>>>>>>>>>         at
>>>>>>>>>> java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <
>>>>>>>>>> sandy.ryza@cloudera.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> It looks like many of your reduce tasks were killed.  Do you
>>>>>>>>>>> know why?  Also, MR2 doesn't have JVM reuse, so it might make sense to
>>>>>>>>>>> compare it to MR1 with JVM reuse turned off.
>>>>>>>>>>>
>>>>>>>>>>> -Sandy
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <
>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> The Terasort output for MR2 is as follows.
>>>>>>>>>>>>
>>>>>>>>>>>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job
>>>>>>>>>>>> (main): Counters: 46
>>>>>>>>>>>>         File System Counters
>>>>>>>>>>>>                 FILE: Number of bytes read=456102049355
>>>>>>>>>>>>                 FILE: Number of bytes written=897246250517
>>>>>>>>>>>>                 FILE: Number of read operations=0
>>>>>>>>>>>>                 FILE: Number of large read operations=0
>>>>>>>>>>>>                 FILE: Number of write operations=0
>>>>>>>>>>>>                 HDFS: Number of bytes read=1000000851200
>>>>>>>>>>>>                 HDFS: Number of bytes written=1000000000000
>>>>>>>>>>>>                 HDFS: Number of read operations=32131
>>>>>>>>>>>>                 HDFS: Number of large read operations=0
>>>>>>>>>>>>                 HDFS: Number of write operations=224
>>>>>>>>>>>>         Job Counters
>>>>>>>>>>>>                 Killed map tasks=1
>>>>>>>>>>>>                 Killed reduce tasks=20
>>>>>>>>>>>>                 Launched map tasks=7601
>>>>>>>>>>>>                 Launched reduce tasks=132
>>>>>>>>>>>>                 Data-local map tasks=7591
>>>>>>>>>>>>                 Rack-local map tasks=10
>>>>>>>>>>>>                 Total time spent by all maps in occupied slots
>>>>>>>>>>>> (ms)=1696141311
>>>>>>>>>>>>                 Total time spent by all reduces in occupied
>>>>>>>>>>>> slots (ms)=2664045096
>>>>>>>>>>>>         Map-Reduce Framework
>>>>>>>>>>>>                 Map input records=10000000000
>>>>>>>>>>>>                 Map output records=10000000000
>>>>>>>>>>>>                 Map output bytes=1020000000000
>>>>>>>>>>>>                 Map output materialized bytes=440486356802
>>>>>>>>>>>>                 Input split bytes=851200
>>>>>>>>>>>>                 Combine input records=0
>>>>>>>>>>>>                 Combine output records=0
>>>>>>>>>>>>                 Reduce input groups=10000000000
>>>>>>>>>>>>                 Reduce shuffle bytes=440486356802
>>>>>>>>>>>>                 Reduce input records=10000000000
>>>>>>>>>>>>                 Reduce output records=10000000000
>>>>>>>>>>>>                 Spilled Records=20000000000
>>>>>>>>>>>>                 Shuffled Maps =851200
>>>>>>>>>>>>                 Failed Shuffles=61
>>>>>>>>>>>>                 Merged Map outputs=851200
>>>>>>>>>>>>                 GC time elapsed (ms)=4215666
>>>>>>>>>>>>                 CPU time spent (ms)=192433000
>>>>>>>>>>>>                 Physical memory (bytes) snapshot=3349356380160
>>>>>>>>>>>>                 Virtual memory (bytes) snapshot=9665208745984
>>>>>>>>>>>>                 Total committed heap usage (bytes)=3636854259712
>>>>>>>>>>>>         Shuffle Errors
>>>>>>>>>>>>                 BAD_ID=0
>>>>>>>>>>>>                 CONNECTION=0
>>>>>>>>>>>>                 IO_ERROR=4
>>>>>>>>>>>>                 WRONG_LENGTH=0
>>>>>>>>>>>>                 WRONG_MAP=0
>>>>>>>>>>>>                 WRONG_REDUCE=0
>>>>>>>>>>>>         File Input Format Counters
>>>>>>>>>>>>                 Bytes Read=1000000000000
>>>>>>>>>>>>         File Output Format Counters
>>>>>>>>>>>>                 Bytes Written=1000000000000
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> John
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop
>>>>>>>>>>>>> 1.0.3 and it turned out that the terasort for MR2 is 2 times slower than
>>>>>>>>>>>>> that in MR1. I cannot really believe it.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0
>>>>>>>>>>>>> cluster configurations are as follows.
>>>>>>>>>>>>>
>>>>>>>>>>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>>>>>>>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>>>>>>>>>>         mapreduce.map.memory.mb = "768";
>>>>>>>>>>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>>>>>>>>>>
>>>>>>>>>>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>>>>>>>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>>>>>>>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>>>>>>>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>>>>>>>>>>
>>>>>>>>>>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>>>>>>>>>>         mapreduce.task.io.sort.factor = "48";
>>>>>>>>>>>>>         mapreduce.task.io.sort.mb = "200";
>>>>>>>>>>>>>         mapreduce.map.speculative = "true";
>>>>>>>>>>>>>         mapreduce.reduce.speculative = "true";
>>>>>>>>>>>>>         mapreduce.framework.name = "yarn";
>>>>>>>>>>>>>         yarn.app.mapreduce.am.job.task.listener.thread-count =
>>>>>>>>>>>>> "60";
>>>>>>>>>>>>>         mapreduce.map.cpu.vcores = "1";
>>>>>>>>>>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>>>>>>>>>>
>>>>>>>>>>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>>>>>>>>>>         mapreduce.map.output.compress = "true";
>>>>>>>>>>>>>         mapreduce.map.output.compress.codec =
>>>>>>>>>>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>>>>>>>>>>
>>>>>>>>>>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>>>>>>>>>>         yarn.resourcemanager.scheduler.client.thread-count =
>>>>>>>>>>>>> "64";
>>>>>>>>>>>>>
>>>>>>>>>>>>> yarn.resourcemanager.resource-tracker.client.thread-count = "64";
>>>>>>>>>>>>>         yarn.resourcemanager.scheduler.class =
>>>>>>>>>>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>>>>>>>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>>>>>>>>>>         yarn.nodemanager.aux-services.mapreduce.shuffle.class
>>>>>>>>>>>>> = "org.apache.hadoop.mapred.ShuffleHandler";
>>>>>>>>>>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>>>>>>>>>>         yarn.nodemanager.container-executor.class =
>>>>>>>>>>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>>>>>>>>>>         yarn.nodemanager.container-manager.thread-count = "64";
>>>>>>>>>>>>>         yarn.nodemanager.localizer.client.thread-count = "20";
>>>>>>>>>>>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>>>>>>>>>>>>
>>>>>>>>>>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost the
>>>>>>>>>>>>> same other configurations. In MR1, the 1TB terasort took about 45 minutes,
>>>>>>>>>>>>> but it took around 90 minutes in MR2.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Does anyone know what is wrong here? Or do I need some special
>>>>>>>>>>>>> configurations for terasort to work better in MR2?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>>
>>>>>>>>>>>>> John
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>>>>>>>>>>> michael_segel@hotmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Sam,
>>>>>>>>>>>>>> I think your cluster is too small for any meaningful
>>>>>>>>>>>>>> conclusions to be made.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Mike Segel
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Harsh,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for your detailed response! Now, the efficiency of my
>>>>>>>>>>>>>> Yarn cluster improved a lot after increasing the reducer
>>>>>>>>>>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>>>>>>>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and
>>>>>>>>>>>>>> reduce task together, with a typical process(map > shuffle > reduce). In
>>>>>>>>>>>>>> Yarn, as I know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>>>>>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but,
>>>>>>>>>>>>>> MRv1 job has special execution process(map > shuffle > reduce) in Hadoop
>>>>>>>>>>>>>> 1.x, and how Yarn execute a MRv1 job? still include some special MR steps
>>>>>>>>>>>>>> in Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>>>>>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>>>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>>>>>>>>> - What's the general process for ApplicationMaster of Yarn to
>>>>>>>>>>>>>> execute a job?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>>>>>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>>>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>>>>>>>>> - For Yarn, above tow parameter do not work any more, as yarn
>>>>>>>>>>>>>> uses container instead, right?
>>>>>>>>>>>>>> - For Yarn, we can set the whole physical mem for a
>>>>>>>>>>>>>> NodeManager using 'yarn.nodemanager.resource.memory-mb'. But
>>>>>>>>>>>>>> how to set the default size of physical mem of a container?
>>>>>>>>>>>>>> - How to set the maximum size of physical mem of a container?
>>>>>>>>>>>>>> By the parameter of 'mapred.child.java.opts'?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks as always!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Sam,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> > - How to know the container number? Why you say it will be
>>>>>>>>>>>>>>> 22 containers due to a 22 GB memory?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The MR2's default configuration requests 1 GB resource each
>>>>>>>>>>>>>>> for Map
>>>>>>>>>>>>>>> and Reduce containers. It requests 1.5 GB for the AM
>>>>>>>>>>>>>>> container that
>>>>>>>>>>>>>>> runs the job, additionally. This is tunable using the
>>>>>>>>>>>>>>> properties
>>>>>>>>>>>>>>> Sandy's mentioned in his post.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> > - My machine has 32 GB memory, how many memory is proper
>>>>>>>>>>>>>>> to be assigned to containers?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This is a general question. You may use the same process you
>>>>>>>>>>>>>>> took to
>>>>>>>>>>>>>>> decide optimal number of slots in MR1 to decide this here.
>>>>>>>>>>>>>>> Every
>>>>>>>>>>>>>>> container is a new JVM, and you're limited by the CPUs you
>>>>>>>>>>>>>>> have there
>>>>>>>>>>>>>>> (if not the memory). Either increase memory requests from
>>>>>>>>>>>>>>> jobs, to
>>>>>>>>>>>>>>> lower # of concurrent containers at a given time (runtime
>>>>>>>>>>>>>>> change), or
>>>>>>>>>>>>>>> lower NM's published memory resources to control the same
>>>>>>>>>>>>>>> (config
>>>>>>>>>>>>>>> change).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name'
>>>>>>>>>>>>>>> to be 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>>>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yes, all of these properties will still work. Old properties
>>>>>>>>>>>>>>> specific
>>>>>>>>>>>>>>> to JobTracker or TaskTracker (usually found as a keyword in
>>>>>>>>>>>>>>> the config
>>>>>>>>>>>>>>> name) will not apply anymore.
>>>>>>>>>>>>>>>
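(A back-of-the-envelope version of the container math described above, using
only numbers that appear in this thread: the number of concurrently running
containers per node is roughly yarn.nodemanager.resource.memory-mb divided by
the per-container memory request, which defaults to 1024 MB for map and reduce
tasks plus one 1536 MB ApplicationMaster for the job. With the 2.2.0
configuration posted earlier in the thread (12288 MB NodeManager, 768 MB map
request) that is 12288 / 768 = 16 concurrent maps per node; to mimic MR1's
8 map slots one could, for example, raise the request instead:

  <!-- illustrative only: 12288 / 1536 = 8 concurrent map containers per node -->
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1536</value>
  </property>

or lower yarn.nodemanager.resource.memory-mb, as Harsh suggests.)
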
>>>>>>>>>>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <
>>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>>> > Hi Harsh,
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > According to above suggestions, I removed the duplication
>>>>>>>>>>>>>>> of setting, and
>>>>>>>>>>>>>>> > reduce the value of 'yarn.nodemanager.resource.cpu-cores',
>>>>>>>>>>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>>>>>>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000.
>>>>>>>>>>>>>>> Ant then, the
>>>>>>>>>>>>>>> > efficiency improved about 18%.  I have questions:
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > - How to know the container number? Why you say it will be
>>>>>>>>>>>>>>> 22 containers due
>>>>>>>>>>>>>>> > to a 22 GB memory?
>>>>>>>>>>>>>>> > - My machine has 32 GB memory, how many memory is proper
>>>>>>>>>>>>>>> to be assigned to
>>>>>>>>>>>>>>> > containers?
>>>>>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name'
>>>>>>>>>>>>>>> to be 'yarn', will
>>>>>>>>>>>>>>> > other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>>>>>> framework? Like
>>>>>>>>>>>>>>> > 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > Thanks!
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> Hey Sam,
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> Did you get a chance to retry with Sandy's suggestions?
>>>>>>>>>>>>>>> The config
>>>>>>>>>>>>>>> >> appears to be asking NMs to use roughly 22 total
>>>>>>>>>>>>>>> containers (as
>>>>>>>>>>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB
>>>>>>>>>>>>>>> memory
>>>>>>>>>>>>>>> >> resource. This could impact much, given the CPU is still
>>>>>>>>>>>>>>> the same for
>>>>>>>>>>>>>>> >> both test runs.
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>>>>>>>>>>> sandy.ryza@cloudera.com>
>>>>>>>>>>>>>>> >> wrote:
>>>>>>>>>>>>>>> >> > Hey Sam,
>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>> >> > Thanks for sharing your results.  I'm definitely
>>>>>>>>>>>>>>> curious about what's
>>>>>>>>>>>>>>> >> > causing the difference.
>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>> >> > A couple observations:
>>>>>>>>>>>>>>> >> > It looks like you've got
>>>>>>>>>>>>>>> yarn.nodemanager.resource.memory-mb in there
>>>>>>>>>>>>>>> >> > twice
>>>>>>>>>>>>>>> >> > with two different values.
>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close
>>>>>>>>>>>>>>> to the default
>>>>>>>>>>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of
>>>>>>>>>>>>>>> your tasks getting
>>>>>>>>>>>>>>> >> > killed for running over resource limits?
>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>> >> > -Sandy
>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <
>>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >> The terasort execution log shows that reduce spent
>>>>>>>>>>>>>>> about 5.5 mins from
>>>>>>>>>>>>>>> >> >> 33%
>>>>>>>>>>>>>>> >> >> to 35% as below.
>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>>>> 31%
>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>>>> 32%
>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>>>> 33%
>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>>>> 35%
>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>>>> 40%
>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>>>> 43%
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >> Any way, below are my configurations for your
>>>>>>>>>>>>>>> reference. Thanks!
>>>>>>>>>>>>>>> >> >> (A) core-site.xml
>>>>>>>>>>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >> (B) hdfs-site.xml
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>dfs.replication</name>
>>>>>>>>>>>>>>> >> >>     <value>1</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>>>>>>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>>>>>>>>>>> >> >>     <value>10</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >> (C) mapred-site.xml
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>>>>>>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>>>>>>>>>>> >> >> </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>>>>>>>>>>> >> >>     <value>yarn</value>
>>>>>>>>>>>>>>> >> >>    </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>>>>>>>>>>> >> >>     <value>8</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>>>>>>>>>>> >> >>     <value>4</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>>>>>>>>>>> >> >>     <value>true</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >> (D) yarn-site.xml
>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>>>>>>>>>>> >> >>     <value>node1:18025</value>
>>>>>>>>>>>>>>> >> >>     <description>host is the hostname of the resource
>>>>>>>>>>>>>>> manager and
>>>>>>>>>>>>>>> >> >>     port is the port on which the NodeManagers contact
>>>>>>>>>>>>>>> the Resource
>>>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <description>The address of the RM web
>>>>>>>>>>>>>>> application.</description>
>>>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>>>>>>>>>>> >> >>     <value>node1:18088</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>>>>>>>>>>>>>>> >> >>     <value>node1:18030</value>
>>>>>>>>>>>>>>> >> >>     <description>host is the hostname of the
>>>>>>>>>>>>>>> resourcemanager and port
>>>>>>>>>>>>>>> >> >> is
>>>>>>>>>>>>>>> >> >> the port
>>>>>>>>>>>>>>> >> >>     on which the Applications in the cluster talk to
>>>>>>>>>>>>>>> the Resource
>>>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>>>>>>>>>>> >> >>     <value>node1:18040</value>
>>>>>>>>>>>>>>> >> >>     <description>the host is the hostname of the
>>>>>>>>>>>>>>> ResourceManager and
>>>>>>>>>>>>>>> >> >> the
>>>>>>>>>>>>>>> >> >> port is the port on
>>>>>>>>>>>>>>> >> >>     which the clients can talk to the Resource
>>>>>>>>>>>>>>> Manager. </description>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>>>>>>>>>>> >> >>     <description>the local directories used by the
>>>>>>>>>>>>>>> >> >> nodemanager</description>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>>>>>>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>>>>>>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>>>>>>>>>>> port</description>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>>>> >> >>     <value>10240</value>
>>>>>>>>>>>>>>> >> >>     <description>the amount of memory on the
>>>>>>>>>>>>>>> NodeManager in
>>>>>>>>>>>>>>> >> >> GB</description>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>>>>>>>>>> >> >>     <description>directory on hdfs where the
>>>>>>>>>>>>>>> application logs are moved
>>>>>>>>>>>>>>> >> >> to
>>>>>>>>>>>>>>> >> >> </description>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>    <property>
>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>>>>>>>>>> >> >>     <description>the directories used by Nodemanagers
>>>>>>>>>>>>>>> as log
>>>>>>>>>>>>>>> >> >> directories</description>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>>>>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>>>>>>>>>> >> >>     <description>shuffle service that needs to be set
>>>>>>>>>>>>>>> for Map Reduce to
>>>>>>>>>>>>>>> >> >> run </description>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>>>>>>>>>> >> >>     <value>24</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>>>>>>>>>> >> >>     <value>3</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>>>> >> >>     <value>22000</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>>>>>>>>>> >> >>     <value>2.1</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>> >> >>> Not tuning configurations at all is wrong. YARN uses
>>>>>>>>>>>>>>> memory resource
>>>>>>>>>>>>>>> >> >>> based scheduling and hence MR2 would be requesting 1
>>>>>>>>>>>>>>> GB minimum by
>>>>>>>>>>>>>>> >> >>> default, causing, on base configs, to max out at 8
>>>>>>>>>>>>>>> (due to 8 GB NM
>>>>>>>>>>>>>>> >> >>> memory resource config) total containers. Do share
>>>>>>>>>>>>>>> your configs as at
>>>>>>>>>>>>>>> >> >>> this point none of us can tell what it is.
>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for
>>>>>>>>>>>>>>> users and to not
>>>>>>>>>>>>>>> >> >>> care about such things :)
>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>>>>>>>>>> samliuhadoop@gmail.com>
>>>>>>>>>>>>>>> >> >>> wrote:
>>>>>>>>>>>>>>> >> >>> > At the begining, I just want to do a fast
>>>>>>>>>>>>>>> comparision of MRv1 and
>>>>>>>>>>>>>>> >> >>> > Yarn.
>>>>>>>>>>>>>>> >> >>> > But
>>>>>>>>>>>>>>> >> >>> > they have many differences, and to be fair for
>>>>>>>>>>>>>>> comparison I did not
>>>>>>>>>>>>>>> >> >>> > tune
>>>>>>>>>>>>>>> >> >>> > their configurations at all.  So I got above test
>>>>>>>>>>>>>>> results. After
>>>>>>>>>>>>>>> >> >>> > analyzing
>>>>>>>>>>>>>>> >> >>> > the test result, no doubt, I will configure them
>>>>>>>>>>>>>>> and do comparison
>>>>>>>>>>>>>>> >> >>> > again.
>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>> >> >>> > Do you have any idea on current test result? I
>>>>>>>>>>>>>>> think, to compare
>>>>>>>>>>>>>>> >> >>> > with
>>>>>>>>>>>>>>> >> >>> > MRv1,
>>>>>>>>>>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but
>>>>>>>>>>>>>>> worse on Reduce
>>>>>>>>>>>>>>> >> >>> > phase(terasort test).
>>>>>>>>>>>>>>> >> >>> > And any detailed suggestions/comments/materials on
>>>>>>>>>>>>>>> Yarn performance
>>>>>>>>>>>>>>> >> >>> > tunning?
>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>> >> >>> > Thanks!
>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>>>>>>>>>> marcosluis2186@gmail.com>
>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>> >> >>> >> Why not to tune the configurations?
>>>>>>>>>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>>>>>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>> >> >>> >>> Hi Experts,
>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>> >> >>> >>> We are thinking about whether to use Yarn or not
>>>>>>>>>>>>>>> in the near
>>>>>>>>>>>>>>> >> >>> >>> future,
>>>>>>>>>>>>>>> >> >>> >>> and
>>>>>>>>>>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for
>>>>>>>>>>>>>>> comprison.
>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>> >> >>> >>> My env is three nodes cluster, and each node has
>>>>>>>>>>>>>>> similar hardware:
>>>>>>>>>>>>>>> >> >>> >>> 2
>>>>>>>>>>>>>>> >> >>> >>> cpu(4 core), 32 mem. Both Yarn and MRv1 cluster
>>>>>>>>>>>>>>> are set on the
>>>>>>>>>>>>>>> >> >>> >>> same
>>>>>>>>>>>>>>> >> >>> >>> env. To
>>>>>>>>>>>>>>> >> >>> >>> be fair, I did not make any performance tuning on
>>>>>>>>>>>>>>> their
>>>>>>>>>>>>>>> >> >>> >>> configurations, but
>>>>>>>>>>>>>>> >> >>> >>> use the default configuration values.
>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>> >> >>> >>> Before testing, I think Yarn will be much better
>>>>>>>>>>>>>>> than MRv1, if
>>>>>>>>>>>>>>> >> >>> >>> they
>>>>>>>>>>>>>>> >> >>> >>> all
>>>>>>>>>>>>>>> >> >>> >>> use default configuration, because Yarn is a
>>>>>>>>>>>>>>> better framework than
>>>>>>>>>>>>>>> >> >>> >>> MRv1.
>>>>>>>>>>>>>>> >> >>> >>> However, the test result shows some differences:
>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>>>>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>>>>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>>>>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>>>>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>>>>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>>>>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>>>>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>> >> >>> >>> After a fast analysis, I think the direct cause
>>>>>>>>>>>>>>> might be that Yarn
>>>>>>>>>>>>>>> >> >>> >>> is
>>>>>>>>>>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much
>>>>>>>>>>>>>>> worse on Reduce
>>>>>>>>>>>>>>> >> >>> >>> phase.
>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>>>>>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1 for
>>>>>>>>>>>>>>> terasort?
>>>>>>>>>>>>>>> >> >>> >>> - What's the stratage for tuning Yarn
>>>>>>>>>>>>>>> performance? Is any
>>>>>>>>>>>>>>> >> >>> >>> materials?
>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>> >> >>> >>> Thanks!
>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>> >> >>> >> --
>>>>>>>>>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>>>>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>>>>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>> >> >>> --
>>>>>>>>>>>>>>> >> >>> Harsh J
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> --
>>>>>>>>>>>>>>> >> Harsh J
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Harsh J
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Jian Fang <ji...@gmail.com>.
Looked at the results for MR1 and MR2. Reduce input groups for MR1 is
4,294,967,296, but 10,000,000,000 for MR2.
That is to say the number of unique keys fed into the reducers is
10000000000 in MR2, which is really wired. Any problem in the terasort code?


On Wed, Oct 23, 2013 at 2:16 PM, Jian Fang <ji...@gmail.com>wrote:

> Reducing the number of map containers to 8 slightly improve the total time
> to 84 minutes. Here is output. Also, from the log, there is no clear reason
> why the containers are killed other than the message such as "Container
> killed by the
> ApplicationMaster../container_1382237301855_0001_01_000001/syslog:Container
> killed on request. Exit code is 143"
>
>
> 2013-10-23 21:09:34,325 INFO org.apache.hadoop.mapreduce.Job (main):
> Counters: 46
>         File System Counters
>                 FILE: Number of bytes read=455484809066
>                 FILE: Number of bytes written=896642344343
>
>                 FILE: Number of read operations=0
>                 FILE: Number of large read operations=0
>                 FILE: Number of write operations=0
>                 HDFS: Number of bytes read=1000000841624
>
>                 HDFS: Number of bytes written=1000000000000
>                 HDFS: Number of read operations=25531
>
>                 HDFS: Number of large read operations=0
>                 HDFS: Number of write operations=150
>
>         Job Counters
>                 Killed map tasks=1
>                 Killed reduce tasks=11
>                 Launched map tasks=7449
>                 Launched reduce tasks=86
>                 Data-local map tasks=7434
>                 Rack-local map tasks=15
>                 Total time spent by all maps in occupied slots
> (ms)=1030941232
>                 Total time spent by all reduces in occupied slots
> (ms)=1574732272
>
>         Map-Reduce Framework
>                 Map input records=10000000000
>                 Map output records=10000000000
>                 Map output bytes=1020000000000
>                 Map output materialized bytes=440519309909
>                 Input split bytes=841624
>
>                 Combine input records=0
>                 Combine output records=0
>                 Reduce input groups=10000000000
>                 Reduce shuffle bytes=440519309909
>
>                 Reduce input records=10000000000
>                 Reduce output records=10000000000
>                 Spilled Records=20000000000
>                 Shuffled Maps =558600
>                 Failed Shuffles=0
>                 Merged Map outputs=558600
>                 GC time elapsed (ms)=2855534
>                 CPU time spent (ms)=170381420
>                 Physical memory (bytes) snapshot=3704938954752
>                 Virtual memory (bytes) snapshot=11455208308736
>                 Total committed heap usage (bytes)=4523248582656
>         Shuffle Errors
>                 BAD_ID=0
>                 CONNECTION=0
>                 IO_ERROR=0
>
>                 WRONG_LENGTH=0
>                 WRONG_MAP=0
>                 WRONG_REDUCE=0
>         File Input Format Counters
>                 Bytes Read=1000000000000
>         File Output Format Counters
>                 Bytes Written=1000000000000
>
>
>
> On Wed, Oct 23, 2013 at 1:46 PM, Jian Fang <ji...@gmail.com>wrote:
>
>> Already started the tests with only 8 containers for map in MR2, still
>> running.
>>
>> But shouldn't YARN make better use of the memory? If I map YARN
>> containers in MR2 to exactly 8 map and 3 reduce slots as in MR1, what is the
>> real advantage of YARN then? I remember one goal of YARN is to solve the
>> issue that map slots cannot be used for reduce and reduce slots cannot be
>> used for map. Shouldn't YARN be smart enough to handle the concurrent tasks?
>>
>>
>>
>> On Wed, Oct 23, 2013 at 1:17 PM, Sandy Ryza <sa...@cloudera.com>wrote:
>>
>>> Increasing the slowstart is not meant to increase performance, but
>>> should make for a fairer comparison.  Have you tried making sure that in
>>> MR2 only 8 map tasks are running concurrently, or boosting MR1 up to 16?
>>>
>>> -Sandy
>>>
>>>
>>> On Wed, Oct 23, 2013 at 12:55 PM, Jian Fang <
>>> jian.fang.subscribe@gmail.com> wrote:
>>>
>>>> Changing mapreduce.job.reduce.
>>>> slowstart.completedmaps to 0.99 does not look good. The map phase alone
>>>> took 48 minutes and total time seems to be even longer. Any way to let map
>>>> phase run faster?
>>>>
>>>> Thanks.
>>>>
>>>>
>>>> On Wed, Oct 23, 2013 at 10:05 AM, Jian Fang <
>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>
>>>>> Thanks Sandy.
>>>>>
>>>>> io.sort.record.percent is the default value 0.05 for both MR1 and
>>>>> MR2.  mapreduce.job.reduce.slowstart.completedmaps in MR2 and mapred.reduce.slowstart.completed.maps
>>>>> in MR1 both use the default value 0.05.
>>>>>
>>>>> I tried to allocate 1536MB and 1024MB to map container some time ago,
>>>>> but the changes did not give me a better result, thus, I changed it back to
>>>>> 768MB.
>>>>>
>>>>> Will try mapred.reduce.slowstart.completed.maps=.99 to see what
>>>>> happens. BTW, I should use
>>>>> mapreduce.job.reduce.slowstart.completedmaps in MR2, right?
>>>>>
>>>>> Also, in MR1 I can specify tasktracker.http.threads, but I could not
>>>>> find the counterpart for MR2. Which one should I tune for the http threads?
>>>>>
>>>>> Thanks again.
>>>>>
>>>>>
>>>>> On Wed, Oct 23, 2013 at 9:40 AM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>>>
>>>>>> Based on SLOTS_MILLIS_MAPS, it looks like your map tasks are taking
>>>>>> about three times as long in MR2 as they are in MR1.  This is probably
>>>>>> because you allow twice as many map tasks to run at a time in MR2 (12288/768
>>>>>> = 16).  Being able to use all the containers isn't necessarily a good thing
>>>>>> if you are oversubscribing your node's resources.  Because of the different
>>>>>> way that MR1 and MR2 view resources, I think it's better to test with
>>>>>> mapred.reduce.slowstart.completed.maps=.99 so that the map and reduce
>>>>>> phases will run separately.
>>>>>>
>>>>>> On the other side, it looks like your MR1 has more spilled records
>>>>>> than MR2.  For a fairer comparison, you should set io.sort.record.percent
>>>>>> in MR1 to .13, which should improve MR1 performance, but will provide a
>>>>>> fairer comparison (MR2 automatically does this tuning for you).
>>>>>>
>>>>>> -Sandy
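
[For reference, the property names behind the two suggestions above, written in the
same key = "value" style as the configuration listing quoted later in this thread;
the 0.13 and 0.99 values are simply the ones Sandy mentions:

        io.sort.record.percent = "0.13";                        (MR1 only; MR2 tunes this automatically)
        mapred.reduce.slowstart.completed.maps = "0.99";        (MR1 name)
        mapreduce.job.reduce.slowstart.completedmaps = "0.99";  (MR2 name)
]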
>>>>>>
>>>>>>
>>>>>> On Wed, Oct 23, 2013 at 9:22 AM, Jian Fang <
>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>
>>>>>>> The number of map slots and reduce slots on each data node for MR1
>>>>>>> are 8 and 3, respectively. Since MR2 could use all containers for either
>>>>>>> map or reduce, I would expect that MR2 is faster.
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Oct 23, 2013 at 8:17 AM, Sandy Ryza <sandy.ryza@cloudera.com
>>>>>>> > wrote:
>>>>>>>
>>>>>>>> How many map and reduce slots are you using per tasktracker in MR1?
>>>>>>>>  How do the average map times compare? (MR2 reports this directly on the
>>>>>>>> web UI, but you can also get a sense in MR1 by scrolling through the map
>>>>>>>> tasks page).  Can you share the counters for MR1?
>>>>>>>>
>>>>>>>> -Sandy
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Oct 23, 2013 at 12:23 AM, Jian Fang <
>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Unfortunately, turning off JVM reuse still got the same result,
>>>>>>>>> i.e., about 90 minutes for MR2. I don't think the killed reduces could
>>>>>>>>> contribute to 2 times slowness. There should be something very wrong either
>>>>>>>>> in configuration or code. Any hints?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <
>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks Sandy. I will try to turn JVM reuse off and see what
>>>>>>>>>> happens.
>>>>>>>>>>
>>>>>>>>>> Yes, I saw quite some exceptions in the task attempts. For
>>>>>>>>>> instance.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2013-10-20 03:13:58,751 ERROR [main]
>>>>>>>>>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>>>>>>>>>> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
>>>>>>>>>> 2013-10-20 03:13:58,752 ERROR [Thread-6]
>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient: Failed to close file
>>>>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
>>>>>>>>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>>>>>>>>>> No lease on
>>>>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
>>>>>>>>>> File does not exist. Holder
>>>>>>>>>> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
>>>>>>>>>> any open files.
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>>>>>>>>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>>>>>>>> --
>>>>>>>>>>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>>>>>>>>>> 2013-10-20 03:13:58,753 WARN [main]
>>>>>>>>>> org.apache.hadoop.mapred.YarnChild: Exception running child :
>>>>>>>>>> java.nio.channels.ClosedChannelException
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>>>>>>>>>>         at
>>>>>>>>>> java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <
>>>>>>>>>> sandy.ryza@cloudera.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> It looks like many of your reduce tasks were killed.  Do you
>>>>>>>>>>> know why?  Also, MR2 doesn't have JVM reuse, so it might make sense to
>>>>>>>>>>> compare it to MR1 with JVM reuse turned off.
>>>>>>>>>>>
>>>>>>>>>>> -Sandy
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <
>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> The Terasort output for MR2 is as follows.
>>>>>>>>>>>>
>>>>>>>>>>>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job
>>>>>>>>>>>> (main): Counters: 46
>>>>>>>>>>>>         File System Counters
>>>>>>>>>>>>                 FILE: Number of bytes read=456102049355
>>>>>>>>>>>>                 FILE: Number of bytes written=897246250517
>>>>>>>>>>>>                 FILE: Number of read operations=0
>>>>>>>>>>>>                 FILE: Number of large read operations=0
>>>>>>>>>>>>                 FILE: Number of write operations=0
>>>>>>>>>>>>                 HDFS: Number of bytes read=1000000851200
>>>>>>>>>>>>                 HDFS: Number of bytes written=1000000000000
>>>>>>>>>>>>                 HDFS: Number of read operations=32131
>>>>>>>>>>>>                 HDFS: Number of large read operations=0
>>>>>>>>>>>>                 HDFS: Number of write operations=224
>>>>>>>>>>>>         Job Counters
>>>>>>>>>>>>                 Killed map tasks=1
>>>>>>>>>>>>                 Killed reduce tasks=20
>>>>>>>>>>>>                 Launched map tasks=7601
>>>>>>>>>>>>                 Launched reduce tasks=132
>>>>>>>>>>>>                 Data-local map tasks=7591
>>>>>>>>>>>>                 Rack-local map tasks=10
>>>>>>>>>>>>                 Total time spent by all maps in occupied slots
>>>>>>>>>>>> (ms)=1696141311
>>>>>>>>>>>>                 Total time spent by all reduces in occupied
>>>>>>>>>>>> slots (ms)=2664045096
>>>>>>>>>>>>         Map-Reduce Framework
>>>>>>>>>>>>                 Map input records=10000000000
>>>>>>>>>>>>                 Map output records=10000000000
>>>>>>>>>>>>                 Map output bytes=1020000000000
>>>>>>>>>>>>                 Map output materialized bytes=440486356802
>>>>>>>>>>>>                 Input split bytes=851200
>>>>>>>>>>>>                 Combine input records=0
>>>>>>>>>>>>                 Combine output records=0
>>>>>>>>>>>>                 Reduce input groups=10000000000
>>>>>>>>>>>>                 Reduce shuffle bytes=440486356802
>>>>>>>>>>>>                 Reduce input records=10000000000
>>>>>>>>>>>>                 Reduce output records=10000000000
>>>>>>>>>>>>                 Spilled Records=20000000000
>>>>>>>>>>>>                 Shuffled Maps =851200
>>>>>>>>>>>>                 Failed Shuffles=61
>>>>>>>>>>>>                 Merged Map outputs=851200
>>>>>>>>>>>>                 GC time elapsed (ms)=4215666
>>>>>>>>>>>>                 CPU time spent (ms)=192433000
>>>>>>>>>>>>                 Physical memory (bytes) snapshot=3349356380160
>>>>>>>>>>>>                 Virtual memory (bytes) snapshot=9665208745984
>>>>>>>>>>>>                 Total committed heap usage (bytes)=3636854259712
>>>>>>>>>>>>         Shuffle Errors
>>>>>>>>>>>>                 BAD_ID=0
>>>>>>>>>>>>                 CONNECTION=0
>>>>>>>>>>>>                 IO_ERROR=4
>>>>>>>>>>>>                 WRONG_LENGTH=0
>>>>>>>>>>>>                 WRONG_MAP=0
>>>>>>>>>>>>                 WRONG_REDUCE=0
>>>>>>>>>>>>         File Input Format Counters
>>>>>>>>>>>>                 Bytes Read=1000000000000
>>>>>>>>>>>>         File Output Format Counters
>>>>>>>>>>>>                 Bytes Written=1000000000000
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> John
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop
>>>>>>>>>>>>> 1.0.3 and it turned out that the terasort for MR2 is 2 times slower than
>>>>>>>>>>>>> that in MR1. I cannot really believe it.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0
>>>>>>>>>>>>> cluster configurations are as follows.
>>>>>>>>>>>>>
>>>>>>>>>>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>>>>>>>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>>>>>>>>>>         mapreduce.map.memory.mb = "768";
>>>>>>>>>>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>>>>>>>>>>
>>>>>>>>>>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>>>>>>>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>>>>>>>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>>>>>>>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>>>>>>>>>>
>>>>>>>>>>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>>>>>>>>>>         mapreduce.task.io.sort.factor = "48";
>>>>>>>>>>>>>         mapreduce.task.io.sort.mb = "200";
>>>>>>>>>>>>>         mapreduce.map.speculative = "true";
>>>>>>>>>>>>>         mapreduce.reduce.speculative = "true";
>>>>>>>>>>>>>         mapreduce.framework.name = "yarn";
>>>>>>>>>>>>>         yarn.app.mapreduce.am.job.task.listener.thread-count =
>>>>>>>>>>>>> "60";
>>>>>>>>>>>>>         mapreduce.map.cpu.vcores = "1";
>>>>>>>>>>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>>>>>>>>>>
>>>>>>>>>>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>>>>>>>>>>         mapreduce.map.output.compress = "true";
>>>>>>>>>>>>>         mapreduce.map.output.compress.codec =
>>>>>>>>>>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>>>>>>>>>>
>>>>>>>>>>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>>>>>>>>>>         yarn.resourcemanager.scheduler.client.thread-count =
>>>>>>>>>>>>> "64";
>>>>>>>>>>>>>
>>>>>>>>>>>>> yarn.resourcemanager.resource-tracker.client.thread-count = "64";
>>>>>>>>>>>>>         yarn.resourcemanager.scheduler.class =
>>>>>>>>>>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>>>>>>>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>>>>>>>>>>         yarn.nodemanager.aux-services.mapreduce.shuffle.class
>>>>>>>>>>>>> = "org.apache.hadoop.mapred.ShuffleHandler";
>>>>>>>>>>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>>>>>>>>>>         yarn.nodemanager.container-executor.class =
>>>>>>>>>>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>>>>>>>>>>         yarn.nodemanager.container-manager.thread-count = "64";
>>>>>>>>>>>>>         yarn.nodemanager.localizer.client.thread-count = "20";
>>>>>>>>>>>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>>>>>>>>>>>>
>>>>>>>>>>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost the
>>>>>>>>>>>>> same other configurations. In MR1, the 1TB terasort took about 45 minutes,
>>>>>>>>>>>>> but it took around 90 minutes in MR2.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Does anyone know what is wrong here? Or do I need some special
>>>>>>>>>>>>> configurations for terasort to work better in MR2?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>>
>>>>>>>>>>>>> John
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>>>>>>>>>>> michael_segel@hotmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Sam,
>>>>>>>>>>>>>> I think your cluster is too small for any meaningful
>>>>>>>>>>>>>> conclusions to be made.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Mike Segel
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Harsh,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for your detailed response! Now, the efficiency of my
>>>>>>>>>>>>>> Yarn cluster improved a lot after increasing the reducer
>>>>>>>>>>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>>>>>>>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and
>>>>>>>>>>>>>> reduce task together, with a typical process(map > shuffle > reduce). In
>>>>>>>>>>>>>> Yarn, as I know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>>>>>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but,
>>>>>>>>>>>>>> MRv1 job has special execution process(map > shuffle > reduce) in Hadoop
>>>>>>>>>>>>>> 1.x, and how does Yarn execute an MRv1 job? Does it still include the special MR steps
>>>>>>>>>>>>>> in Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>>>>>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>>>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>>>>>>>>> - What's the general process for ApplicationMaster of Yarn to
>>>>>>>>>>>>>> execute a job?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>>>>>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>>>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>>>>>>>>> - For Yarn, the above two parameters do not work any more, as yarn
>>>>>>>>>>>>>> uses containers instead, right?
>>>>>>>>>>>>>> - For Yarn, we can set the whole physical mem for a
>>>>>>>>>>>>>> NodeManager using 'yarn.nodemanager.resource.memory-mb'. But
>>>>>>>>>>>>>> how to set the default size of physical mem of a container?
>>>>>>>>>>>>>> - How to set the maximum size of physical mem of a container?
>>>>>>>>>>>>>> By the parameter of 'mapred.child.java.opts'?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks as always!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Sam,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> > - How to know the container number? Why you say it will be
>>>>>>>>>>>>>>> 22 containers due to a 22 GB memory?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The MR2's default configuration requests 1 GB resource each
>>>>>>>>>>>>>>> for Map
>>>>>>>>>>>>>>> and Reduce containers. It requests 1.5 GB for the AM
>>>>>>>>>>>>>>> container that
>>>>>>>>>>>>>>> runs the job, additionally. This is tunable using the
>>>>>>>>>>>>>>> properties
>>>>>>>>>>>>>>> Sandy's mentioned in his post.
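
[For reference, the defaults Harsh describes here correspond to the following
standard MR2/YARN properties; the values shown are the stock 2.x defaults, not
something set anywhere in this thread:

        mapreduce.map.memory.mb = "1024";
        mapreduce.reduce.memory.mb = "1024";
        yarn.app.mapreduce.am.resource.mb = "1536";
]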
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> > - My machine has 32 GB memory, how many memory is proper
>>>>>>>>>>>>>>> to be assigned to containers?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This is a general question. You may use the same process you
>>>>>>>>>>>>>>> took to
>>>>>>>>>>>>>>> decide optimal number of slots in MR1 to decide this here.
>>>>>>>>>>>>>>> Every
>>>>>>>>>>>>>>> container is a new JVM, and you're limited by the CPUs you
>>>>>>>>>>>>>>> have there
>>>>>>>>>>>>>>> (if not the memory). Either increase memory requests from
>>>>>>>>>>>>>>> jobs, to
>>>>>>>>>>>>>>> lower # of concurrent containers at a given time (runtime
>>>>>>>>>>>>>>> change), or
>>>>>>>>>>>>>>> lower NM's published memory resources to control the same
>>>>>>>>>>>>>>> (config
>>>>>>>>>>>>>>> change).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name'
>>>>>>>>>>>>>>> to be 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>>>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yes, all of these properties will still work. Old properties
>>>>>>>>>>>>>>> specific
>>>>>>>>>>>>>>> to JobTracker or TaskTracker (usually found as a keyword in
>>>>>>>>>>>>>>> the config
>>>>>>>>>>>>>>> name) will not apply anymore.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <
>>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>>> > Hi Harsh,
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > According to the above suggestions, I removed the duplicated
>>>>>>>>>>>>>>> > setting, and
>>>>>>>>>>>>>>> > reduced the values of 'yarn.nodemanager.resource.cpu-cores',
>>>>>>>>>>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>>>>>>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000.
>>>>>>>>>>>>>>> > And then, the
>>>>>>>>>>>>>>> > efficiency improved about 18%.  I have questions:
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > - How to know the container number? Why you say it will be
>>>>>>>>>>>>>>> 22 containers due
>>>>>>>>>>>>>>> > to a 22 GB memory?
>>>>>>>>>>>>>>> > - My machine has 32 GB memory, how many memory is proper
>>>>>>>>>>>>>>> to be assigned to
>>>>>>>>>>>>>>> > containers?
>>>>>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name'
>>>>>>>>>>>>>>> to be 'yarn', will
>>>>>>>>>>>>>>> > other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>>>>>> framework? Like
>>>>>>>>>>>>>>> > 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > Thanks!
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> Hey Sam,
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> Did you get a chance to retry with Sandy's suggestions?
>>>>>>>>>>>>>>> The config
>>>>>>>>>>>>>>> >> appears to be asking NMs to use roughly 22 total
>>>>>>>>>>>>>>> containers (as
>>>>>>>>>>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB
>>>>>>>>>>>>>>> memory
>>>>>>>>>>>>>>> >> resource. This could impact much, given the CPU is still
>>>>>>>>>>>>>>> the same for
>>>>>>>>>>>>>>> >> both test runs.
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>>>>>>>>>>> sandy.ryza@cloudera.com>
>>>>>>>>>>>>>>> >> wrote:
>>>>>>>>>>>>>>> >> > Hey Sam,
>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>> >> > Thanks for sharing your results.  I'm definitely
>>>>>>>>>>>>>>> curious about what's
>>>>>>>>>>>>>>> >> > causing the difference.
>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>> >> > A couple observations:
>>>>>>>>>>>>>>> >> > It looks like you've got
>>>>>>>>>>>>>>> yarn.nodemanager.resource.memory-mb in there
>>>>>>>>>>>>>>> >> > twice
>>>>>>>>>>>>>>> >> > with two different values.
>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close
>>>>>>>>>>>>>>> to the default
>>>>>>>>>>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of
>>>>>>>>>>>>>>> your tasks getting
>>>>>>>>>>>>>>> >> > killed for running over resource limits?
>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>> >> > -Sandy
>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <
>>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >> The terasort execution log shows that reduce spent
>>>>>>>>>>>>>>> about 5.5 mins from
>>>>>>>>>>>>>>> >> >> 33%
>>>>>>>>>>>>>>> >> >> to 35% as below.
>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>>>> 31%
>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>>>> 32%
>>>>>>>>>>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>>>> 33%
>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>>>> 35%
>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>>>> 40%
>>>>>>>>>>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>>>> 43%
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >> Any way, below are my configurations for your
>>>>>>>>>>>>>>> reference. Thanks!
>>>>>>>>>>>>>>> >> >> (A) core-site.xml
>>>>>>>>>>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >> (B) hdfs-site.xml
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>dfs.replication</name>
>>>>>>>>>>>>>>> >> >>     <value>1</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>>>>>>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>>>>>>>>>>> >> >>     <value>10</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >> (C) mapred-site.xml
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>>>>>>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>>>>>>>>>>> >> >> </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>>>>>>>>>>> >> >>     <value>yarn</value>
>>>>>>>>>>>>>>> >> >>    </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>>>>>>>>>>> >> >>     <value>8</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>>>>>>>>>>> >> >>     <value>4</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>>>>>>>>>>> >> >>     <value>true</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >> (D) yarn-site.xml
>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>>>>>>>>>>> >> >>     <value>node1:18025</value>
>>>>>>>>>>>>>>> >> >>     <description>host is the hostname of the resource
>>>>>>>>>>>>>>> manager and
>>>>>>>>>>>>>>> >> >>     port is the port on which the NodeManagers contact
>>>>>>>>>>>>>>> the Resource
>>>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <description>The address of the RM web
>>>>>>>>>>>>>>> application.</description>
>>>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>>>>>>>>>>> >> >>     <value>node1:18088</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>>>>>>>>>>>>>>> >> >>     <value>node1:18030</value>
>>>>>>>>>>>>>>> >> >>     <description>host is the hostname of the
>>>>>>>>>>>>>>> resourcemanager and port
>>>>>>>>>>>>>>> >> >> is
>>>>>>>>>>>>>>> >> >> the port
>>>>>>>>>>>>>>> >> >>     on which the Applications in the cluster talk to
>>>>>>>>>>>>>>> the Resource
>>>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>>>>>>>>>>> >> >>     <value>node1:18040</value>
>>>>>>>>>>>>>>> >> >>     <description>the host is the hostname of the
>>>>>>>>>>>>>>> ResourceManager and
>>>>>>>>>>>>>>> >> >> the
>>>>>>>>>>>>>>> >> >> port is the port on
>>>>>>>>>>>>>>> >> >>     which the clients can talk to the Resource
>>>>>>>>>>>>>>> Manager. </description>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>>>>>>>>>>> >> >>     <description>the local directories used by the
>>>>>>>>>>>>>>> >> >> nodemanager</description>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>>>>>>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>>>>>>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>>>>>>>>>>> port</description>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>>>> >> >>     <value>10240</value>
>>>>>>>>>>>>>>> >> >>     <description>the amount of memory on the
>>>>>>>>>>>>>>> NodeManager in
>>>>>>>>>>>>>>> >> >> MB</description>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>>>>>>>>>> >> >>     <description>directory on hdfs where the
>>>>>>>>>>>>>>> application logs are moved
>>>>>>>>>>>>>>> >> >> to
>>>>>>>>>>>>>>> >> >> </description>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>    <property>
>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>>>>>>>>>> >> >>     <description>the directories used by Nodemanagers
>>>>>>>>>>>>>>> as log
>>>>>>>>>>>>>>> >> >> directories</description>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>>>>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>>>>>>>>>> >> >>     <description>shuffle service that needs to be set
>>>>>>>>>>>>>>> for Map Reduce to
>>>>>>>>>>>>>>> >> >> run </description>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>>>>>>>>>> >> >>     <value>24</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>>>>>>>>>> >> >>     <value>3</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>>>> >> >>     <value>22000</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>>>>>>>>>> >> >>     <value>2.1</value>
>>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>> >> >>> Not tuning configurations at all is wrong. YARN uses
>>>>>>>>>>>>>>> memory resource
>>>>>>>>>>>>>>> >> >>> based scheduling and hence MR2 would be requesting 1
>>>>>>>>>>>>>>> GB minimum by
>>>>>>>>>>>>>>> >> >>> default, causing, on base configs, to max out at 8
>>>>>>>>>>>>>>> (due to 8 GB NM
>>>>>>>>>>>>>>> >> >>> memory resource config) total containers. Do share
>>>>>>>>>>>>>>> your configs as at
>>>>>>>>>>>>>>> >> >>> this point none of us can tell what it is.
>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for
>>>>>>>>>>>>>>> users and to not
>>>>>>>>>>>>>>> >> >>> care about such things :)
>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>>>>>>>>>> samliuhadoop@gmail.com>
>>>>>>>>>>>>>>> >> >>> wrote:
>>>>>>>>>>>>>>> >> >>> > At the begining, I just want to do a fast
>>>>>>>>>>>>>>> comparision of MRv1 and
>>>>>>>>>>>>>>> >> >>> > Yarn.
>>>>>>>>>>>>>>> >> >>> > But
>>>>>>>>>>>>>>> >> >>> > they have many differences, and to be fair for
>>>>>>>>>>>>>>> comparison I did not
>>>>>>>>>>>>>>> >> >>> > tune
>>>>>>>>>>>>>>> >> >>> > their configurations at all.  So I got above test
>>>>>>>>>>>>>>> results. After
>>>>>>>>>>>>>>> >> >>> > analyzing
>>>>>>>>>>>>>>> >> >>> > the test result, no doubt, I will configure them
>>>>>>>>>>>>>>> and do comparison
>>>>>>>>>>>>>>> >> >>> > again.
>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>> >> >>> > Do you have any idea on current test result? I
>>>>>>>>>>>>>>> think, to compare
>>>>>>>>>>>>>>> >> >>> > with
>>>>>>>>>>>>>>> >> >>> > MRv1,
>>>>>>>>>>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but
>>>>>>>>>>>>>>> worse on Reduce
>>>>>>>>>>>>>>> >> >>> > phase(terasort test).
>>>>>>>>>>>>>>> >> >>> > And any detailed suggestions/comments/materials on
>>>>>>>>>>>>>>> Yarn performance
>>>>>>>>>>>>>>> >> >>> > tunning?
>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>> >> >>> > Thanks!
>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>>>>>>>>>> marcosluis2186@gmail.com>
>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>> >> >>> >> Why not to tune the configurations?
>>>>>>>>>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>>>>>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>> >> >>> >>> Hi Experts,
>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>> >> >>> >>> We are thinking about whether to use Yarn or not
>>>>>>>>>>>>>>> in the near
>>>>>>>>>>>>>>> >> >>> >>> future,
>>>>>>>>>>>>>>> >> >>> >>> and
>>>>>>>>>>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for
>>>>>>>>>>>>>>> comprison.
>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>> >> >>> >>> My env is three nodes cluster, and each node has
>>>>>>>>>>>>>>> similar hardware:
>>>>>>>>>>>>>>> >> >>> >>> 2
>>>>>>>>>>>>>>> >> >>> >>> cpu(4 core), 32 mem. Both Yarn and MRv1 cluster
>>>>>>>>>>>>>>> are set on the
>>>>>>>>>>>>>>> >> >>> >>> same
>>>>>>>>>>>>>>> >> >>> >>> env. To
>>>>>>>>>>>>>>> >> >>> >>> be fair, I did not make any performance tuning on
>>>>>>>>>>>>>>> their
>>>>>>>>>>>>>>> >> >>> >>> configurations, but
>>>>>>>>>>>>>>> >> >>> >>> use the default configuration values.
>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>> >> >>> >>> Before testing, I think Yarn will be much better
>>>>>>>>>>>>>>> than MRv1, if
>>>>>>>>>>>>>>> >> >>> >>> they
>>>>>>>>>>>>>>> >> >>> >>> all
>>>>>>>>>>>>>>> >> >>> >>> use default configuration, because Yarn is a
>>>>>>>>>>>>>>> better framework than
>>>>>>>>>>>>>>> >> >>> >>> MRv1.
>>>>>>>>>>>>>>> >> >>> >>> However, the test result shows some differences:
>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>>>>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>>>>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>>>>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>>>>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>>>>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>>>>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>>>>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>> >> >>> >>> After a fast analysis, I think the direct cause
>>>>>>>>>>>>>>> might be that Yarn
>>>>>>>>>>>>>>> >> >>> >>> is
>>>>>>>>>>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much
>>>>>>>>>>>>>>> worse on Reduce
>>>>>>>>>>>>>>> >> >>> >>> phase.
>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>>>>>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1 for
>>>>>>>>>>>>>>> terasort?
>>>>>>>>>>>>>>> >> >>> >>> - What's the stratage for tuning Yarn
>>>>>>>>>>>>>>> performance? Is any
>>>>>>>>>>>>>>> >> >>> >>> materials?
>>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>>> >> >>> >>> Thanks!
>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>> >> >>> >> --
>>>>>>>>>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>>>>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>>>>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>>> >> >>> --
>>>>>>>>>>>>>>> >> >>> Harsh J
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> --
>>>>>>>>>>>>>>> >> Harsh J
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Harsh J
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Jian Fang <ji...@gmail.com>.
Reducing the number of map containers to 8 slightly improved the total time
to 84 minutes. Here is the output. Also, from the log, there is no clear
reason why the containers are killed other than messages such as "Container
killed by the
ApplicationMaster../container_1382237301855_0001_01_000001/syslog:Container
killed on request. Exit code is 143"
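
(A note on that exit code: 143 is just the standard JVM exit status for a SIGTERM,

        143 = 128 + 15        (signal 15 = SIGTERM)

so by itself it only says the ApplicationMaster asked the container to stop. With
mapreduce.map.speculative and mapreduce.reduce.speculative both set to true, the
killed map and reduce tasks in the counters below are most likely the slower
duplicates of speculated attempts being cleaned up rather than real failures.)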


2013-10-23 21:09:34,325 INFO org.apache.hadoop.mapreduce.Job (main):
Counters: 46
        File System Counters
                FILE: Number of bytes read=455484809066
                FILE: Number of bytes written=896642344343
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=1000000841624
                HDFS: Number of bytes written=1000000000000
                HDFS: Number of read operations=25531
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=150
        Job Counters
                Killed map tasks=1
                Killed reduce tasks=11
                Launched map tasks=7449
                Launched reduce tasks=86
                Data-local map tasks=7434
                Rack-local map tasks=15
                Total time spent by all maps in occupied slots
(ms)=1030941232
                Total time spent by all reduces in occupied slots
(ms)=1574732272
        Map-Reduce Framework
                Map input records=10000000000
                Map output records=10000000000
                Map output bytes=1020000000000
                Map output materialized bytes=440519309909
                Input split bytes=841624
                Combine input records=0
                Combine output records=0
                Reduce input groups=10000000000
                Reduce shuffle bytes=440519309909
                Reduce input records=10000000000
                Reduce output records=10000000000
                Spilled Records=20000000000
                Shuffled Maps =558600
                Failed Shuffles=0
                Merged Map outputs=558600
                GC time elapsed (ms)=2855534
                CPU time spent (ms)=170381420
                Physical memory (bytes) snapshot=3704938954752
                Virtual memory (bytes) snapshot=11455208308736
                Total committed heap usage (bytes)=4523248582656
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=1000000000000
        File Output Format Counters
                Bytes Written=1000000000000
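
For reference, the arithmetic behind the container counts discussed in the quoted
messages below, written in the same key = "value" style as the earlier configuration
listing. The 1536 MB figure is only an illustration of one way to cap a 12288 MB
NodeManager at 8 concurrent map containers; it is not necessarily how this
particular run was limited:

        yarn.nodemanager.resource.memory-mb = "12288";
        mapreduce.map.memory.mb = "768";      (12288 / 768  = 16 map containers per node)
        mapreduce.map.memory.mb = "1536";     (12288 / 1536 =  8 map containers per node)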



On Wed, Oct 23, 2013 at 1:46 PM, Jian Fang <ji...@gmail.com>wrote:

> Already started the tests with only 8 containers for map in MR2, still
> running.
>
> But shouldn't YARN make better use of the memory? If I map YARN containers
> in MR2 to exactly 8 map and 3 reduce slots as in MR1, what is the real advantage
> of YARN then? I remember one goal of YARN is to solve the issue that map
> slots cannot be used for reduce and reduce slots cannot be used for map.
> Shouldn't YARN be smart enough to handle the concurrent tasks?
>
>
>
> On Wed, Oct 23, 2013 at 1:17 PM, Sandy Ryza <sa...@cloudera.com>wrote:
>
>> Increasing the slowstart is not meant to increase performance, but should
>> make for a fairer comparison.  Have you tried making sure that in MR2 only
>> 8 map tasks are running concurrently, or boosting MR1 up to 16?
>>
>> -Sandy
>>
>>
>> On Wed, Oct 23, 2013 at 12:55 PM, Jian Fang <
>> jian.fang.subscribe@gmail.com> wrote:
>>
>>> Changing mapreduce.job.reduce.
>>> slowstart.completedmaps to 0.99 does not look good. The map phase alone
>>> took 48 minutes and total time seems to be even longer. Any way to let map
>>> phase run faster?
>>>
>>> Thanks.
>>>
>>>
>>> On Wed, Oct 23, 2013 at 10:05 AM, Jian Fang <
>>> jian.fang.subscribe@gmail.com> wrote:
>>>
>>>> Thanks Sandy.
>>>>
>>>> io.sort.record.percent is the default value 0.05 for both MR1 and
>>>> MR2.  mapreduce.job.reduce.slowstart.completedmaps in MR2 and mapred.reduce.slowstart.completed.maps
>>>> in MR1 both use the default value 0.05.
>>>>
>>>> I tried to allocate 1536MB and 1024MB to map container some time ago,
>>>> but the changes did not give me a better result, thus, I changed it back to
>>>> 768MB.
>>>>
>>>> Will try mapred.reduce.slowstart.completed.maps=.99 to see what
>>>> happens. BTW, I should use
>>>> mapreduce.job.reduce.slowstart.completedmaps in MR2, right?
>>>>
>>>> Also, in MR1 I can specify tasktracker.http.threads, but I could not
>>>> find the counterpart for MR2. Which one should I tune for the http threads?
>>>>
>>>> Thanks again.
>>>>
>>>>
>>>> On Wed, Oct 23, 2013 at 9:40 AM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>>
>>>>> Based on SLOTS_MILLIS_MAPS, it looks like your map tasks are taking
>>>>> about three times as long in MR2 as they are in MR1.  This is probably
>>>>> because you allow twice as many map tasks to run at a time in MR2 (12288/768
>>>>> = 16).  Being able to use all the containers isn't necessarily a good thing
>>>>> if you are oversubscribing your node's resources.  Because of the different
>>>>> way that MR1 and MR2 view resources, I think it's better to test with
>>>>> mapred.reduce.slowstart.completed.maps=.99 so that the map and reduce
>>>>> phases will run separately.
>>>>>
>>>>> On the other side, it looks like your MR1 has more spilled records
>>>>> than MR2.  For a fairer comparison, you should set io.sort.record.percent
>>>>> in MR1 to .13, which should improve MR1 performance, but will provide a
>>>>> fairer comparison (MR2 automatically does this tuning for you).
>>>>>
>>>>> -Sandy
>>>>>
>>>>>
>>>>> On Wed, Oct 23, 2013 at 9:22 AM, Jian Fang <
>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>
>>>>>> The number of map slots and reduce slots on each data node for MR1
>>>>>> are 8 and 3, respectively. Since MR2 could use all containers for either
>>>>>> map or reduce, I would expect that MR2 is faster.
>>>>>>
>>>>>>
>>>>>> On Wed, Oct 23, 2013 at 8:17 AM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>>>>
>>>>>>> How many map and reduce slots are you using per tasktracker in MR1?
>>>>>>>  How do the average map times compare? (MR2 reports this directly on the
>>>>>>> web UI, but you can also get a sense in MR1 by scrolling through the map
>>>>>>> tasks page).  Can you share the counters for MR1?
>>>>>>>
>>>>>>> -Sandy
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Oct 23, 2013 at 12:23 AM, Jian Fang <
>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>
>>>>>>>> Unfortunately, turning off JVM reuse still got the same result,
>>>>>>>> i.e., about 90 minutes for MR2. I don't think the killed reduces could
>>>>>>>> contribute to 2 times slowness. There should be something very wrong either
>>>>>>>> in configuration or code. Any hints?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <
>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Thanks Sandy. I will try to turn JVM reuse off and see what
>>>>>>>>> happens.
>>>>>>>>>
>>>>>>>>> Yes, I saw quite some exceptions in the task attempts. For
>>>>>>>>> instance.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2013-10-20 03:13:58,751 ERROR [main]
>>>>>>>>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>>>>>>>>> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
>>>>>>>>> 2013-10-20 03:13:58,752 ERROR [Thread-6]
>>>>>>>>> org.apache.hadoop.hdfs.DFSClient: Failed to close file
>>>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
>>>>>>>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>>>>>>>>> No lease on
>>>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
>>>>>>>>> File does not exist. Holder
>>>>>>>>> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
>>>>>>>>> any open files.
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>>>>>>>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>>>>>>> --
>>>>>>>>>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>>>>>>>>> 2013-10-20 03:13:58,753 WARN [main]
>>>>>>>>> org.apache.hadoop.mapred.YarnChild: Exception running child :
>>>>>>>>> java.nio.channels.ClosedChannelException
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>>>>>>>>>         at
>>>>>>>>> java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <
>>>>>>>>> sandy.ryza@cloudera.com> wrote:
>>>>>>>>>
>>>>>>>>>> It looks like many of your reduce tasks were killed.  Do you know
>>>>>>>>>> why?  Also, MR2 doesn't have JVM reuse, so it might make sense to compare
>>>>>>>>>> it to MR1 with JVM reuse turned off.
>>>>>>>>>>
>>>>>>>>>> -Sandy
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <
>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> The Terasort output for MR2 is as follows.
>>>>>>>>>>>
>>>>>>>>>>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job
>>>>>>>>>>> (main): Counters: 46
>>>>>>>>>>>         File System Counters
>>>>>>>>>>>                 FILE: Number of bytes read=456102049355
>>>>>>>>>>>                 FILE: Number of bytes written=897246250517
>>>>>>>>>>>                 FILE: Number of read operations=0
>>>>>>>>>>>                 FILE: Number of large read operations=0
>>>>>>>>>>>                 FILE: Number of write operations=0
>>>>>>>>>>>                 HDFS: Number of bytes read=1000000851200
>>>>>>>>>>>                 HDFS: Number of bytes written=1000000000000
>>>>>>>>>>>                 HDFS: Number of read operations=32131
>>>>>>>>>>>                 HDFS: Number of large read operations=0
>>>>>>>>>>>                 HDFS: Number of write operations=224
>>>>>>>>>>>         Job Counters
>>>>>>>>>>>                 Killed map tasks=1
>>>>>>>>>>>                 Killed reduce tasks=20
>>>>>>>>>>>                 Launched map tasks=7601
>>>>>>>>>>>                 Launched reduce tasks=132
>>>>>>>>>>>                 Data-local map tasks=7591
>>>>>>>>>>>                 Rack-local map tasks=10
>>>>>>>>>>>                 Total time spent by all maps in occupied slots
>>>>>>>>>>> (ms)=1696141311
>>>>>>>>>>>                 Total time spent by all reduces in occupied
>>>>>>>>>>> slots (ms)=2664045096
>>>>>>>>>>>         Map-Reduce Framework
>>>>>>>>>>>                 Map input records=10000000000
>>>>>>>>>>>                 Map output records=10000000000
>>>>>>>>>>>                 Map output bytes=1020000000000
>>>>>>>>>>>                 Map output materialized bytes=440486356802
>>>>>>>>>>>                 Input split bytes=851200
>>>>>>>>>>>                 Combine input records=0
>>>>>>>>>>>                 Combine output records=0
>>>>>>>>>>>                 Reduce input groups=10000000000
>>>>>>>>>>>                 Reduce shuffle bytes=440486356802
>>>>>>>>>>>                 Reduce input records=10000000000
>>>>>>>>>>>                 Reduce output records=10000000000
>>>>>>>>>>>                 Spilled Records=20000000000
>>>>>>>>>>>                 Shuffled Maps =851200
>>>>>>>>>>>                 Failed Shuffles=61
>>>>>>>>>>>                 Merged Map outputs=851200
>>>>>>>>>>>                 GC time elapsed (ms)=4215666
>>>>>>>>>>>                 CPU time spent (ms)=192433000
>>>>>>>>>>>                 Physical memory (bytes) snapshot=3349356380160
>>>>>>>>>>>                 Virtual memory (bytes) snapshot=9665208745984
>>>>>>>>>>>                 Total committed heap usage (bytes)=3636854259712
>>>>>>>>>>>         Shuffle Errors
>>>>>>>>>>>                 BAD_ID=0
>>>>>>>>>>>                 CONNECTION=0
>>>>>>>>>>>                 IO_ERROR=4
>>>>>>>>>>>                 WRONG_LENGTH=0
>>>>>>>>>>>                 WRONG_MAP=0
>>>>>>>>>>>                 WRONG_REDUCE=0
>>>>>>>>>>>         File Input Format Counters
>>>>>>>>>>>                 Bytes Read=1000000000000
>>>>>>>>>>>         File Output Format Counters
>>>>>>>>>>>                 Bytes Written=1000000000000
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> John
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop
>>>>>>>>>>>> 1.0.3 and it turned out that the terasort for MR2 is 2 times slower than
>>>>>>>>>>>> that in MR1. I cannot really believe it.
>>>>>>>>>>>>
>>>>>>>>>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0
>>>>>>>>>>>> cluster configurations are as follows.
>>>>>>>>>>>>
>>>>>>>>>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>>>>>>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>>>>>>>>>         mapreduce.map.memory.mb = "768";
>>>>>>>>>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>>>>>>>>>
>>>>>>>>>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>>>>>>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>>>>>>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>>>>>>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>>>>>>>>>
>>>>>>>>>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>>>>>>>>>         mapreduce.task.io.sort.factor = "48";
>>>>>>>>>>>>         mapreduce.task.io.sort.mb = "200";
>>>>>>>>>>>>         mapreduce.map.speculative = "true";
>>>>>>>>>>>>         mapreduce.reduce.speculative = "true";
>>>>>>>>>>>>         mapreduce.framework.name = "yarn";
>>>>>>>>>>>>         yarn.app.mapreduce.am.job.task.listener.thread-count =
>>>>>>>>>>>> "60";
>>>>>>>>>>>>         mapreduce.map.cpu.vcores = "1";
>>>>>>>>>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>>>>>>>>>
>>>>>>>>>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>>>>>>>>>         mapreduce.map.output.compress = "true";
>>>>>>>>>>>>         mapreduce.map.output.compress.codec =
>>>>>>>>>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>>>>>>>>>
>>>>>>>>>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>>>>>>>>>         yarn.resourcemanager.scheduler.client.thread-count =
>>>>>>>>>>>> "64";
>>>>>>>>>>>>
>>>>>>>>>>>> yarn.resourcemanager.resource-tracker.client.thread-count = "64";
>>>>>>>>>>>>         yarn.resourcemanager.scheduler.class =
>>>>>>>>>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>>>>>>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>>>>>>>>>         yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>>>>>>>>>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>>>>>>>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>>>>>>>>>         yarn.nodemanager.container-executor.class =
>>>>>>>>>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>>>>>>>>>         yarn.nodemanager.container-manager.thread-count = "64";
>>>>>>>>>>>>         yarn.nodemanager.localizer.client.thread-count = "20";
>>>>>>>>>>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>>>>>>>>>>>
>>>>>>>>>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost the
>>>>>>>>>>>> same other configurations. In MR1, the 1TB terasort took about 45 minutes,
>>>>>>>>>>>> but it took around 90 minutes in MR2.
>>>>>>>>>>>>
>>>>>>>>>>>> Does anyone know what is wrong here? Or do I need some special
>>>>>>>>>>>> configurations for terasort to work better in MR2?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>
>>>>>>>>>>>> John
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>>>>>>>>>> michael_segel@hotmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Sam,
>>>>>>>>>>>>> I think your cluster is too small for any meaningful
>>>>>>>>>>>>> conclusions to be made.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>>>>>>>
>>>>>>>>>>>>> Mike Segel
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Harsh,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for your detailed response! Now, the efficiency of my
>>>>>>>>>>>>> Yarn cluster improved a lot after increasing the reducer
>>>>>>>>>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>>>>>>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and reduce
>>>>>>>>>>>>> task together, with a typical process(map > shuffle > reduce). In Yarn, as
>>>>>>>>>>>>> I know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>>>>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but,
>>>>>>>>>>>>> MRv1 job has special execution process(map > shuffle > reduce) in Hadoop
>>>>>>>>>>>>> 1.x, and how Yarn execute a MRv1 job? still include some special MR steps
>>>>>>>>>>>>> in Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>>>>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>>>>>>>> - What's the general process for ApplicationMaster of Yarn to
>>>>>>>>>>>>> execute a job?
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>>>>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>>>>>>>> - For Yarn, the above two parameters do not work any more, as yarn
>>>>>>>>>>>>> uses containers instead, right?
>>>>>>>>>>>>> - For Yarn, we can set the whole physical mem for a
>>>>>>>>>>>>> NodeManager using 'yarn.nodemanager.resource.memory-mb'. But
>>>>>>>>>>>>> how to set the default size of physical mem of a container?
>>>>>>>>>>>>> - How to set the maximum size of physical mem of a container?
>>>>>>>>>>>>> By the parameter of 'mapred.child.java.opts'?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks as always!
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Sam,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> > - How to know the container number? Why you say it will be
>>>>>>>>>>>>>> 22 containers due to a 22 GB memory?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The MR2's default configuration requests 1 GB resource each
>>>>>>>>>>>>>> for Map
>>>>>>>>>>>>>> and Reduce containers. It requests 1.5 GB for the AM
>>>>>>>>>>>>>> container that
>>>>>>>>>>>>>> runs the job, additionally. This is tunable using the
>>>>>>>>>>>>>> properties
>>>>>>>>>>>>>> Sandy's mentioned in his post.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> > - My machine has 32 GB memory; how much memory is proper to
>>>>>>>>>>>>>> be assigned to containers?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This is a general question. You may use the same process you
>>>>>>>>>>>>>> took to
>>>>>>>>>>>>>> decide optimal number of slots in MR1 to decide this here.
>>>>>>>>>>>>>> Every
>>>>>>>>>>>>>> container is a new JVM, and you're limited by the CPUs you
>>>>>>>>>>>>>> have there
>>>>>>>>>>>>>> (if not the memory). Either increase memory requests from
>>>>>>>>>>>>>> jobs, to
>>>>>>>>>>>>>> lower # of concurrent containers at a given time (runtime
>>>>>>>>>>>>>> change), or
>>>>>>>>>>>>>> lower NM's published memory resources to control the same
>>>>>>>>>>>>>> (config
>>>>>>>>>>>>>> change).
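
A short illustration of the two knobs contrasted above, using only property
names that appear elsewhere in this thread; the values are placeholders, not
recommendations, and this is a minimal sketch rather than a tuned configuration:

  <!-- option 1 (job side, mapred-site.xml): larger per-task requests mean
       fewer concurrent containers per node -->
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>2048</value>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>2048</value>
  </property>

  <!-- option 2 (node side, yarn-site.xml): publish less memory from each
       NodeManager so fewer containers are scheduled on it -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>12000</value>
  </property>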
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name'
>>>>>>>>>>>>>> to be 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes, all of these properties will still work. Old properties
>>>>>>>>>>>>>> specific
>>>>>>>>>>>>>> to JobTracker or TaskTracker (usually found as a keyword in
>>>>>>>>>>>>>> the config
>>>>>>>>>>>>>> name) will not apply anymore.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <
>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>> > Hi Harsh,
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > According to above suggestions, I removed the duplication
>>>>>>>>>>>>>> of setting, and
>>>>>>>>>>>>>> > reduce the value of 'yarn.nodemanager.resource.cpu-cores',
>>>>>>>>>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>>>>>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000.
>>>>>>>>>>>>>> And then, the
>>>>>>>>>>>>>> > efficiency improved about 18%.  I have questions:
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > - How to know the container number? Why you say it will be
>>>>>>>>>>>>>> 22 containers due
>>>>>>>>>>>>>> > to a 22 GB memory?
>>>>>>>>>>>>>> > - My machine has 32 GB memory; how much memory is proper to
>>>>>>>>>>>>>> be assigned to
>>>>>>>>>>>>>> > containers?
>>>>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name'
>>>>>>>>>>>>>> to be 'yarn', will
>>>>>>>>>>>>>> > other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>>>>> framework? Like
>>>>>>>>>>>>>> > 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Thanks!
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> Hey Sam,
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> Did you get a chance to retry with Sandy's suggestions?
>>>>>>>>>>>>>> The config
>>>>>>>>>>>>>> >> appears to be asking NMs to use roughly 22 total
>>>>>>>>>>>>>> containers (as
>>>>>>>>>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB
>>>>>>>>>>>>>> memory
>>>>>>>>>>>>>> >> resource. This could impact much, given the CPU is still
>>>>>>>>>>>>>> the same for
>>>>>>>>>>>>>> >> both test runs.
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>>>>>>>>>> sandy.ryza@cloudera.com>
>>>>>>>>>>>>>> >> wrote:
>>>>>>>>>>>>>> >> > Hey Sam,
>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>> >> > Thanks for sharing your results.  I'm definitely curious
>>>>>>>>>>>>>> about what's
>>>>>>>>>>>>>> >> > causing the difference.
>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>> >> > A couple observations:
>>>>>>>>>>>>>> >> > It looks like you've got
>>>>>>>>>>>>>> yarn.nodemanager.resource.memory-mb in there
>>>>>>>>>>>>>> >> > twice
>>>>>>>>>>>>>> >> > with two different values.
>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close
>>>>>>>>>>>>>> to the default
>>>>>>>>>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of
>>>>>>>>>>>>>> your tasks getting
>>>>>>>>>>>>>> >> > killed for running over resource limits?
>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>> >> > -Sandy
>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <
>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >> The terasort execution log shows that reduce spent
>>>>>>>>>>>>>> about 5.5 mins from
>>>>>>>>>>>>>> >> >> 33%
>>>>>>>>>>>>>> >> >> to 35% as below.
>>>>>>>>>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>>> 31%
>>>>>>>>>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>>> 32%
>>>>>>>>>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>>> 33%
>>>>>>>>>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>>> 35%
>>>>>>>>>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>>> 40%
>>>>>>>>>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>>> 43%
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >> Any way, below are my configurations for your
>>>>>>>>>>>>>> reference. Thanks!
>>>>>>>>>>>>>> >> >> (A) core-site.xml
>>>>>>>>>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >> (B) hdfs-site.xml
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>dfs.replication</name>
>>>>>>>>>>>>>> >> >>     <value>1</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>>>>>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>>>>>>>>>> >> >>     <value>10</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >> (C) mapred-site.xml
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>>>>>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>>>>>>>>>> >> >> </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>>>>>>>>>> >> >>     <value>yarn</value>
>>>>>>>>>>>>>> >> >>    </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>> >> >>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>>>>>>>>>> >> >>     <value>8</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>>>>>>>>>> >> >>     <value>4</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>>>>>>>>>> >> >>     <value>true</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >> (D) yarn-site.xml
>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>>>>>>>>>> >> >>     <value>node1:18025</value>
>>>>>>>>>>>>>> >> >>     <description>host is the hostname of the resource
>>>>>>>>>>>>>> manager and
>>>>>>>>>>>>>> >> >>     port is the port on which the NodeManagers contact
>>>>>>>>>>>>>> the Resource
>>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <description>The address of the RM web
>>>>>>>>>>>>>> application.</description>
>>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>>>>>>>>>> >> >>     <value>node1:18088</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>>>>>>>>>>>>>> >> >>     <value>node1:18030</value>
>>>>>>>>>>>>>> >> >>     <description>host is the hostname of the
>>>>>>>>>>>>>> resourcemanager and port
>>>>>>>>>>>>>> >> >> is
>>>>>>>>>>>>>> >> >> the port
>>>>>>>>>>>>>> >> >>     on which the Applications in the cluster talk to
>>>>>>>>>>>>>> the Resource
>>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>>>>>>>>>> >> >>     <value>node1:18040</value>
>>>>>>>>>>>>>> >> >>     <description>the host is the hostname of the
>>>>>>>>>>>>>> ResourceManager and
>>>>>>>>>>>>>> >> >> the
>>>>>>>>>>>>>> >> >> port is the port on
>>>>>>>>>>>>>> >> >>     which the clients can talk to the Resource Manager.
>>>>>>>>>>>>>> </description>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>>>>>>>>>> >> >>     <description>the local directories used by the
>>>>>>>>>>>>>> >> >> nodemanager</description>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>>>>>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>>>>>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>>>>>>>>>> port</description>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>>> >> >>     <value>10240</value>
>>>>>>>>>>>>>> >> >>     <description>the amount of memory on the
>>>>>>>>>>>>>> NodeManager in
>>>>>>>>>>>>>> >> >> GB</description>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>>>>>>>>> >> >>     <description>directory on hdfs where the
>>>>>>>>>>>>>> application logs are moved
>>>>>>>>>>>>>> >> >> to
>>>>>>>>>>>>>> >> >> </description>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>    <property>
>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>>>>>>>>> >> >>     <description>the directories used by Nodemanagers
>>>>>>>>>>>>>> as log
>>>>>>>>>>>>>> >> >> directories</description>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>>>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>>>>>>>>> >> >>     <description>shuffle service that needs to be set
>>>>>>>>>>>>>> for Map Reduce to
>>>>>>>>>>>>>> >> >> run </description>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>>>>>>>>> >> >>     <value>24</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>>>>>>>>> >> >>     <value>3</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>>> >> >>     <value>22000</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>>>>>>>>> >> >>     <value>2.1</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>> >> >>> Not tuning configurations at all is wrong. YARN uses
>>>>>>>>>>>>>> memory resource
>>>>>>>>>>>>>> >> >>> based scheduling and hence MR2 would be requesting 1
>>>>>>>>>>>>>> GB minimum by
>>>>>>>>>>>>>> >> >>> default, causing, on base configs, to max out at 8
>>>>>>>>>>>>>> (due to 8 GB NM
>>>>>>>>>>>>>> >> >>> memory resource config) total containers. Do share
>>>>>>>>>>>>>> your configs as at
>>>>>>>>>>>>>> >> >>> this point none of us can tell what it is.
>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for
>>>>>>>>>>>>>> users and to not
>>>>>>>>>>>>>> >> >>> care about such things :)
>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>>>>>>>>> samliuhadoop@gmail.com>
>>>>>>>>>>>>>> >> >>> wrote:
>>>>>>>>>>>>>> >> >>> > At the begining, I just want to do a fast
>>>>>>>>>>>>>> comparision of MRv1 and
>>>>>>>>>>>>>> >> >>> > Yarn.
>>>>>>>>>>>>>> >> >>> > But
>>>>>>>>>>>>>> >> >>> > they have many differences, and to be fair for
>>>>>>>>>>>>>> comparison I did not
>>>>>>>>>>>>>> >> >>> > tune
>>>>>>>>>>>>>> >> >>> > their configurations at all.  So I got above test
>>>>>>>>>>>>>> results. After
>>>>>>>>>>>>>> >> >>> > analyzing
>>>>>>>>>>>>>> >> >>> > the test result, no doubt, I will configure them and
>>>>>>>>>>>>>> do comparison
>>>>>>>>>>>>>> >> >>> > again.
>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>> >> >>> > Do you have any idea on current test result? I
>>>>>>>>>>>>>> think, to compare
>>>>>>>>>>>>>> >> >>> > with
>>>>>>>>>>>>>> >> >>> > MRv1,
>>>>>>>>>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but worse
>>>>>>>>>>>>>> on Reduce
>>>>>>>>>>>>>> >> >>> > phase(terasort test).
>>>>>>>>>>>>>> >> >>> > And any detailed suggestions/comments/materials on
>>>>>>>>>>>>>> Yarn performance
>>>>>>>>>>>>>> >> >>> > tuning?
>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>> >> >>> > Thanks!
>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>>>>>>>>> marcosluis2186@gmail.com>
>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>> >> >>> >> Why not to tune the configurations?
>>>>>>>>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>>>>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>> >> >>> >>> Hi Experts,
>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>> >> >>> >>> We are thinking about whether to use Yarn or not
>>>>>>>>>>>>>> in the near
>>>>>>>>>>>>>> >> >>> >>> future,
>>>>>>>>>>>>>> >> >>> >>> and
>>>>>>>>>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for
>>>>>>>>>>>>>> comparison.
>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>> >> >>> >>> My env is a three-node cluster, and each node has
>>>>>>>>>>>>>> similar hardware:
>>>>>>>>>>>>>> >> >>> >>> 2
>>>>>>>>>>>>>> >> >>> >>> cpu(4 core), 32 GB mem. Both Yarn and MRv1 clusters
>>>>>>>>>>>>>> are set on the
>>>>>>>>>>>>>> >> >>> >>> same
>>>>>>>>>>>>>> >> >>> >>> env. To
>>>>>>>>>>>>>> >> >>> >>> be fair, I did not make any performance tuning on
>>>>>>>>>>>>>> their
>>>>>>>>>>>>>> >> >>> >>> configurations, but
>>>>>>>>>>>>>> >> >>> >>> use the default configuration values.
>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>> >> >>> >>> Before testing, I think Yarn will be much better
>>>>>>>>>>>>>> than MRv1, if
>>>>>>>>>>>>>> >> >>> >>> they
>>>>>>>>>>>>>> >> >>> >>> all
>>>>>>>>>>>>>> >> >>> >>> use default configuration, because Yarn is a
>>>>>>>>>>>>>> better framework than
>>>>>>>>>>>>>> >> >>> >>> MRv1.
>>>>>>>>>>>>>> >> >>> >>> However, the test result shows some differences:
>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>>>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>>>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>>>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>>>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>>>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>>>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>>>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>> >> >>> >>> After a fast analysis, I think the direct cause
>>>>>>>>>>>>>> might be that Yarn
>>>>>>>>>>>>>> >> >>> >>> is
>>>>>>>>>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much worse
>>>>>>>>>>>>>> on Reduce
>>>>>>>>>>>>>> >> >>> >>> phase.
>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>>>>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1 for
>>>>>>>>>>>>>> terasort?
>>>>>>>>>>>>>> >> >>> >>> - What's the strategy for tuning Yarn performance?
>>>>>>>>>>>>>> Are there any
>>>>>>>>>>>>>> >> >>> >>> materials?
>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>> >> >>> >>> Thanks!
>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>> >> >>> >> --
>>>>>>>>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>>>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>>>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>> >> >>> --
>>>>>>>>>>>>>> >> >>> Harsh J
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> --
>>>>>>>>>>>>>> >> Harsh J
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Harsh J
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Jian Fang <ji...@gmail.com>.
Reducing the number of map containers to 8 slightly improved the total time
to 84 minutes. Here is the output. Also, from the log there is no clear reason
why the containers are killed, other than messages such as "Container
killed by the
ApplicationMaster../container_1382237301855_0001_01_000001/syslog:Container
killed on request. Exit code is 143"
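
For reference, a minimal mapred-site.xml sketch of one way to hold concurrent
maps to roughly 8 per node under the NodeManager budget quoted further down
(12288 MB): request enough memory per map container that only 12288/1536 = 8
fit at a time. This is only an illustration; the run above may have been
limited differently, and the 1536 MB / -Xmx1152m values are assumptions rather
than the settings actually used.

  <!-- each map asks for 1536 MB, so a 12288 MB NodeManager runs at most 8 maps -->
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1536</value>
  </property>
  <!-- keep the task JVM heap comfortably below the container size -->
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx1152m</value>
  </property>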


2013-10-23 21:09:34,325 INFO org.apache.hadoop.mapreduce.Job (main):
Counters: 46
        File System Counters
                FILE: Number of bytes read=455484809066
                FILE: Number of bytes written=896642344343
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=1000000841624
                HDFS: Number of bytes written=1000000000000
                HDFS: Number of read operations=25531
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=150
        Job Counters
                Killed map tasks=1
                Killed reduce tasks=11
                Launched map tasks=7449
                Launched reduce tasks=86
                Data-local map tasks=7434
                Rack-local map tasks=15
                Total time spent by all maps in occupied slots
(ms)=1030941232
                Total time spent by all reduces in occupied slots
(ms)=1574732272
        Map-Reduce Framework
                Map input records=10000000000
                Map output records=10000000000
                Map output bytes=1020000000000
                Map output materialized bytes=440519309909
                Input split bytes=841624
                Combine input records=0
                Combine output records=0
                Reduce input groups=10000000000
                Reduce shuffle bytes=440519309909
                Reduce input records=10000000000
                Reduce output records=10000000000
                Spilled Records=20000000000
                Shuffled Maps =558600
                Failed Shuffles=0
                Merged Map outputs=558600
                GC time elapsed (ms)=2855534
                CPU time spent (ms)=170381420
                Physical memory (bytes) snapshot=3704938954752
                Virtual memory (bytes) snapshot=11455208308736
                Total committed heap usage (bytes)=4523248582656
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=1000000000000
        File Output Format Counters
                Bytes Written=1000000000000



On Wed, Oct 23, 2013 at 1:46 PM, Jian Fang <ji...@gmail.com> wrote:

> Already started the tests with only 8 map containers in MR2; they are still
> running.
>
> But shouldn't YARN make better use of the memory? If I map YARN containers
> in MR2 to exactly 8 map and 3 reduce slots as in MR1, what is the real
> advantage of YARN then? I remember one goal of YARN is to solve the issue
> that map slots cannot be used for reduce and reduce slots cannot be used
> for map. Shouldn't YARN be smart enough to handle the concurrent tasks?
>
>
>
> On Wed, Oct 23, 2013 at 1:17 PM, Sandy Ryza <sa...@cloudera.com> wrote:
>
>> Increasing the slowstart is not meant to increase performance, but should
>> make for a fairer comparison.  Have you tried making sure that in MR2 only
>> 8 map tasks are running concurrently, or boosting MR1 up to 16?
>>
>> -Sandy
>>
>>
>> On Wed, Oct 23, 2013 at 12:55 PM, Jian Fang <
>> jian.fang.subscribe@gmail.com> wrote:
>>
>>> Changing mapreduce.job.reduce.slowstart.completedmaps to 0.99 does not
>>> look good. The map phase alone took 48 minutes and the total time seems
>>> to be even longer. Any way to make the map phase run faster?
>>>
>>> Thanks.
>>>
>>>
>>> On Wed, Oct 23, 2013 at 10:05 AM, Jian Fang <
>>> jian.fang.subscribe@gmail.com> wrote:
>>>
>>>> Thanks Sandy.
>>>>
>>>> io.sort.record.percent is at the default value of 0.05 for both MR1 and
>>>> MR2.  mapreduce.job.reduce.slowstart.completedmaps in MR2 and
>>>> mapred.reduce.slowstart.completed.maps in MR1 both use the default value of 0.05.
>>>>
>>>> I tried allocating 1536MB and 1024MB to the map container some time ago,
>>>> but the changes did not give me a better result, so I changed it back to
>>>> 768MB.
>>>>
>>>> I will try mapred.reduce.slowstart.completed.maps=.99 to see what
>>>> happens. BTW, I should use
>>>> mapreduce.job.reduce.slowstart.completedmaps in MR2, right?
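
For reference, a sketch of the equivalent setting in each version, using only
the property names already discussed in this thread (0.99 is the experimental
value from this test, not a recommended default):

  <!-- MR2 / Hadoop 2.x mapred-site.xml: hold reducers until 99% of maps finish -->
  <property>
    <name>mapreduce.job.reduce.slowstart.completedmaps</name>
    <value>0.99</value>
  </property>

  <!-- MR1 / Hadoop 1.x mapred-site.xml: the older name for the same knob -->
  <property>
    <name>mapred.reduce.slowstart.completed.maps</name>
    <value>0.99</value>
  </property>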
>>>>
>>>> Also, in MR1 I can specify tasktracker.http.threads, but I could not
>>>> find the counterpart in MR2. Which one should I tune for the HTTP threads?
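
A hedged pointer on the question above: in MR2 the map output is served by the
ShuffleHandler aux service shown in the yarn-site.xml settings quoted further
down, and, assuming the standard Hadoop 2.x property name, its worker thread
pool is sized in the NodeManager's configuration roughly as below. Both the
name and the default behaviour should be verified against the release in use.

  <!-- assumed Hadoop 2.x property; 0 is taken to mean 2 * available processors -->
  <property>
    <name>mapreduce.shuffle.max.threads</name>
    <value>0</value>
  </property>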
>>>>
>>>> Thanks again.
>>>>
>>>>
>>>> On Wed, Oct 23, 2013 at 9:40 AM, Sandy Ryza <sa...@cloudera.com> wrote:
>>>>
>>>>> Based on SLOTS_MILLIS_MAPS, it looks like your map tasks are taking
>>>>> about three times as long in MR2 as they are in MR1.  This is probably
>>>>> because you allow twice as many map tasks to run at a time in MR2 (12288/768
>>>>> = 16).  Being able to use all the containers isn't necessarily a good thing
>>>>> if you are oversubscribing your node's resources.  Because of the different
>>>>> way that MR1 and MR2 view resources, I think it's better to test with
>>>>> mapred.reduce.slowstart.completed.maps=.99 so that the map and reduce
>>>>> phases will run separately.
>>>>>
>>>>> On the other side, it looks like your MR1 run has more spilled records
>>>>> than MR2.  For a fairer comparison, you should set io.sort.record.percent
>>>>> in MR1 to .13; this should improve MR1 performance (MR2 automatically does
>>>>> this tuning for you).
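
A minimal MR1 mapred-site.xml sketch of the setting suggested above; the .13
value is the one from the suggestion itself, shown here only to illustrate
where the knob lives:

  <!-- fraction of the io.sort.mb buffer reserved for per-record accounting metadata -->
  <property>
    <name>io.sort.record.percent</name>
    <value>0.13</value>
  </property>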
>>>>>
>>>>> -Sandy
>>>>>
>>>>>
>>>>> On Wed, Oct 23, 2013 at 9:22 AM, Jian Fang <
>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>
>>>>>> The numbers of map slots and reduce slots on each data node for MR1
>>>>>> are 8 and 3, respectively. Since MR2 can use all containers for either
>>>>>> map or reduce, I would expect MR2 to be faster.
>>>>>>
>>>>>>
>>>>>> On Wed, Oct 23, 2013 at 8:17 AM, Sandy Ryza <sa...@cloudera.com> wrote:
>>>>>>
>>>>>>> How many map and reduce slots are you using per tasktracker in MR1?
>>>>>>>  How do the average map times compare? (MR2 reports this directly on the
>>>>>>> web UI, but you can also get a sense in MR1 by scrolling through the map
>>>>>>> tasks page).  Can you share the counters for MR1?
>>>>>>>
>>>>>>> -Sandy
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Oct 23, 2013 at 12:23 AM, Jian Fang <
>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>
>>>>>>>> Unfortunately, turning off JVM reuse still gave the same result,
>>>>>>>> i.e., about 90 minutes for MR2. I don't think the killed reduces could
>>>>>>>> account for a 2x slowdown. Something must be very wrong, either in the
>>>>>>>> configuration or in the code. Any hints?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <
>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Thanks Sandy. I will try to turn JVM reuse off and see what
>>>>>>>>> happens.
>>>>>>>>>
>>>>>>>>> Yes, I saw quite a few exceptions in the task attempts. For
>>>>>>>>> instance:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2013-10-20 03:13:58,751 ERROR [main]
>>>>>>>>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>>>>>>>>> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
>>>>>>>>> 2013-10-20 03:13:58,752 ERROR [Thread-6]
>>>>>>>>> org.apache.hadoop.hdfs.DFSClient: Failed to close file
>>>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
>>>>>>>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>>>>>>>>> No lease on
>>>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
>>>>>>>>> File does not exist. Holder
>>>>>>>>> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
>>>>>>>>> any open files.
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>>>>>>>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>>>>>>> --
>>>>>>>>>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>>>>>>>>> 2013-10-20 03:13:58,753 WARN [main]
>>>>>>>>> org.apache.hadoop.mapred.YarnChild: Exception running child :
>>>>>>>>> java.nio.channels.ClosedChannelException
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>>>>>>>>>         at
>>>>>>>>> java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <
>>>>>>>>> sandy.ryza@cloudera.com> wrote:
>>>>>>>>>
>>>>>>>>>> It looks like many of your reduce tasks were killed.  Do you know
>>>>>>>>>> why?  Also, MR2 doesn't have JVM reuse, so it might make sense to compare
>>>>>>>>>> it to MR1 with JVM reuse turned off.
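
A minimal MR1 mapred-site.xml sketch for the comparison suggested above; a
value of 1 means each task gets its own JVM, i.e. reuse is off (the MR2
configuration earlier in the thread sets mapreduce.job.jvm.numtasks, which,
per the note above, has no effect under YARN):

  <!-- run exactly one task per JVM, disabling MR1 JVM reuse -->
  <property>
    <name>mapred.job.reuse.jvm.num.tasks</name>
    <value>1</value>
  </property>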
>>>>>>>>>>
>>>>>>>>>> -Sandy
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <
>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> The Terasort output for MR2 is as follows.
>>>>>>>>>>>
>>>>>>>>>>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job
>>>>>>>>>>> (main): Counters: 46
>>>>>>>>>>>         File System Counters
>>>>>>>>>>>                 FILE: Number of bytes read=456102049355
>>>>>>>>>>>                 FILE: Number of bytes written=897246250517
>>>>>>>>>>>                 FILE: Number of read operations=0
>>>>>>>>>>>                 FILE: Number of large read operations=0
>>>>>>>>>>>                 FILE: Number of write operations=0
>>>>>>>>>>>                 HDFS: Number of bytes read=1000000851200
>>>>>>>>>>>                 HDFS: Number of bytes written=1000000000000
>>>>>>>>>>>                 HDFS: Number of read operations=32131
>>>>>>>>>>>                 HDFS: Number of large read operations=0
>>>>>>>>>>>                 HDFS: Number of write operations=224
>>>>>>>>>>>         Job Counters
>>>>>>>>>>>                 Killed map tasks=1
>>>>>>>>>>>                 Killed reduce tasks=20
>>>>>>>>>>>                 Launched map tasks=7601
>>>>>>>>>>>                 Launched reduce tasks=132
>>>>>>>>>>>                 Data-local map tasks=7591
>>>>>>>>>>>                 Rack-local map tasks=10
>>>>>>>>>>>                 Total time spent by all maps in occupied slots
>>>>>>>>>>> (ms)=1696141311
>>>>>>>>>>>                 Total time spent by all reduces in occupied
>>>>>>>>>>> slots (ms)=2664045096
>>>>>>>>>>>         Map-Reduce Framework
>>>>>>>>>>>                 Map input records=10000000000
>>>>>>>>>>>                 Map output records=10000000000
>>>>>>>>>>>                 Map output bytes=1020000000000
>>>>>>>>>>>                 Map output materialized bytes=440486356802
>>>>>>>>>>>                 Input split bytes=851200
>>>>>>>>>>>                 Combine input records=0
>>>>>>>>>>>                 Combine output records=0
>>>>>>>>>>>                 Reduce input groups=10000000000
>>>>>>>>>>>                 Reduce shuffle bytes=440486356802
>>>>>>>>>>>                 Reduce input records=10000000000
>>>>>>>>>>>                 Reduce output records=10000000000
>>>>>>>>>>>                 Spilled Records=20000000000
>>>>>>>>>>>                 Shuffled Maps =851200
>>>>>>>>>>>                 Failed Shuffles=61
>>>>>>>>>>>                 Merged Map outputs=851200
>>>>>>>>>>>                 GC time elapsed (ms)=4215666
>>>>>>>>>>>                 CPU time spent (ms)=192433000
>>>>>>>>>>>                 Physical memory (bytes) snapshot=3349356380160
>>>>>>>>>>>                 Virtual memory (bytes) snapshot=9665208745984
>>>>>>>>>>>                 Total committed heap usage (bytes)=3636854259712
>>>>>>>>>>>         Shuffle Errors
>>>>>>>>>>>                 BAD_ID=0
>>>>>>>>>>>                 CONNECTION=0
>>>>>>>>>>>                 IO_ERROR=4
>>>>>>>>>>>                 WRONG_LENGTH=0
>>>>>>>>>>>                 WRONG_MAP=0
>>>>>>>>>>>                 WRONG_REDUCE=0
>>>>>>>>>>>         File Input Format Counters
>>>>>>>>>>>                 Bytes Read=1000000000000
>>>>>>>>>>>         File Output Format Counters
>>>>>>>>>>>                 Bytes Written=1000000000000
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> John
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop
>>>>>>>>>>>> 1.0.3 and it turned out that the terasort for MR2 is 2 times slower than
>>>>>>>>>>>> that in MR1. I cannot really believe it.
>>>>>>>>>>>>
>>>>>>>>>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0
>>>>>>>>>>>> cluster configurations are as follows.
>>>>>>>>>>>>
>>>>>>>>>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>>>>>>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>>>>>>>>>         mapreduce.map.memory.mb = "768";
>>>>>>>>>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>>>>>>>>>
>>>>>>>>>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>>>>>>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>>>>>>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>>>>>>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>>>>>>>>>
>>>>>>>>>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>>>>>>>>>         mapreduce.task.io.sort.factor = "48";
>>>>>>>>>>>>         mapreduce.task.io.sort.mb = "200";
>>>>>>>>>>>>         mapreduce.map.speculative = "true";
>>>>>>>>>>>>         mapreduce.reduce.speculative = "true";
>>>>>>>>>>>>         mapreduce.framework.name = "yarn";
>>>>>>>>>>>>         yarn.app.mapreduce.am.job.task.listener.thread-count =
>>>>>>>>>>>> "60";
>>>>>>>>>>>>         mapreduce.map.cpu.vcores = "1";
>>>>>>>>>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>>>>>>>>>
>>>>>>>>>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>>>>>>>>>         mapreduce.map.output.compress = "true";
>>>>>>>>>>>>         mapreduce.map.output.compress.codec =
>>>>>>>>>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>>>>>>>>>
>>>>>>>>>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>>>>>>>>>         yarn.resourcemanager.scheduler.client.thread-count =
>>>>>>>>>>>> "64";
>>>>>>>>>>>>
>>>>>>>>>>>> yarn.resourcemanager.resource-tracker.client.thread-count = "64";
>>>>>>>>>>>>         yarn.resourcemanager.scheduler.class =
>>>>>>>>>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>>>>>>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>>>>>>>>>         yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>>>>>>>>>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>>>>>>>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>>>>>>>>>         yarn.nodemanager.container-executor.class =
>>>>>>>>>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>>>>>>>>>         yarn.nodemanager.container-manager.thread-count = "64";
>>>>>>>>>>>>         yarn.nodemanager.localizer.client.thread-count = "20";
>>>>>>>>>>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>>>>>>>>>>>
>>>>>>>>>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost the
>>>>>>>>>>>> same other configurations. In MR1, the 1TB terasort took about 45 minutes,
>>>>>>>>>>>> but it took around 90 minutes in MR2.
>>>>>>>>>>>>
>>>>>>>>>>>> Does anyone know what is wrong here? Or do I need some special
>>>>>>>>>>>> configurations for terasort to work better in MR2?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>
>>>>>>>>>>>> John
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>>>>>>>>>> michael_segel@hotmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Sam,
>>>>>>>>>>>>> I think your cluster is too small for any meaningful
>>>>>>>>>>>>> conclusions to be made.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>>>>>>>
>>>>>>>>>>>>> Mike Segel
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Harsh,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for your detailed response! Now, the efficiency of my
>>>>>>>>>>>>> Yarn cluster improved a lot after increasing the reducer
>>>>>>>>>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>>>>>>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and reduce
>>>>>>>>>>>>> task together, with a typical process(map > shuffle > reduce). In Yarn, as
>>>>>>>>>>>>> I know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>>>>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but,
>>>>>>>>>>>>> MRv1 job has special execution process(map > shuffle > reduce) in Hadoop
>>>>>>>>>>>>> 1.x, and how Yarn execute a MRv1 job? still include some special MR steps
>>>>>>>>>>>>> in Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>>>>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>>>>>>>> - What's the general process for ApplicationMaster of Yarn to
>>>>>>>>>>>>> execute a job?
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>>>>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>>>>>>>> - For Yarn, the above two parameters do not work any more, as yarn
>>>>>>>>>>>>> uses containers instead, right?
>>>>>>>>>>>>> - For Yarn, we can set the whole physical mem for a
>>>>>>>>>>>>> NodeManager using 'yarn.nodemanager.resource.memory-mb'. But
>>>>>>>>>>>>> how to set the default size of physical mem of a container?
>>>>>>>>>>>>> - How to set the maximum size of physical mem of a container?
>>>>>>>>>>>>> By the parameter of 'mapred.child.java.opts'?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks as always!
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Sam,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> > - How to know the container number? Why you say it will be
>>>>>>>>>>>>>> 22 containers due to a 22 GB memory?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The MR2's default configuration requests 1 GB resource each
>>>>>>>>>>>>>> for Map
>>>>>>>>>>>>>> and Reduce containers. It requests 1.5 GB for the AM
>>>>>>>>>>>>>> container that
>>>>>>>>>>>>>> runs the job, additionally. This is tunable using the
>>>>>>>>>>>>>> properties
>>>>>>>>>>>>>> Sandy's mentioned in his post.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> > - My machine has 32 GB memory; how much memory is proper to
>>>>>>>>>>>>>> be assigned to containers?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This is a general question. You may use the same process you
>>>>>>>>>>>>>> took to
>>>>>>>>>>>>>> decide optimal number of slots in MR1 to decide this here.
>>>>>>>>>>>>>> Every
>>>>>>>>>>>>>> container is a new JVM, and you're limited by the CPUs you
>>>>>>>>>>>>>> have there
>>>>>>>>>>>>>> (if not the memory). Either increase memory requests from
>>>>>>>>>>>>>> jobs, to
>>>>>>>>>>>>>> lower # of concurrent containers at a given time (runtime
>>>>>>>>>>>>>> change), or
>>>>>>>>>>>>>> lower NM's published memory resources to control the same
>>>>>>>>>>>>>> (config
>>>>>>>>>>>>>> change).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name'
>>>>>>>>>>>>>> to be 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes, all of these properties will still work. Old properties
>>>>>>>>>>>>>> specific
>>>>>>>>>>>>>> to JobTracker or TaskTracker (usually found as a keyword in
>>>>>>>>>>>>>> the config
>>>>>>>>>>>>>> name) will not apply anymore.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <
>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>> > Hi Harsh,
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > According to above suggestions, I removed the duplication
>>>>>>>>>>>>>> of setting, and
>>>>>>>>>>>>>> > reduce the value of 'yarn.nodemanager.resource.cpu-cores',
>>>>>>>>>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>>>>>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000.
>>>>>>>>>>>>>> And then, the
>>>>>>>>>>>>>> > efficiency improved about 18%.  I have questions:
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > - How to know the container number? Why you say it will be
>>>>>>>>>>>>>> 22 containers due
>>>>>>>>>>>>>> > to a 22 GB memory?
>>>>>>>>>>>>>> > - My machine has 32 GB memory; how much memory is proper to
>>>>>>>>>>>>>> be assigned to
>>>>>>>>>>>>>> > containers?
>>>>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name'
>>>>>>>>>>>>>> to be 'yarn', will
>>>>>>>>>>>>>> > other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>>>>> framework? Like
>>>>>>>>>>>>>> > 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Thanks!
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> Hey Sam,
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> Did you get a chance to retry with Sandy's suggestions?
>>>>>>>>>>>>>> The config
>>>>>>>>>>>>>> >> appears to be asking NMs to use roughly 22 total
>>>>>>>>>>>>>> containers (as
>>>>>>>>>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB
>>>>>>>>>>>>>> memory
>>>>>>>>>>>>>> >> resource. This could impact much, given the CPU is still
>>>>>>>>>>>>>> the same for
>>>>>>>>>>>>>> >> both test runs.
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>>>>>>>>>> sandy.ryza@cloudera.com>
>>>>>>>>>>>>>> >> wrote:
>>>>>>>>>>>>>> >> > Hey Sam,
>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>> >> > Thanks for sharing your results.  I'm definitely curious
>>>>>>>>>>>>>> about what's
>>>>>>>>>>>>>> >> > causing the difference.
>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>> >> > A couple observations:
>>>>>>>>>>>>>> >> > It looks like you've got
>>>>>>>>>>>>>> yarn.nodemanager.resource.memory-mb in there
>>>>>>>>>>>>>> >> > twice
>>>>>>>>>>>>>> >> > with two different values.
>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close
>>>>>>>>>>>>>> to the default
>>>>>>>>>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of
>>>>>>>>>>>>>> your tasks getting
>>>>>>>>>>>>>> >> > killed for running over resource limits?
>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>> >> > -Sandy
>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <
>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >> The terasort execution log shows that reduce spent
>>>>>>>>>>>>>> about 5.5 mins from
>>>>>>>>>>>>>> >> >> 33%
>>>>>>>>>>>>>> >> >> to 35% as below.
>>>>>>>>>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>>> 31%
>>>>>>>>>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>>> 32%
>>>>>>>>>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>>> 33%
>>>>>>>>>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>>> 35%
>>>>>>>>>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>>> 40%
>>>>>>>>>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>>> 43%
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >> Any way, below are my configurations for your
>>>>>>>>>>>>>> reference. Thanks!
>>>>>>>>>>>>>> >> >> (A) core-site.xml
>>>>>>>>>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >> (B) hdfs-site.xml
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>dfs.replication</name>
>>>>>>>>>>>>>> >> >>     <value>1</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>>>>>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>>>>>>>>>> >> >>     <value>10</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >> (C) mapred-site.xml
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>>>>>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>>>>>>>>>> >> >> </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>>>>>>>>>> >> >>     <value>yarn</value>
>>>>>>>>>>>>>> >> >>    </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>> >> >>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>>>>>>>>>> >> >>     <value>8</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>>>>>>>>>> >> >>     <value>4</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>>>>>>>>>> >> >>     <value>true</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >> (D) yarn-site.xml
>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>>>>>>>>>> >> >>     <value>node1:18025</value>
>>>>>>>>>>>>>> >> >>     <description>host is the hostname of the resource
>>>>>>>>>>>>>> manager and
>>>>>>>>>>>>>> >> >>     port is the port on which the NodeManagers contact
>>>>>>>>>>>>>> the Resource
>>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <description>The address of the RM web
>>>>>>>>>>>>>> application.</description>
>>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>>>>>>>>>> >> >>     <value>node1:18088</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>>>>>>>>>>>>>> >> >>     <value>node1:18030</value>
>>>>>>>>>>>>>> >> >>     <description>host is the hostname of the
>>>>>>>>>>>>>> resourcemanager and port
>>>>>>>>>>>>>> >> >> is
>>>>>>>>>>>>>> >> >> the port
>>>>>>>>>>>>>> >> >>     on which the Applications in the cluster talk to
>>>>>>>>>>>>>> the Resource
>>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>>>>>>>>>> >> >>     <value>node1:18040</value>
>>>>>>>>>>>>>> >> >>     <description>the host is the hostname of the
>>>>>>>>>>>>>> ResourceManager and
>>>>>>>>>>>>>> >> >> the
>>>>>>>>>>>>>> >> >> port is the port on
>>>>>>>>>>>>>> >> >>     which the clients can talk to the Resource Manager.
>>>>>>>>>>>>>> </description>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>>>>>>>>>> >> >>     <description>the local directories used by the
>>>>>>>>>>>>>> >> >> nodemanager</description>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>>>>>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>>>>>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>>>>>>>>>> port</description>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>>> >> >>     <value>10240</value>
>>>>>>>>>>>>>> >> >>     <description>the amount of memory on the
>>>>>>>>>>>>>> NodeManager in
>>>>>>>>>>>>>> >> >> GB</description>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>>>>>>>>> >> >>     <description>directory on hdfs where the
>>>>>>>>>>>>>> application logs are moved
>>>>>>>>>>>>>> >> >> to
>>>>>>>>>>>>>> >> >> </description>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>    <property>
>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>>>>>>>>> >> >>     <description>the directories used by Nodemanagers
>>>>>>>>>>>>>> as log
>>>>>>>>>>>>>> >> >> directories</description>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>>>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>>>>>>>>> >> >>     <description>shuffle service that needs to be set
>>>>>>>>>>>>>> for Map Reduce to
>>>>>>>>>>>>>> >> >> run </description>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>>>>>>>>> >> >>     <value>24</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>>>>>>>>> >> >>     <value>3</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>>> >> >>     <value>22000</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>>>>>>>>> >> >>     <value>2.1</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>> >> >>> Not tuning configurations at all is wrong. YARN uses
>>>>>>>>>>>>>> memory resource
>>>>>>>>>>>>>> >> >>> based scheduling and hence MR2 would be requesting 1
>>>>>>>>>>>>>> GB minimum by
>>>>>>>>>>>>>> >> >>> default, causing, on base configs, to max out at 8
>>>>>>>>>>>>>> (due to 8 GB NM
>>>>>>>>>>>>>> >> >>> memory resource config) total containers. Do share
>>>>>>>>>>>>>> your configs as at
>>>>>>>>>>>>>> >> >>> this point none of us can tell what it is.
>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for
>>>>>>>>>>>>>> users and to not
>>>>>>>>>>>>>> >> >>> care about such things :)
>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>>>>>>>>> samliuhadoop@gmail.com>
>>>>>>>>>>>>>> >> >>> wrote:
>>>>>>>>>>>>>> >> >>> > At the begining, I just want to do a fast
>>>>>>>>>>>>>> comparision of MRv1 and
>>>>>>>>>>>>>> >> >>> > Yarn.
>>>>>>>>>>>>>> >> >>> > But
>>>>>>>>>>>>>> >> >>> > they have many differences, and to be fair for
>>>>>>>>>>>>>> comparison I did not
>>>>>>>>>>>>>> >> >>> > tune
>>>>>>>>>>>>>> >> >>> > their configurations at all.  So I got above test
>>>>>>>>>>>>>> results. After
>>>>>>>>>>>>>> >> >>> > analyzing
>>>>>>>>>>>>>> >> >>> > the test result, no doubt, I will configure them and
>>>>>>>>>>>>>> do comparison
>>>>>>>>>>>>>> >> >>> > again.
>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>> >> >>> > Do you have any idea on current test result? I
>>>>>>>>>>>>>> think, to compare
>>>>>>>>>>>>>> >> >>> > with
>>>>>>>>>>>>>> >> >>> > MRv1,
>>>>>>>>>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but worse
>>>>>>>>>>>>>> on Reduce
>>>>>>>>>>>>>> >> >>> > phase(terasort test).
>>>>>>>>>>>>>> >> >>> > And any detailed suggestions/comments/materials on
>>>>>>>>>>>>>> Yarn performance
>>>>>>>>>>>>>> >> >>> > tunning?
>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>> >> >>> > Thanks!
>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>>>>>>>>> marcosluis2186@gmail.com>
>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>> >> >>> >> Why not to tune the configurations?
>>>>>>>>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>>>>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>> >> >>> >>> Hi Experts,
>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>> >> >>> >>> We are thinking about whether to use Yarn or not
>>>>>>>>>>>>>> in the near
>>>>>>>>>>>>>> >> >>> >>> future,
>>>>>>>>>>>>>> >> >>> >>> and
>>>>>>>>>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for
>>>>>>>>>>>>>> comprison.
>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>> >> >>> >>> My env is three nodes cluster, and each node has
>>>>>>>>>>>>>> similar hardware:
>>>>>>>>>>>>>> >> >>> >>> 2
>>>>>>>>>>>>>> >> >>> >>> cpu(4 core), 32 mem. Both Yarn and MRv1 cluster
>>>>>>>>>>>>>> are set on the
>>>>>>>>>>>>>> >> >>> >>> same
>>>>>>>>>>>>>> >> >>> >>> env. To
>>>>>>>>>>>>>> >> >>> >>> be fair, I did not make any performance tuning on
>>>>>>>>>>>>>> their
>>>>>>>>>>>>>> >> >>> >>> configurations, but
>>>>>>>>>>>>>> >> >>> >>> use the default configuration values.
>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>> >> >>> >>> Before testing, I think Yarn will be much better
>>>>>>>>>>>>>> than MRv1, if
>>>>>>>>>>>>>> >> >>> >>> they
>>>>>>>>>>>>>> >> >>> >>> all
>>>>>>>>>>>>>> >> >>> >>> use default configuration, because Yarn is a
>>>>>>>>>>>>>> better framework than
>>>>>>>>>>>>>> >> >>> >>> MRv1.
>>>>>>>>>>>>>> >> >>> >>> However, the test result shows some differences:
>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>>>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>>>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>>>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>>>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>>>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>>>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>>>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>> >> >>> >>> After a fast analysis, I think the direct cause
>>>>>>>>>>>>>> might be that Yarn
>>>>>>>>>>>>>> >> >>> >>> is
>>>>>>>>>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much worse
>>>>>>>>>>>>>> on Reduce
>>>>>>>>>>>>>> >> >>> >>> phase.
>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>>>>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1 for
>>>>>>>>>>>>>> terasort?
>>>>>>>>>>>>>> >> >>> >>> - What's the stratage for tuning Yarn performance?
>>>>>>>>>>>>>> Is any
>>>>>>>>>>>>>> >> >>> >>> materials?
>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>> >> >>> >>> Thanks!
>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>> >> >>> >> --
>>>>>>>>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>>>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>>>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>> >> >>> --
>>>>>>>>>>>>>> >> >>> Harsh J
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> --
>>>>>>>>>>>>>> >> Harsh J
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Harsh J
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Jian Fang <ji...@gmail.com>.
Reducing the number of map containers to 8 slightly improved the total time
to 84 minutes. Here is the output. Also, from the log, there is no clear reason
why the containers are killed other than messages such as "Container
killed by the
ApplicationMaster../container_1382237301855_0001_01_000001/syslog:Container
killed on request. Exit code is 143"


2013-10-23 21:09:34,325 INFO org.apache.hadoop.mapreduce.Job (main):
Counters: 46
        File System Counters
                FILE: Number of bytes read=455484809066
                FILE: Number of bytes written=896642344343
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=1000000841624
                HDFS: Number of bytes written=1000000000000
                HDFS: Number of read operations=25531
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=150
        Job Counters
                Killed map tasks=1
                Killed reduce tasks=11
                Launched map tasks=7449
                Launched reduce tasks=86
                Data-local map tasks=7434
                Rack-local map tasks=15
                Total time spent by all maps in occupied slots
(ms)=1030941232
                Total time spent by all reduces in occupied slots
(ms)=1574732272
        Map-Reduce Framework
                Map input records=10000000000
                Map output records=10000000000
                Map output bytes=1020000000000
                Map output materialized bytes=440519309909
                Input split bytes=841624
                Combine input records=0
                Combine output records=0
                Reduce input groups=10000000000
                Reduce shuffle bytes=440519309909
                Reduce input records=10000000000
                Reduce output records=10000000000
                Spilled Records=20000000000
                Shuffled Maps =558600
                Failed Shuffles=0
                Merged Map outputs=558600
                GC time elapsed (ms)=2855534
                CPU time spent (ms)=170381420
                Physical memory (bytes) snapshot=3704938954752
                Virtual memory (bytes) snapshot=11455208308736
                Total committed heap usage (bytes)=4523248582656
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=1000000000000
        File Output Format Counters
                Bytes Written=1000000000000

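(For reference: with yarn.nodemanager.resource.memory-mb = 12288 from the
configuration quoted below, the number of concurrent map containers per node
is roughly 12288 / mapreduce.map.memory.mb. One way to cap that at 8 per node,
matching the 8 map slots per tasktracker in MR1, is to raise the per-map
request to 1536 MB. The snippet is only a sketch of that arithmetic, not
necessarily how the 8-map run above was configured, and the heap value is an
illustrative guess kept below the container size.)

  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1536</value>
    <!-- 12288 MB per NodeManager / 1536 MB per map container = 8 concurrent maps -->
  </property>
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx1152m</value>
    <!-- illustrative heap size, kept below the 1536 MB container limit -->
  </property>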


On Wed, Oct 23, 2013 at 1:46 PM, Jian Fang <ji...@gmail.com>wrote:

> Already started the tests with only 8 containers for map in MR2, still
> running.
>
> But shouldn't YARN make better use of the memory? If I map YARN containers
> in MR2 to exact 8 map and 3 reduce slots in MR1, what is the real advantage
> of YARN then? I remember one goal of YARN is to solve the issue that map
> slots cannot be used for reduce and reduce slots cannot be used for map.
> Shouldn't YARN be smart enough to handle the concurrent tasks?
>
>
>
> On Wed, Oct 23, 2013 at 1:17 PM, Sandy Ryza <sa...@cloudera.com>wrote:
>
>> Increasing the slowstart is not meant to increase performance, but should
>> make for a fairer comparison.  Have you tried making sure that in MR2 only
>> 8 map tasks are running concurrently, or boosting MR1 up to 16?
>>
>> -Sandy
>>
>>
>> On Wed, Oct 23, 2013 at 12:55 PM, Jian Fang <
>> jian.fang.subscribe@gmail.com> wrote:
>>
>>> Changing mapreduce.job.reduce.
>>> slowstart.completedmaps to 0.99 does not look good. The map phase alone
>>> took 48 minutes and total time seems to be even longer. Any way to let map
>>> phase run faster?
>>>
>>> Thanks.
>>>
>>>
>>> On Wed, Oct 23, 2013 at 10:05 AM, Jian Fang <
>>> jian.fang.subscribe@gmail.com> wrote:
>>>
>>>> Thanks Sandy.
>>>>
>>>> io.sort.record.percent is the default value 0.05 for both MR1 and
>>>> MR2.  mapreduce.job.reduce.slowstart.completedmaps in MR2 and mapred.reduce.slowstart.completed.maps
>>>> in MR1 both use the default value 0.05.
>>>>
>>>> I tried to allocate 1536MB and 1024MB to map container some time ago,
>>>> but the changes did not give me a better result, thus, I changed it back to
>>>> 768MB.
>>>>
>>>> Will try mapred.reduce.slowstart.completed.maps=.99 to see what
>>>> happens. BTW, I should use
>>>> mapreduce.job.reduce.slowstart.completedmaps in MR2, right?
>>>>
>>>> Also, in MR1 I can specify tasktracker.http.threads, but I could not
>>>> find the counterpart for MR2. Which one I should tune for the http thread?
>>>>
>>>> Thanks again.
>>>>
>>>>
>>>> On Wed, Oct 23, 2013 at 9:40 AM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>>
>>>>> Based on SLOTS_MILLIS_MAPS, it looks like your map tasks are taking
>>>>> about three times as long in MR2 as they are in MR1.  This is probably
>>>>> because you allow twice as many map tasks to run at a time in MR2 (12288/768
>>>>> = 16).  Being able to use all the containers isn't necessarily a good thing
>>>>> if you are oversubscribing your node's resources.  Because of the different
>>>>> way that MR1 and MR2 view resources, I think it's better to test with
>>>>> mapred.reduce.slowstart.completed.maps=.99 so that the map and reduce
>>>>> phases will run separately.
>>>>>
>>>>> On the other side, it looks like your MR1 has more spilled records
>>>>> than MR2.  For a fairer comparison, you should set io.sort.record.percent
>>>>> in MR1 to .13, which should improve MR1 performance, but will provide a
>>>>> fairer comparison (MR2 automatically does this tuning for you).
>>>>>
>>>>> -Sandy
>>>>>
>>>>>
>>>>> On Wed, Oct 23, 2013 at 9:22 AM, Jian Fang <
>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>
>>>>>> The number of map slots and reduce slots on each data node for MR1
>>>>>> are 8 and 3, respectively. Since MR2 could use all containers for either
>>>>>> map or reduce, I would expect that MR2 is faster.
>>>>>>
>>>>>>
>>>>>> On Wed, Oct 23, 2013 at 8:17 AM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>>>>
>>>>>>> How many map and reduce slots are you using per tasktracker in MR1?
>>>>>>>  How do the average map times compare? (MR2 reports this directly on the
>>>>>>> web UI, but you can also get a sense in MR1 by scrolling through the map
>>>>>>> tasks page).  Can you share the counters for MR1?
>>>>>>>
>>>>>>> -Sandy
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Oct 23, 2013 at 12:23 AM, Jian Fang <
>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>
>>>>>>>> Unfortunately, turning off JVM reuse still got the same result,
>>>>>>>> i.e., about 90 minutes for MR2. I don't think the killed reduces could
>>>>>>>> contribute to 2 times slowness. There should be something very wrong either
>>>>>>>> in configuration or code. Any hints?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <
>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Thanks Sandy. I will try to turn JVM resue off and see what
>>>>>>>>> happens.
>>>>>>>>>
>>>>>>>>> Yes, I saw quite some exceptions in the task attempts. For
>>>>>>>>> instance.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2013-10-20 03:13:58,751 ERROR [main]
>>>>>>>>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>>>>>>>>> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
>>>>>>>>> 2013-10-20 03:13:58,752 ERROR [Thread-6]
>>>>>>>>> org.apache.hadoop.hdfs.DFSClient: Failed to close file
>>>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
>>>>>>>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>>>>>>>>> No lease on
>>>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
>>>>>>>>> File does not exist. Holder
>>>>>>>>> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
>>>>>>>>> any open files.
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>>>>>>>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>>>>>>> --
>>>>>>>>>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>>>>>>>>> 2013-10-20 03:13:58,753 WARN [main]
>>>>>>>>> org.apache.hadoop.mapred.YarnChild: Exception running child :
>>>>>>>>> java.nio.channels.ClosedChannelException
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>>>>>>>>>         at
>>>>>>>>> java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <
>>>>>>>>> sandy.ryza@cloudera.com> wrote:
>>>>>>>>>
>>>>>>>>>> It looks like many of your reduce tasks were killed.  Do you know
>>>>>>>>>> why?  Also, MR2 doesn't have JVM reuse, so it might make sense to compare
>>>>>>>>>> it to MR1 with JVM reuse turned off.
>>>>>>>>>>
>>>>>>>>>> -Sandy
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <
>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> The Terasort output for MR2 is as follows.
>>>>>>>>>>>
>>>>>>>>>>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job
>>>>>>>>>>> (main): Counters: 46
>>>>>>>>>>>         File System Counters
>>>>>>>>>>>                 FILE: Number of bytes read=456102049355
>>>>>>>>>>>                 FILE: Number of bytes written=897246250517
>>>>>>>>>>>                 FILE: Number of read operations=0
>>>>>>>>>>>                 FILE: Number of large read operations=0
>>>>>>>>>>>                 FILE: Number of write operations=0
>>>>>>>>>>>                 HDFS: Number of bytes read=1000000851200
>>>>>>>>>>>                 HDFS: Number of bytes written=1000000000000
>>>>>>>>>>>                 HDFS: Number of read operations=32131
>>>>>>>>>>>                 HDFS: Number of large read operations=0
>>>>>>>>>>>                 HDFS: Number of write operations=224
>>>>>>>>>>>         Job Counters
>>>>>>>>>>>                 Killed map tasks=1
>>>>>>>>>>>                 Killed reduce tasks=20
>>>>>>>>>>>                 Launched map tasks=7601
>>>>>>>>>>>                 Launched reduce tasks=132
>>>>>>>>>>>                 Data-local map tasks=7591
>>>>>>>>>>>                 Rack-local map tasks=10
>>>>>>>>>>>                 Total time spent by all maps in occupied slots
>>>>>>>>>>> (ms)=1696141311
>>>>>>>>>>>                 Total time spent by all reduces in occupied
>>>>>>>>>>> slots (ms)=2664045096
>>>>>>>>>>>         Map-Reduce Framework
>>>>>>>>>>>                 Map input records=10000000000
>>>>>>>>>>>                 Map output records=10000000000
>>>>>>>>>>>                 Map output bytes=1020000000000
>>>>>>>>>>>                 Map output materialized bytes=440486356802
>>>>>>>>>>>                 Input split bytes=851200
>>>>>>>>>>>                 Combine input records=0
>>>>>>>>>>>                 Combine output records=0
>>>>>>>>>>>                 Reduce input groups=10000000000
>>>>>>>>>>>                 Reduce shuffle bytes=440486356802
>>>>>>>>>>>                 Reduce input records=10000000000
>>>>>>>>>>>                 Reduce output records=10000000000
>>>>>>>>>>>                 Spilled Records=20000000000
>>>>>>>>>>>                 Shuffled Maps =851200
>>>>>>>>>>>                 Failed Shuffles=61
>>>>>>>>>>>                 Merged Map outputs=851200
>>>>>>>>>>>                 GC time elapsed (ms)=4215666
>>>>>>>>>>>                 CPU time spent (ms)=192433000
>>>>>>>>>>>                 Physical memory (bytes) snapshot=3349356380160
>>>>>>>>>>>                 Virtual memory (bytes) snapshot=9665208745984
>>>>>>>>>>>                 Total committed heap usage (bytes)=3636854259712
>>>>>>>>>>>         Shuffle Errors
>>>>>>>>>>>                 BAD_ID=0
>>>>>>>>>>>                 CONNECTION=0
>>>>>>>>>>>                 IO_ERROR=4
>>>>>>>>>>>                 WRONG_LENGTH=0
>>>>>>>>>>>                 WRONG_MAP=0
>>>>>>>>>>>                 WRONG_REDUCE=0
>>>>>>>>>>>         File Input Format Counters
>>>>>>>>>>>                 Bytes Read=1000000000000
>>>>>>>>>>>         File Output Format Counters
>>>>>>>>>>>                 Bytes Written=1000000000000
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> John
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop
>>>>>>>>>>>> 1.0.3 and it turned out that the terasort for MR2 is 2 times slower than
>>>>>>>>>>>> that in MR1. I cannot really believe it.
>>>>>>>>>>>>
>>>>>>>>>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0
>>>>>>>>>>>> cluster configurations are as follows.
>>>>>>>>>>>>
>>>>>>>>>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>>>>>>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>>>>>>>>>         mapreduce.map.memory.mb = "768";
>>>>>>>>>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>>>>>>>>>
>>>>>>>>>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>>>>>>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>>>>>>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>>>>>>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>>>>>>>>>
>>>>>>>>>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>>>>>>>>>         mapreduce.task.io.sort.factor = "48";
>>>>>>>>>>>>         mapreduce.task.io.sort.mb = "200";
>>>>>>>>>>>>         mapreduce.map.speculative = "true";
>>>>>>>>>>>>         mapreduce.reduce.speculative = "true";
>>>>>>>>>>>>         mapreduce.framework.name = "yarn";
>>>>>>>>>>>>         yarn.app.mapreduce.am.job.task.listener.thread-count =
>>>>>>>>>>>> "60";
>>>>>>>>>>>>         mapreduce.map.cpu.vcores = "1";
>>>>>>>>>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>>>>>>>>>
>>>>>>>>>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>>>>>>>>>         mapreduce.map.output.compress = "true";
>>>>>>>>>>>>         mapreduce.map.output.compress.codec =
>>>>>>>>>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>>>>>>>>>
>>>>>>>>>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>>>>>>>>>         yarn.resourcemanager.scheduler.client.thread-count =
>>>>>>>>>>>> "64";
>>>>>>>>>>>>
>>>>>>>>>>>> yarn.resourcemanager.resource-tracker.client.thread-count = "64";
>>>>>>>>>>>>         yarn.resourcemanager.scheduler.class =
>>>>>>>>>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>>>>>>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>>>>>>>>>         yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>>>>>>>>>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>>>>>>>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>>>>>>>>>         yarn.nodemanager.container-executor.class =
>>>>>>>>>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>>>>>>>>>         yarn.nodemanager.container-manager.thread-count = "64";
>>>>>>>>>>>>         yarn.nodemanager.localizer.client.thread-count = "20";
>>>>>>>>>>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>>>>>>>>>>>
>>>>>>>>>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost the
>>>>>>>>>>>> same other configurations. In MR1, the 1TB terasort took about 45 minutes,
>>>>>>>>>>>> but it took around 90 minutes in MR2.
>>>>>>>>>>>>
>>>>>>>>>>>> Does anyone know what is wrong here? Or do I need some special
>>>>>>>>>>>> configurations for terasort to work better in MR2?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>
>>>>>>>>>>>> John
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>>>>>>>>>> michael_segel@hotmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Sam,
>>>>>>>>>>>>> I think your cluster is too small for any meaningful
>>>>>>>>>>>>> conclusions to be made.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>>>>>>>
>>>>>>>>>>>>> Mike Segel
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Harsh,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for your detailed response! Now, the efficiency of my
>>>>>>>>>>>>> Yarn cluster improved a lot after increasing the reducer
>>>>>>>>>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>>>>>>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and reduce
>>>>>>>>>>>>> task together, with a typical process(map > shuffle > reduce). In Yarn, as
>>>>>>>>>>>>> I know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>>>>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but,
>>>>>>>>>>>>> MRv1 job has special execution process(map > shuffle > reduce) in Hadoop
>>>>>>>>>>>>> 1.x, and how Yarn execute a MRv1 job? still include some special MR steps
>>>>>>>>>>>>> in Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>>>>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>>>>>>>> - What's the general process for ApplicationMaster of Yarn to
>>>>>>>>>>>>> execute a job?
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>>>>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>>>>>>>> - For Yarn, above tow parameter do not work any more, as yarn
>>>>>>>>>>>>> uses container instead, right?
>>>>>>>>>>>>> - For Yarn, we can set the whole physical mem for a
>>>>>>>>>>>>> NodeManager using 'yarn.nodemanager.resource.memory-mb'. But
>>>>>>>>>>>>> how to set the default size of physical mem of a container?
>>>>>>>>>>>>> - How to set the maximum size of physical mem of a container?
>>>>>>>>>>>>> By the parameter of 'mapred.child.java.opts'?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks as always!
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Sam,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> > - How to know the container number? Why you say it will be
>>>>>>>>>>>>>> 22 containers due to a 22 GB memory?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The MR2's default configuration requests 1 GB resource each
>>>>>>>>>>>>>> for Map
>>>>>>>>>>>>>> and Reduce containers. It requests 1.5 GB for the AM
>>>>>>>>>>>>>> container that
>>>>>>>>>>>>>> runs the job, additionally. This is tunable using the
>>>>>>>>>>>>>> properties
>>>>>>>>>>>>>> Sandy's mentioned in his post.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> > - My machine has 32 GB memory, how many memory is proper to
>>>>>>>>>>>>>> be assigned to containers?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This is a general question. You may use the same process you
>>>>>>>>>>>>>> took to
>>>>>>>>>>>>>> decide optimal number of slots in MR1 to decide this here.
>>>>>>>>>>>>>> Every
>>>>>>>>>>>>>> container is a new JVM, and you're limited by the CPUs you
>>>>>>>>>>>>>> have there
>>>>>>>>>>>>>> (if not the memory). Either increase memory requests from
>>>>>>>>>>>>>> jobs, to
>>>>>>>>>>>>>> lower # of concurrent containers at a given time (runtime
>>>>>>>>>>>>>> change), or
>>>>>>>>>>>>>> lower NM's published memory resources to control the same
>>>>>>>>>>>>>> (config
>>>>>>>>>>>>>> change).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name'
>>>>>>>>>>>>>> to be 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes, all of these properties will still work. Old properties
>>>>>>>>>>>>>> specific
>>>>>>>>>>>>>> to JobTracker or TaskTracker (usually found as a keyword in
>>>>>>>>>>>>>> the config
>>>>>>>>>>>>>> name) will not apply anymore.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <
>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>> > Hi Harsh,
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > According to above suggestions, I removed the duplication
>>>>>>>>>>>>>> of setting, and
>>>>>>>>>>>>>> > reduce the value of 'yarn.nodemanager.resource.cpu-cores',
>>>>>>>>>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>>>>>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000.
>>>>>>>>>>>>>> Ant then, the
>>>>>>>>>>>>>> > efficiency improved about 18%.  I have questions:
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > - How to know the container number? Why you say it will be
>>>>>>>>>>>>>> 22 containers due
>>>>>>>>>>>>>> > to a 22 GB memory?
>>>>>>>>>>>>>> > - My machine has 32 GB memory, how many memory is proper to
>>>>>>>>>>>>>> be assigned to
>>>>>>>>>>>>>> > containers?
>>>>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name'
>>>>>>>>>>>>>> to be 'yarn', will
>>>>>>>>>>>>>> > other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>>>>> framework? Like
>>>>>>>>>>>>>> > 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Thanks!
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> Hey Sam,
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> Did you get a chance to retry with Sandy's suggestions?
>>>>>>>>>>>>>> The config
>>>>>>>>>>>>>> >> appears to be asking NMs to use roughly 22 total
>>>>>>>>>>>>>> containers (as
>>>>>>>>>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB
>>>>>>>>>>>>>> memory
>>>>>>>>>>>>>> >> resource. This could impact much, given the CPU is still
>>>>>>>>>>>>>> the same for
>>>>>>>>>>>>>> >> both test runs.
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>>>>>>>>>> sandy.ryza@cloudera.com>
>>>>>>>>>>>>>> >> wrote:
>>>>>>>>>>>>>> >> > Hey Sam,
>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>> >> > Thanks for sharing your results.  I'm definitely curious
>>>>>>>>>>>>>> about what's
>>>>>>>>>>>>>> >> > causing the difference.
>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>> >> > A couple observations:
>>>>>>>>>>>>>> >> > It looks like you've got
>>>>>>>>>>>>>> yarn.nodemanager.resource.memory-mb in there
>>>>>>>>>>>>>> >> > twice
>>>>>>>>>>>>>> >> > with two different values.
>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close
>>>>>>>>>>>>>> to the default
>>>>>>>>>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of
>>>>>>>>>>>>>> your tasks getting
>>>>>>>>>>>>>> >> > killed for running over resource limits?
>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>> >> > -Sandy
>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <
>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >> The terasort execution log shows that reduce spent
>>>>>>>>>>>>>> about 5.5 mins from
>>>>>>>>>>>>>> >> >> 33%
>>>>>>>>>>>>>> >> >> to 35% as below.
>>>>>>>>>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>>> 31%
>>>>>>>>>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>>> 32%
>>>>>>>>>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>>> 33%
>>>>>>>>>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>>> 35%
>>>>>>>>>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>>> 40%
>>>>>>>>>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>>> 43%
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >> Any way, below are my configurations for your
>>>>>>>>>>>>>> reference. Thanks!
>>>>>>>>>>>>>> >> >> (A) core-site.xml
>>>>>>>>>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >> (B) hdfs-site.xml
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>dfs.replication</name>
>>>>>>>>>>>>>> >> >>     <value>1</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>>>>>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>>>>>>>>>> >> >>     <value>10</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >> (C) mapred-site.xml
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>>>>>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>>>>>>>>>> >> >> </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>>>>>>>>>> >> >>     <value>yarn</value>
>>>>>>>>>>>>>> >> >>    </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>> >> >>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>>>>>>>>>> >> >>     <value>8</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>>>>>>>>>> >> >>     <value>4</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>>>>>>>>>> >> >>     <value>true</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >> (D) yarn-site.xml
>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>>>>>>>>>> >> >>     <value>node1:18025</value>
>>>>>>>>>>>>>> >> >>     <description>host is the hostname of the resource
>>>>>>>>>>>>>> manager and
>>>>>>>>>>>>>> >> >>     port is the port on which the NodeManagers contact
>>>>>>>>>>>>>> the Resource
>>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <description>The address of the RM web
>>>>>>>>>>>>>> application.</description>
>>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>>>>>>>>>> >> >>     <value>node1:18088</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>>>>>>>>>>>>>> >> >>     <value>node1:18030</value>
>>>>>>>>>>>>>> >> >>     <description>host is the hostname of the
>>>>>>>>>>>>>> resourcemanager and port
>>>>>>>>>>>>>> >> >> is
>>>>>>>>>>>>>> >> >> the port
>>>>>>>>>>>>>> >> >>     on which the Applications in the cluster talk to
>>>>>>>>>>>>>> the Resource
>>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>>>>>>>>>> >> >>     <value>node1:18040</value>
>>>>>>>>>>>>>> >> >>     <description>the host is the hostname of the
>>>>>>>>>>>>>> ResourceManager and
>>>>>>>>>>>>>> >> >> the
>>>>>>>>>>>>>> >> >> port is the port on
>>>>>>>>>>>>>> >> >>     which the clients can talk to the Resource Manager.
>>>>>>>>>>>>>> </description>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>>>>>>>>>> >> >>     <description>the local directories used by the
>>>>>>>>>>>>>> >> >> nodemanager</description>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>>>>>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>>>>>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>>>>>>>>>> port</description>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>>> >> >>     <value>10240</value>
>>>>>>>>>>>>>> >> >>     <description>the amount of memory on the
>>>>>>>>>>>>>> NodeManager in
>>>>>>>>>>>>>> >> >> GB</description>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>>>>>>>>> >> >>     <description>directory on hdfs where the
>>>>>>>>>>>>>> application logs are moved
>>>>>>>>>>>>>> >> >> to
>>>>>>>>>>>>>> >> >> </description>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>    <property>
>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>>>>>>>>> >> >>     <description>the directories used by Nodemanagers
>>>>>>>>>>>>>> as log
>>>>>>>>>>>>>> >> >> directories</description>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>>>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>>>>>>>>> >> >>     <description>shuffle service that needs to be set
>>>>>>>>>>>>>> for Map Reduce to
>>>>>>>>>>>>>> >> >> run </description>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>>>>>>>>> >> >>     <value>24</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>>>>>>>>> >> >>     <value>3</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>>> >> >>     <value>22000</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>>>>>>>>> >> >>     <value>2.1</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>> >> >>> Not tuning configurations at all is wrong. YARN uses
>>>>>>>>>>>>>> memory resource
>>>>>>>>>>>>>> >> >>> based scheduling and hence MR2 would be requesting 1
>>>>>>>>>>>>>> GB minimum by
>>>>>>>>>>>>>> >> >>> default, causing, on base configs, to max out at 8
>>>>>>>>>>>>>> (due to 8 GB NM
>>>>>>>>>>>>>> >> >>> memory resource config) total containers. Do share
>>>>>>>>>>>>>> your configs as at
>>>>>>>>>>>>>> >> >>> this point none of us can tell what it is.
>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for
>>>>>>>>>>>>>> users and to not
>>>>>>>>>>>>>> >> >>> care about such things :)
>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>>>>>>>>> samliuhadoop@gmail.com>
>>>>>>>>>>>>>> >> >>> wrote:
>>>>>>>>>>>>>> >> >>> > At the begining, I just want to do a fast
>>>>>>>>>>>>>> comparision of MRv1 and
>>>>>>>>>>>>>> >> >>> > Yarn.
>>>>>>>>>>>>>> >> >>> > But
>>>>>>>>>>>>>> >> >>> > they have many differences, and to be fair for
>>>>>>>>>>>>>> comparison I did not
>>>>>>>>>>>>>> >> >>> > tune
>>>>>>>>>>>>>> >> >>> > their configurations at all.  So I got above test
>>>>>>>>>>>>>> results. After
>>>>>>>>>>>>>> >> >>> > analyzing
>>>>>>>>>>>>>> >> >>> > the test result, no doubt, I will configure them and
>>>>>>>>>>>>>> do comparison
>>>>>>>>>>>>>> >> >>> > again.
>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>> >> >>> > Do you have any idea on current test result? I
>>>>>>>>>>>>>> think, to compare
>>>>>>>>>>>>>> >> >>> > with
>>>>>>>>>>>>>> >> >>> > MRv1,
>>>>>>>>>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but worse
>>>>>>>>>>>>>> on Reduce
>>>>>>>>>>>>>> >> >>> > phase(terasort test).
>>>>>>>>>>>>>> >> >>> > And any detailed suggestions/comments/materials on
>>>>>>>>>>>>>> Yarn performance
>>>>>>>>>>>>>> >> >>> > tunning?
>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>> >> >>> > Thanks!
>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>>>>>>>>> marcosluis2186@gmail.com>
>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>> >> >>> >> Why not to tune the configurations?
>>>>>>>>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>>>>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>> >> >>> >>> Hi Experts,
>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>> >> >>> >>> We are thinking about whether to use Yarn or not
>>>>>>>>>>>>>> in the near
>>>>>>>>>>>>>> >> >>> >>> future,
>>>>>>>>>>>>>> >> >>> >>> and
>>>>>>>>>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for
>>>>>>>>>>>>>> comprison.
>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>> >> >>> >>> My env is three nodes cluster, and each node has
>>>>>>>>>>>>>> similar hardware:
>>>>>>>>>>>>>> >> >>> >>> 2
>>>>>>>>>>>>>> >> >>> >>> cpu(4 core), 32 mem. Both Yarn and MRv1 cluster
>>>>>>>>>>>>>> are set on the
>>>>>>>>>>>>>> >> >>> >>> same
>>>>>>>>>>>>>> >> >>> >>> env. To
>>>>>>>>>>>>>> >> >>> >>> be fair, I did not make any performance tuning on
>>>>>>>>>>>>>> their
>>>>>>>>>>>>>> >> >>> >>> configurations, but
>>>>>>>>>>>>>> >> >>> >>> use the default configuration values.
>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>> >> >>> >>> Before testing, I think Yarn will be much better
>>>>>>>>>>>>>> than MRv1, if
>>>>>>>>>>>>>> >> >>> >>> they
>>>>>>>>>>>>>> >> >>> >>> all
>>>>>>>>>>>>>> >> >>> >>> use default configuration, because Yarn is a
>>>>>>>>>>>>>> better framework than
>>>>>>>>>>>>>> >> >>> >>> MRv1.
>>>>>>>>>>>>>> >> >>> >>> However, the test result shows some differences:
>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>>>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>>>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>>>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>>>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>>>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>>>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>>>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>> >> >>> >>> After a fast analysis, I think the direct cause
>>>>>>>>>>>>>> might be that Yarn
>>>>>>>>>>>>>> >> >>> >>> is
>>>>>>>>>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much worse
>>>>>>>>>>>>>> on Reduce
>>>>>>>>>>>>>> >> >>> >>> phase.
>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>>>>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1 for
>>>>>>>>>>>>>> terasort?
>>>>>>>>>>>>>> >> >>> >>> - What's the stratage for tuning Yarn performance?
>>>>>>>>>>>>>> Is any
>>>>>>>>>>>>>> >> >>> >>> materials?
>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>> >> >>> >>> Thanks!
>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>> >> >>> >> --
>>>>>>>>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>>>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>>>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>> >> >>> --
>>>>>>>>>>>>>> >> >>> Harsh J
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> --
>>>>>>>>>>>>>> >> Harsh J
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Harsh J
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
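
(A minimal mapred-site.xml sketch of the two suggestions quoted above:
raising the reduce slowstart threshold so the map and reduce phases run
separately, and setting io.sort.record.percent on the MR1 side. The property
names and values are the ones discussed in this thread; whether they help is
cluster-specific.)

  <!-- MR2 / Hadoop 2.x: delay reducer launch until nearly all maps finish -->
  <property>
    <name>mapreduce.job.reduce.slowstart.completedmaps</name>
    <value>0.99</value>
  </property>

  <!-- MR1 / Hadoop 1.x equivalents -->
  <property>
    <name>mapred.reduce.slowstart.completed.maps</name>
    <value>0.99</value>
  </property>
  <property>
    <name>io.sort.record.percent</name>
    <value>0.13</value>
  </property>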

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Jian Fang <ji...@gmail.com>.
Reducing the number of map containers to 8 slightly improve the total time
to 84 minutes. Here is output. Also, from the log, there is no clear reason
why the containers are killed other than the message such as "Container
killed by the
ApplicationMaster../container_1382237301855_0001_01_000001/syslog:Container
killed on request. Exit code is 143"


2013-10-23 21:09:34,325 INFO org.apache.hadoop.mapreduce.Job (main):
Counters: 46
        File System Counters
                FILE: Number of bytes read=455484809066
                FILE: Number of bytes written=896642344343
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=1000000841624
                HDFS: Number of bytes written=1000000000000
                HDFS: Number of read operations=25531
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=150
        Job Counters
                Killed map tasks=1
                Killed reduce tasks=11
                Launched map tasks=7449
                Launched reduce tasks=86
                Data-local map tasks=7434
                Rack-local map tasks=15
                Total time spent by all maps in occupied slots
(ms)=1030941232
                Total time spent by all reduces in occupied slots
(ms)=1574732272
        Map-Reduce Framework
                Map input records=10000000000
                Map output records=10000000000
                Map output bytes=1020000000000
                Map output materialized bytes=440519309909
                Input split bytes=841624
                Combine input records=0
                Combine output records=0
                Reduce input groups=10000000000
                Reduce shuffle bytes=440519309909
                Reduce input records=10000000000
                Reduce output records=10000000000
                Spilled Records=20000000000
                Shuffled Maps =558600
                Failed Shuffles=0
                Merged Map outputs=558600
                GC time elapsed (ms)=2855534
                CPU time spent (ms)=170381420
                Physical memory (bytes) snapshot=3704938954752
                Virtual memory (bytes) snapshot=11455208308736
                Total committed heap usage (bytes)=4523248582656
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=1000000000000
        File Output Format Counters
                Bytes Written=1000000000000



On Wed, Oct 23, 2013 at 1:46 PM, Jian Fang <ji...@gmail.com>wrote:

> Already started the tests with only 8 containers for map in MR2, still
> running.
>
> But shouldn't YARN make better use of the memory? If I map YARN containers
> in MR2 to exact 8 map and 3 reduce slots in MR1, what is the real advantage
> of YARN then? I remember one goal of YARN is to solve the issue that map
> slots cannot be used for reduce and reduce slots cannot be used for map.
> Shouldn't YARN be smart enough to handle the concurrent tasks?
>
>
>
> On Wed, Oct 23, 2013 at 1:17 PM, Sandy Ryza <sa...@cloudera.com>wrote:
>
>> Increasing the slowstart is not meant to increase performance, but should
>> make for a fairer comparison.  Have you tried making sure that in MR2 only
>> 8 map tasks are running concurrently, or boosting MR1 up to 16?
>>
>> -Sandy
>>
>>
>> On Wed, Oct 23, 2013 at 12:55 PM, Jian Fang <
>> jian.fang.subscribe@gmail.com> wrote:
>>
>>> Changing mapreduce.job.reduce.
>>> slowstart.completedmaps to 0.99 does not look good. The map phase alone
>>> took 48 minutes and total time seems to be even longer. Any way to let map
>>> phase run faster?
>>>
>>> Thanks.
>>>
>>>
>>> On Wed, Oct 23, 2013 at 10:05 AM, Jian Fang <
>>> jian.fang.subscribe@gmail.com> wrote:
>>>
>>>> Thanks Sandy.
>>>>
>>>> io.sort.record.percent is the default value 0.05 for both MR1 and
>>>> MR2.  mapreduce.job.reduce.slowstart.completedmaps in MR2 and mapred.reduce.slowstart.completed.maps
>>>> in MR1 both use the default value 0.05.
>>>>
>>>> I tried to allocate 1536MB and 1024MB to map container some time ago,
>>>> but the changes did not give me a better result, thus, I changed it back to
>>>> 768MB.
>>>>
>>>> Will try mapred.reduce.slowstart.completed.maps=.99 to see what
>>>> happens. BTW, I should use
>>>> mapreduce.job.reduce.slowstart.completedmaps in MR2, right?
>>>>
>>>> Also, in MR1 I can specify tasktracker.http.threads, but I could not
>>>> find the counterpart for MR2. Which one I should tune for the http thread?
>>>>
>>>> Thanks again.
>>>>
>>>>
>>>> On Wed, Oct 23, 2013 at 9:40 AM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>>
>>>>> Based on SLOTS_MILLIS_MAPS, it looks like your map tasks are taking
>>>>> about three times as long in MR2 as they are in MR1.  This is probably
>>>>> because you allow twice as many map tasks to run at a time in MR2 (12288/768
>>>>> = 16).  Being able to use all the containers isn't necessarily a good thing
>>>>> if you are oversubscribing your node's resources.  Because of the different
>>>>> way that MR1 and MR2 view resources, I think it's better to test with
>>>>> mapred.reduce.slowstart.completed.maps=.99 so that the map and reduce
>>>>> phases will run separately.
>>>>>
>>>>> On the other side, it looks like your MR1 has more spilled records
>>>>> than MR2.  For a fairer comparison, you should set io.sort.record.percent
>>>>> in MR1 to .13, which should improve MR1 performance, but will provide a
>>>>> fairer comparison (MR2 automatically does this tuning for you).
>>>>>
>>>>> -Sandy
>>>>>
>>>>>
>>>>> On Wed, Oct 23, 2013 at 9:22 AM, Jian Fang <
>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>
>>>>>> The number of map slots and reduce slots on each data node for MR1
>>>>>> are 8 and 3, respectively. Since MR2 could use all containers for either
>>>>>> map or reduce, I would expect that MR2 is faster.
>>>>>>
>>>>>>
>>>>>> On Wed, Oct 23, 2013 at 8:17 AM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>>>>
>>>>>>> How many map and reduce slots are you using per tasktracker in MR1?
>>>>>>>  How do the average map times compare? (MR2 reports this directly on the
>>>>>>> web UI, but you can also get a sense in MR1 by scrolling through the map
>>>>>>> tasks page).  Can you share the counters for MR1?
>>>>>>>
>>>>>>> -Sandy
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Oct 23, 2013 at 12:23 AM, Jian Fang <
>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>
>>>>>>>> Unfortunately, turning off JVM reuse still got the same result,
>>>>>>>> i.e., about 90 minutes for MR2. I don't think the killed reduces alone could
>>>>>>>> account for a 2x slowdown. There must be something very wrong either
>>>>>>>> in configuration or code. Any hints?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <
>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Thanks Sandy. I will try to turn JVM reuse off and see what
>>>>>>>>> happens.
>>>>>>>>>
>>>>>>>>> Yes, I saw quite a few exceptions in the task attempts. For
>>>>>>>>> instance:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2013-10-20 03:13:58,751 ERROR [main]
>>>>>>>>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>>>>>>>>> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
>>>>>>>>> 2013-10-20 03:13:58,752 ERROR [Thread-6]
>>>>>>>>> org.apache.hadoop.hdfs.DFSClient: Failed to close file
>>>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
>>>>>>>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>>>>>>>>> No lease on
>>>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
>>>>>>>>> File does not exist. Holder
>>>>>>>>> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
>>>>>>>>> any open files.
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>>>>>>>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>>>>>>> --
>>>>>>>>>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>>>>>>>>> 2013-10-20 03:13:58,753 WARN [main]
>>>>>>>>> org.apache.hadoop.mapred.YarnChild: Exception running child :
>>>>>>>>> java.nio.channels.ClosedChannelException
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>>>>>>>>>         at
>>>>>>>>> java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>>>>>>>>         at
>>>>>>>>> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <
>>>>>>>>> sandy.ryza@cloudera.com> wrote:
>>>>>>>>>
>>>>>>>>>> It looks like many of your reduce tasks were killed.  Do you know
>>>>>>>>>> why?  Also, MR2 doesn't have JVM reuse, so it might make sense to compare
>>>>>>>>>> it to MR1 with JVM reuse turned off.
>>>>>>>>>>
>>>>>>>>>> -Sandy
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <
>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> The Terasort output for MR2 is as follows.
>>>>>>>>>>>
>>>>>>>>>>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job
>>>>>>>>>>> (main): Counters: 46
>>>>>>>>>>>         File System Counters
>>>>>>>>>>>                 FILE: Number of bytes read=456102049355
>>>>>>>>>>>                 FILE: Number of bytes written=897246250517
>>>>>>>>>>>                 FILE: Number of read operations=0
>>>>>>>>>>>                 FILE: Number of large read operations=0
>>>>>>>>>>>                 FILE: Number of write operations=0
>>>>>>>>>>>                 HDFS: Number of bytes read=1000000851200
>>>>>>>>>>>                 HDFS: Number of bytes written=1000000000000
>>>>>>>>>>>                 HDFS: Number of read operations=32131
>>>>>>>>>>>                 HDFS: Number of large read operations=0
>>>>>>>>>>>                 HDFS: Number of write operations=224
>>>>>>>>>>>         Job Counters
>>>>>>>>>>>                 Killed map tasks=1
>>>>>>>>>>>                 Killed reduce tasks=20
>>>>>>>>>>>                 Launched map tasks=7601
>>>>>>>>>>>                 Launched reduce tasks=132
>>>>>>>>>>>                 Data-local map tasks=7591
>>>>>>>>>>>                 Rack-local map tasks=10
>>>>>>>>>>>                 Total time spent by all maps in occupied slots
>>>>>>>>>>> (ms)=1696141311
>>>>>>>>>>>                 Total time spent by all reduces in occupied
>>>>>>>>>>> slots (ms)=2664045096
>>>>>>>>>>>         Map-Reduce Framework
>>>>>>>>>>>                 Map input records=10000000000
>>>>>>>>>>>                 Map output records=10000000000
>>>>>>>>>>>                 Map output bytes=1020000000000
>>>>>>>>>>>                 Map output materialized bytes=440486356802
>>>>>>>>>>>                 Input split bytes=851200
>>>>>>>>>>>                 Combine input records=0
>>>>>>>>>>>                 Combine output records=0
>>>>>>>>>>>                 Reduce input groups=10000000000
>>>>>>>>>>>                 Reduce shuffle bytes=440486356802
>>>>>>>>>>>                 Reduce input records=10000000000
>>>>>>>>>>>                 Reduce output records=10000000000
>>>>>>>>>>>                 Spilled Records=20000000000
>>>>>>>>>>>                 Shuffled Maps =851200
>>>>>>>>>>>                 Failed Shuffles=61
>>>>>>>>>>>                 Merged Map outputs=851200
>>>>>>>>>>>                 GC time elapsed (ms)=4215666
>>>>>>>>>>>                 CPU time spent (ms)=192433000
>>>>>>>>>>>                 Physical memory (bytes) snapshot=3349356380160
>>>>>>>>>>>                 Virtual memory (bytes) snapshot=9665208745984
>>>>>>>>>>>                 Total committed heap usage (bytes)=3636854259712
>>>>>>>>>>>         Shuffle Errors
>>>>>>>>>>>                 BAD_ID=0
>>>>>>>>>>>                 CONNECTION=0
>>>>>>>>>>>                 IO_ERROR=4
>>>>>>>>>>>                 WRONG_LENGTH=0
>>>>>>>>>>>                 WRONG_MAP=0
>>>>>>>>>>>                 WRONG_REDUCE=0
>>>>>>>>>>>         File Input Format Counters
>>>>>>>>>>>                 Bytes Read=1000000000000
>>>>>>>>>>>         File Output Format Counters
>>>>>>>>>>>                 Bytes Written=1000000000000
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> John
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop
>>>>>>>>>>>> 1.0.3 and it turned out that the terasort for MR2 is 2 times slower than
>>>>>>>>>>>> that in MR1. I cannot really believe it.
>>>>>>>>>>>>
>>>>>>>>>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0
>>>>>>>>>>>> cluster configurations are as follows.
>>>>>>>>>>>>
>>>>>>>>>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>>>>>>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>>>>>>>>>         mapreduce.map.memory.mb = "768";
>>>>>>>>>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>>>>>>>>>
>>>>>>>>>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>>>>>>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>>>>>>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>>>>>>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>>>>>>>>>
>>>>>>>>>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>>>>>>>>>         mapreduce.task.io.sort.factor = "48";
>>>>>>>>>>>>         mapreduce.task.io.sort.mb = "200";
>>>>>>>>>>>>         mapreduce.map.speculative = "true";
>>>>>>>>>>>>         mapreduce.reduce.speculative = "true";
>>>>>>>>>>>>         mapreduce.framework.name = "yarn";
>>>>>>>>>>>>         yarn.app.mapreduce.am.job.task.listener.thread-count =
>>>>>>>>>>>> "60";
>>>>>>>>>>>>         mapreduce.map.cpu.vcores = "1";
>>>>>>>>>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>>>>>>>>>
>>>>>>>>>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>>>>>>>>>         mapreduce.map.output.compress = "true";
>>>>>>>>>>>>         mapreduce.map.output.compress.codec =
>>>>>>>>>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>>>>>>>>>
>>>>>>>>>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>>>>>>>>>         yarn.resourcemanager.scheduler.client.thread-count =
>>>>>>>>>>>> "64";
>>>>>>>>>>>>
>>>>>>>>>>>> yarn.resourcemanager.resource-tracker.client.thread-count = "64";
>>>>>>>>>>>>         yarn.resourcemanager.scheduler.class =
>>>>>>>>>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>>>>>>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>>>>>>>>>         yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>>>>>>>>>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>>>>>>>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>>>>>>>>>         yarn.nodemanager.container-executor.class =
>>>>>>>>>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>>>>>>>>>         yarn.nodemanager.container-manager.thread-count = "64";
>>>>>>>>>>>>         yarn.nodemanager.localizer.client.thread-count = "20";
>>>>>>>>>>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>>>>>>>>>>>
>>>>>>>>>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost the
>>>>>>>>>>>> same other configurations. In MR1, the 1TB terasort took about 45 minutes,
>>>>>>>>>>>> but it took around 90 minutes in MR2.
>>>>>>>>>>>>
>>>>>>>>>>>> Does anyone know what is wrong here? Or do I need some special
>>>>>>>>>>>> configurations for terasort to work better in MR2?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>
>>>>>>>>>>>> John
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>>>>>>>>>> michael_segel@hotmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Sam,
>>>>>>>>>>>>> I think your cluster is too small for any meaningful
>>>>>>>>>>>>> conclusions to be made.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>>>>>>>
>>>>>>>>>>>>> Mike Segel
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Harsh,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for your detailed response! Now, the efficiency of my
>>>>>>>>>>>>> Yarn cluster improved a lot after increasing the reducer
>>>>>>>>>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>>>>>>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and reduce
>>>>>>>>>>>>> task together, with a typical process(map > shuffle > reduce). In Yarn, as
>>>>>>>>>>>>> I know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>>>>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but,
>>>>>>>>>>>>> an MRv1 job has a special execution process (map > shuffle > reduce) in Hadoop
>>>>>>>>>>>>> 1.x, so how does Yarn execute an MRv1 job? Does it still include the special MR
>>>>>>>>>>>>> steps from Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>>>>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>>>>>>>> - What's the general process for ApplicationMaster of Yarn to
>>>>>>>>>>>>> execute a job?
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>>>>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>>>>>>>> - For Yarn, the above two parameters do not work any more, as Yarn
>>>>>>>>>>>>> uses containers instead, right?
>>>>>>>>>>>>> - For Yarn, we can set the whole physical mem for a
>>>>>>>>>>>>> NodeManager using 'yarn.nodemanager.resource.memory-mb'. But
>>>>>>>>>>>>> how to set the default size of physical mem of a container?
>>>>>>>>>>>>> - How to set the maximum size of physical mem of a container?
>>>>>>>>>>>>> By the parameter of 'mapred.child.java.opts'?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks as always!
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Sam,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> > - How to know the container number? Why you say it will be
>>>>>>>>>>>>>> 22 containers due to a 22 GB memory?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The MR2's default configuration requests 1 GB resource each
>>>>>>>>>>>>>> for Map
>>>>>>>>>>>>>> and Reduce containers. It requests 1.5 GB for the AM
>>>>>>>>>>>>>> container that
>>>>>>>>>>>>>> runs the job, additionally. This is tunable using the
>>>>>>>>>>>>>> properties
>>>>>>>>>>>>>> Sandy's mentioned in his post.
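
A sketch of the defaults Harsh describes, spelled out as properties. The map and
reduce values correspond to the 1 GB requests he mentions, and the AM value to the
1.5 GB one; the property name yarn.app.mapreduce.am.resource.mb is my assumption of
the spelling and worth double-checking against the docs:

  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1024</value><!-- default map container request -->
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>1024</value><!-- default reduce container request -->
  </property>
  <property>
    <name>yarn.app.mapreduce.am.resource.mb</name>
    <value>1536</value><!-- default MR ApplicationMaster container request -->
  </property>
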
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> > - My machine has 32 GB memory, how much memory is proper to
>>>>>>>>>>>>>> be assigned to containers?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This is a general question. You may use the same process you
>>>>>>>>>>>>>> took to
>>>>>>>>>>>>>> decide optimal number of slots in MR1 to decide this here.
>>>>>>>>>>>>>> Every
>>>>>>>>>>>>>> container is a new JVM, and you're limited by the CPUs you
>>>>>>>>>>>>>> have there
>>>>>>>>>>>>>> (if not the memory). Either increase memory requests from
>>>>>>>>>>>>>> jobs, to
>>>>>>>>>>>>>> lower # of concurrent containers at a given time (runtime
>>>>>>>>>>>>>> change), or
>>>>>>>>>>>>>> lower NM's published memory resources to control the same
>>>>>>>>>>>>>> (config
>>>>>>>>>>>>>> change).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name'
>>>>>>>>>>>>>> to be 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes, all of these properties will still work. Old properties
>>>>>>>>>>>>>> specific
>>>>>>>>>>>>>> to JobTracker or TaskTracker (usually found as a keyword in
>>>>>>>>>>>>>> the config
>>>>>>>>>>>>>> name) will not apply anymore.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <
>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>> > Hi Harsh,
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > According to above suggestions, I removed the duplication
>>>>>>>>>>>>>> of setting, and
>>>>>>>>>>>>>> > reduce the value of 'yarn.nodemanager.resource.cpu-cores',
>>>>>>>>>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>>>>>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000.
>>>>>>>>>>>>>> And then, the
>>>>>>>>>>>>>> > efficiency improved about 18%.  I have questions:
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > - How to know the container number? Why you say it will be
>>>>>>>>>>>>>> 22 containers due
>>>>>>>>>>>>>> > to a 22 GB memory?
>>>>>>>>>>>>>> > - My machine has 32 GB memory, how much memory is proper to
>>>>>>>>>>>>>> be assigned to
>>>>>>>>>>>>>> > containers?
>>>>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name'
>>>>>>>>>>>>>> to be 'yarn', will
>>>>>>>>>>>>>> > other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>>>>> framework? Like
>>>>>>>>>>>>>> > 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Thanks!
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> Hey Sam,
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> Did you get a chance to retry with Sandy's suggestions?
>>>>>>>>>>>>>> The config
>>>>>>>>>>>>>> >> appears to be asking NMs to use roughly 22 total
>>>>>>>>>>>>>> containers (as
>>>>>>>>>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB
>>>>>>>>>>>>>> memory
>>>>>>>>>>>>>> >> resource. This could impact much, given the CPU is still
>>>>>>>>>>>>>> the same for
>>>>>>>>>>>>>> >> both test runs.
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>>>>>>>>>> sandy.ryza@cloudera.com>
>>>>>>>>>>>>>> >> wrote:
>>>>>>>>>>>>>> >> > Hey Sam,
>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>> >> > Thanks for sharing your results.  I'm definitely curious
>>>>>>>>>>>>>> about what's
>>>>>>>>>>>>>> >> > causing the difference.
>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>> >> > A couple observations:
>>>>>>>>>>>>>> >> > It looks like you've got
>>>>>>>>>>>>>> yarn.nodemanager.resource.memory-mb in there
>>>>>>>>>>>>>> >> > twice
>>>>>>>>>>>>>> >> > with two different values.
>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close
>>>>>>>>>>>>>> to the default
>>>>>>>>>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of
>>>>>>>>>>>>>> your tasks getting
>>>>>>>>>>>>>> >> > killed for running over resource limits?
>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>> >> > -Sandy
>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <
>>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >> The terasort execution log shows that reduce spent
>>>>>>>>>>>>>> about 5.5 mins from
>>>>>>>>>>>>>> >> >> 33%
>>>>>>>>>>>>>> >> >> to 35% as below.
>>>>>>>>>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>>> 31%
>>>>>>>>>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>>> 32%
>>>>>>>>>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>>> 33%
>>>>>>>>>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>>> 35%
>>>>>>>>>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>>> 40%
>>>>>>>>>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>>> 43%
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >> Any way, below are my configurations for your
>>>>>>>>>>>>>> reference. Thanks!
>>>>>>>>>>>>>> >> >> (A) core-site.xml
>>>>>>>>>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >> (B) hdfs-site.xml
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>dfs.replication</name>
>>>>>>>>>>>>>> >> >>     <value>1</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>>>>>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>>>>>>>>>> >> >>     <value>10</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >> (C) mapred-site.xml
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>>>>>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>>>>>>>>>> >> >> </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>>>>>>>>>> >> >>     <value>yarn</value>
>>>>>>>>>>>>>> >> >>    </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>> >> >>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>>>>>>>>>> >> >>     <value>8</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>>>>>>>>>> >> >>     <value>4</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>>>>>>>>>> >> >>     <value>true</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >> (D) yarn-site.xml
>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>>>>>>>>>> >> >>     <value>node1:18025</value>
>>>>>>>>>>>>>> >> >>     <description>host is the hostname of the resource
>>>>>>>>>>>>>> manager and
>>>>>>>>>>>>>> >> >>     port is the port on which the NodeManagers contact
>>>>>>>>>>>>>> the Resource
>>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <description>The address of the RM web
>>>>>>>>>>>>>> application.</description>
>>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>>>>>>>>>> >> >>     <value>node1:18088</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>>>>>>>>>>>>>> >> >>     <value>node1:18030</value>
>>>>>>>>>>>>>> >> >>     <description>host is the hostname of the
>>>>>>>>>>>>>> resourcemanager and port
>>>>>>>>>>>>>> >> >> is
>>>>>>>>>>>>>> >> >> the port
>>>>>>>>>>>>>> >> >>     on which the Applications in the cluster talk to
>>>>>>>>>>>>>> the Resource
>>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>>>>>>>>>> >> >>     <value>node1:18040</value>
>>>>>>>>>>>>>> >> >>     <description>the host is the hostname of the
>>>>>>>>>>>>>> ResourceManager and
>>>>>>>>>>>>>> >> >> the
>>>>>>>>>>>>>> >> >> port is the port on
>>>>>>>>>>>>>> >> >>     which the clients can talk to the Resource Manager.
>>>>>>>>>>>>>> </description>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>>>>>>>>>> >> >>     <description>the local directories used by the
>>>>>>>>>>>>>> >> >> nodemanager</description>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>>>>>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>>>>>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>>>>>>>>>> port</description>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>>> >> >>     <value>10240</value>
>>>>>>>>>>>>>> >> >>     <description>the amount of memory on the
>>>>>>>>>>>>>> NodeManager in
>>>>>>>>>>>>>> >> >> GB</description>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>>>>>>>>> >> >>     <description>directory on hdfs where the
>>>>>>>>>>>>>> application logs are moved
>>>>>>>>>>>>>> >> >> to
>>>>>>>>>>>>>> >> >> </description>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>    <property>
>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>>>>>>>>> >> >>     <description>the directories used by Nodemanagers
>>>>>>>>>>>>>> as log
>>>>>>>>>>>>>> >> >> directories</description>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>>>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>>>>>>>>> >> >>     <description>shuffle service that needs to be set
>>>>>>>>>>>>>> for Map Reduce to
>>>>>>>>>>>>>> >> >> run </description>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>>>>>>>>> >> >>     <value>24</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>>>>>>>>> >> >>     <value>3</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>>> >> >>     <value>22000</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>>>>>>>>> >> >>     <value>2.1</value>
>>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>> >> >>> Not tuning configurations at all is wrong. YARN uses
>>>>>>>>>>>>>> memory resource
>>>>>>>>>>>>>> >> >>> based scheduling and hence MR2 would be requesting 1
>>>>>>>>>>>>>> GB minimum by
>>>>>>>>>>>>>> >> >>> default, causing, on base configs, to max out at 8
>>>>>>>>>>>>>> (due to 8 GB NM
>>>>>>>>>>>>>> >> >>> memory resource config) total containers. Do share
>>>>>>>>>>>>>> your configs as at
>>>>>>>>>>>>>> >> >>> this point none of us can tell what it is.
>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for
>>>>>>>>>>>>>> users and to not
>>>>>>>>>>>>>> >> >>> care about such things :)
>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>>>>>>>>> samliuhadoop@gmail.com>
>>>>>>>>>>>>>> >> >>> wrote:
>>>>>>>>>>>>>> >> >>> > At the begining, I just want to do a fast
>>>>>>>>>>>>>> comparision of MRv1 and
>>>>>>>>>>>>>> >> >>> > Yarn.
>>>>>>>>>>>>>> >> >>> > But
>>>>>>>>>>>>>> >> >>> > they have many differences, and to be fair for
>>>>>>>>>>>>>> comparison I did not
>>>>>>>>>>>>>> >> >>> > tune
>>>>>>>>>>>>>> >> >>> > their configurations at all.  So I got above test
>>>>>>>>>>>>>> results. After
>>>>>>>>>>>>>> >> >>> > analyzing
>>>>>>>>>>>>>> >> >>> > the test result, no doubt, I will configure them and
>>>>>>>>>>>>>> do comparison
>>>>>>>>>>>>>> >> >>> > again.
>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>> >> >>> > Do you have any idea on current test result? I
>>>>>>>>>>>>>> think, to compare
>>>>>>>>>>>>>> >> >>> > with
>>>>>>>>>>>>>> >> >>> > MRv1,
>>>>>>>>>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but worse
>>>>>>>>>>>>>> on Reduce
>>>>>>>>>>>>>> >> >>> > phase(terasort test).
>>>>>>>>>>>>>> >> >>> > And any detailed suggestions/comments/materials on
>>>>>>>>>>>>>> Yarn performance
>>>>>>>>>>>>>> >> >>> > tunning?
>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>> >> >>> > Thanks!
>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>>>>>>>>> marcosluis2186@gmail.com>
>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>> >> >>> >> Why not to tune the configurations?
>>>>>>>>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>>>>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>> >> >>> >>> Hi Experts,
>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>> >> >>> >>> We are thinking about whether to use Yarn or not
>>>>>>>>>>>>>> in the near
>>>>>>>>>>>>>> >> >>> >>> future,
>>>>>>>>>>>>>> >> >>> >>> and
>>>>>>>>>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for
>>>>>>>>>>>>>> comprison.
>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>> >> >>> >>> My env is three nodes cluster, and each node has
>>>>>>>>>>>>>> similar hardware:
>>>>>>>>>>>>>> >> >>> >>> 2
>>>>>>>>>>>>>> >> >>> >>> cpu(4 core), 32 mem. Both Yarn and MRv1 cluster
>>>>>>>>>>>>>> are set on the
>>>>>>>>>>>>>> >> >>> >>> same
>>>>>>>>>>>>>> >> >>> >>> env. To
>>>>>>>>>>>>>> >> >>> >>> be fair, I did not make any performance tuning on
>>>>>>>>>>>>>> their
>>>>>>>>>>>>>> >> >>> >>> configurations, but
>>>>>>>>>>>>>> >> >>> >>> use the default configuration values.
>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>> >> >>> >>> Before testing, I think Yarn will be much better
>>>>>>>>>>>>>> than MRv1, if
>>>>>>>>>>>>>> >> >>> >>> they
>>>>>>>>>>>>>> >> >>> >>> all
>>>>>>>>>>>>>> >> >>> >>> use default configuration, because Yarn is a
>>>>>>>>>>>>>> better framework than
>>>>>>>>>>>>>> >> >>> >>> MRv1.
>>>>>>>>>>>>>> >> >>> >>> However, the test result shows some differences:
>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>>>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>>>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>>>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>>>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>>>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>>>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>>>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>> >> >>> >>> After a fast analysis, I think the direct cause
>>>>>>>>>>>>>> might be that Yarn
>>>>>>>>>>>>>> >> >>> >>> is
>>>>>>>>>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much worse
>>>>>>>>>>>>>> on Reduce
>>>>>>>>>>>>>> >> >>> >>> phase.
>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>>>>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1 for
>>>>>>>>>>>>>> terasort?
>>>>>>>>>>>>>> >> >>> >>> - What's the stratage for tuning Yarn performance?
>>>>>>>>>>>>>> Is any
>>>>>>>>>>>>>> >> >>> >>> materials?
>>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>>> >> >>> >>> Thanks!
>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>> >> >>> >> --
>>>>>>>>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>>>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>>>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>>> >> >>> --
>>>>>>>>>>>>>> >> >>> Harsh J
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>>> >> >
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> --
>>>>>>>>>>>>>> >> Harsh J
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Harsh J
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Jian Fang <ji...@gmail.com>.
Already started the tests with only 8 containers for map in MR2, still
running.

But shouldn't YARN make better use of the memory? If I map YARN containers
in MR2 to exactly 8 map and 3 reduce slots in MR1, what is the real advantage
of YARN then? I remember one goal of YARN is to solve the issue that map
slots cannot be used for reduce and reduce slots cannot be used for map.
Shouldn't YARN be smart enough to handle the concurrent tasks?



On Wed, Oct 23, 2013 at 1:17 PM, Sandy Ryza <sa...@cloudera.com> wrote:

> Increasing the slowstart is not meant to increase performance, but should
> make for a fairer comparison.  Have you tried making sure that in MR2 only
> 8 map tasks are running concurrently, or boosting MR1 up to 16?
>
> -Sandy
>
>
> On Wed, Oct 23, 2013 at 12:55 PM, Jian Fang <jian.fang.subscribe@gmail.com
> > wrote:
>
>> Changing mapreduce.job.reduce.slowstart.completedmaps to 0.99 does not look good. The map phase alone
>> took 48 minutes and total time seems to be even longer. Any way to let map
>> phase run faster?
>>
>> Thanks.
>>
>>
>> On Wed, Oct 23, 2013 at 10:05 AM, Jian Fang <
>> jian.fang.subscribe@gmail.com> wrote:
>>
>>> Thanks Sandy.
>>>
>>> io.sort.record.percent is the default value 0.05 for both MR1 and MR2.
>>> mapreduce.job.reduce.slowstart.completedmaps in MR2 and mapred.reduce.slowstart.completed.maps
>>> in MR1 both use the default value 0.05.
>>>
>>> I tried to allocate 1536MB and 1024MB to the map containers some time ago,
>>> but the changes did not give me a better result, thus, I changed it back to
>>> 768MB.
>>>
>>> Will try mapred.reduce.slowstart.completed.maps=.99 to see what
>>> happens. BTW, I should use
>>> mapreduce.job.reduce.slowstart.completedmaps in MR2, right?
>>>
>>> Also, in MR1 I can specify tasktracker.http.threads, but I could not
>>> find the counterpart for MR2. Which one should I tune for the http threads?
>>>
>>> Thanks again.
>>>
>>>
>>> On Wed, Oct 23, 2013 at 9:40 AM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>
>>>> Based on SLOTS_MILLIS_MAPS, it looks like your map tasks are taking
>>>> about three times as long in MR2 as they are in MR1.  This is probably
>>>> because you allow twice as many map tasks to run at a time in MR2 (12288/768
>>>> = 16).  Being able to use all the containers isn't necessarily a good thing
>>>> if you are oversubscribing your node's resources.  Because of the different
>>>> way that MR1 and MR2 view resources, I think it's better to test with
>>>> mapred.reduce.slowstart.completed.maps=.99 so that the map and reduce
>>>> phases will run separately.
>>>>
>>>> On the other side, it looks like your MR1 has more spilled records than
>>>> MR2.  For a fairer comparison, you should set io.sort.record.percent in MR1
>>>> to .13, which should improve MR1 performance, but will provide a fairer
>>>> comparison (MR2 automatically does this tuning for you).
>>>>
>>>> -Sandy
>>>>
>>>>
>>>> On Wed, Oct 23, 2013 at 9:22 AM, Jian Fang <
>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>
>>>>> The number of map slots and reduce slots on each data node for MR1 are
>>>>> 8 and 3, respectively. Since MR2 could use all containers for either map or
>>>>> reduce, I would expect that MR2 is faster.
>>>>>
>>>>>
>>>>> On Wed, Oct 23, 2013 at 8:17 AM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>>>
>>>>>> How many map and reduce slots are you using per tasktracker in MR1?
>>>>>>  How do the average map times compare? (MR2 reports this directly on the
>>>>>> web UI, but you can also get a sense in MR1 by scrolling through the map
>>>>>> tasks page).  Can you share the counters for MR1?
>>>>>>
>>>>>> -Sandy
>>>>>>
>>>>>>
>>>>>> On Wed, Oct 23, 2013 at 12:23 AM, Jian Fang <
>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>
>>>>>>> Unfortunately, turning off JVM reuse still got the same result,
>>>>>>> i.e., about 90 minutes for MR2. I don't think the killed reduces alone could
>>>>>>> account for a 2x slowdown. There must be something very wrong either
>>>>>>> in configuration or code. Any hints?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <
>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>
>>>>>>>> Thanks Sandy. I will try to turn JVM reuse off and see what happens.
>>>>>>>>
>>>>>>>> Yes, I saw quite a few exceptions in the task attempts. For instance:
>>>>>>>>
>>>>>>>>
>>>>>>>> 2013-10-20 03:13:58,751 ERROR [main]
>>>>>>>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>>>>>>>> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
>>>>>>>> 2013-10-20 03:13:58,752 ERROR [Thread-6]
>>>>>>>> org.apache.hadoop.hdfs.DFSClient: Failed to close file
>>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
>>>>>>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>>>>>>>> No lease on
>>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
>>>>>>>> File does not exist. Holder
>>>>>>>> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
>>>>>>>> any open files.
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>>>>>>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>>>>>> --
>>>>>>>>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>>>>>>>> 2013-10-20 03:13:58,753 WARN [main]
>>>>>>>> org.apache.hadoop.mapred.YarnChild: Exception running child :
>>>>>>>> java.nio.channels.ClosedChannelException
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>>>>>>>>         at java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <
>>>>>>>> sandy.ryza@cloudera.com> wrote:
>>>>>>>>
>>>>>>>>> It looks like many of your reduce tasks were killed.  Do you know
>>>>>>>>> why?  Also, MR2 doesn't have JVM reuse, so it might make sense to compare
>>>>>>>>> it to MR1 with JVM reuse turned off.
>>>>>>>>>
>>>>>>>>> -Sandy
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <
>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> The Terasort output for MR2 is as follows.
>>>>>>>>>>
>>>>>>>>>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job
>>>>>>>>>> (main): Counters: 46
>>>>>>>>>>         File System Counters
>>>>>>>>>>                 FILE: Number of bytes read=456102049355
>>>>>>>>>>                 FILE: Number of bytes written=897246250517
>>>>>>>>>>                 FILE: Number of read operations=0
>>>>>>>>>>                 FILE: Number of large read operations=0
>>>>>>>>>>                 FILE: Number of write operations=0
>>>>>>>>>>                 HDFS: Number of bytes read=1000000851200
>>>>>>>>>>                 HDFS: Number of bytes written=1000000000000
>>>>>>>>>>                 HDFS: Number of read operations=32131
>>>>>>>>>>                 HDFS: Number of large read operations=0
>>>>>>>>>>                 HDFS: Number of write operations=224
>>>>>>>>>>         Job Counters
>>>>>>>>>>                 Killed map tasks=1
>>>>>>>>>>                 Killed reduce tasks=20
>>>>>>>>>>                 Launched map tasks=7601
>>>>>>>>>>                 Launched reduce tasks=132
>>>>>>>>>>                 Data-local map tasks=7591
>>>>>>>>>>                 Rack-local map tasks=10
>>>>>>>>>>                 Total time spent by all maps in occupied slots
>>>>>>>>>> (ms)=1696141311
>>>>>>>>>>                 Total time spent by all reduces in occupied slots
>>>>>>>>>> (ms)=2664045096
>>>>>>>>>>         Map-Reduce Framework
>>>>>>>>>>                 Map input records=10000000000
>>>>>>>>>>                 Map output records=10000000000
>>>>>>>>>>                 Map output bytes=1020000000000
>>>>>>>>>>                 Map output materialized bytes=440486356802
>>>>>>>>>>                 Input split bytes=851200
>>>>>>>>>>                 Combine input records=0
>>>>>>>>>>                 Combine output records=0
>>>>>>>>>>                 Reduce input groups=10000000000
>>>>>>>>>>                 Reduce shuffle bytes=440486356802
>>>>>>>>>>                 Reduce input records=10000000000
>>>>>>>>>>                 Reduce output records=10000000000
>>>>>>>>>>                 Spilled Records=20000000000
>>>>>>>>>>                 Shuffled Maps =851200
>>>>>>>>>>                 Failed Shuffles=61
>>>>>>>>>>                 Merged Map outputs=851200
>>>>>>>>>>                 GC time elapsed (ms)=4215666
>>>>>>>>>>                 CPU time spent (ms)=192433000
>>>>>>>>>>                 Physical memory (bytes) snapshot=3349356380160
>>>>>>>>>>                 Virtual memory (bytes) snapshot=9665208745984
>>>>>>>>>>                 Total committed heap usage (bytes)=3636854259712
>>>>>>>>>>         Shuffle Errors
>>>>>>>>>>                 BAD_ID=0
>>>>>>>>>>                 CONNECTION=0
>>>>>>>>>>                 IO_ERROR=4
>>>>>>>>>>                 WRONG_LENGTH=0
>>>>>>>>>>                 WRONG_MAP=0
>>>>>>>>>>                 WRONG_REDUCE=0
>>>>>>>>>>         File Input Format Counters
>>>>>>>>>>                 Bytes Read=1000000000000
>>>>>>>>>>         File Output Format Counters
>>>>>>>>>>                 Bytes Written=1000000000000
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> John
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop
>>>>>>>>>>> 1.0.3 and it turned out that the terasort for MR2 is 2 times slower than
>>>>>>>>>>> that in MR1. I cannot really believe it.
>>>>>>>>>>>
>>>>>>>>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0
>>>>>>>>>>> cluster configurations are as follows.
>>>>>>>>>>>
>>>>>>>>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>>>>>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>>>>>>>>         mapreduce.map.memory.mb = "768";
>>>>>>>>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>>>>>>>>
>>>>>>>>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>>>>>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>>>>>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>>>>>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>>>>>>>>
>>>>>>>>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>>>>>>>>         mapreduce.task.io.sort.factor = "48";
>>>>>>>>>>>         mapreduce.task.io.sort.mb = "200";
>>>>>>>>>>>         mapreduce.map.speculative = "true";
>>>>>>>>>>>         mapreduce.reduce.speculative = "true";
>>>>>>>>>>>         mapreduce.framework.name = "yarn";
>>>>>>>>>>>         yarn.app.mapreduce.am.job.task.listener.thread-count =
>>>>>>>>>>> "60";
>>>>>>>>>>>         mapreduce.map.cpu.vcores = "1";
>>>>>>>>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>>>>>>>>
>>>>>>>>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>>>>>>>>         mapreduce.map.output.compress = "true";
>>>>>>>>>>>         mapreduce.map.output.compress.codec =
>>>>>>>>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>>>>>>>>
>>>>>>>>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>>>>>>>>         yarn.resourcemanager.scheduler.client.thread-count =
>>>>>>>>>>> "64";
>>>>>>>>>>>
>>>>>>>>>>> yarn.resourcemanager.resource-tracker.client.thread-count = "64";
>>>>>>>>>>>         yarn.resourcemanager.scheduler.class =
>>>>>>>>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>>>>>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>>>>>>>>         yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>>>>>>>>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>>>>>>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>>>>>>>>         yarn.nodemanager.container-executor.class =
>>>>>>>>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>>>>>>>>         yarn.nodemanager.container-manager.thread-count = "64";
>>>>>>>>>>>         yarn.nodemanager.localizer.client.thread-count = "20";
>>>>>>>>>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>>>>>>>>>>
>>>>>>>>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost the
>>>>>>>>>>> same other configurations. In MR1, the 1TB terasort took about 45 minutes,
>>>>>>>>>>> but it took around 90 minutes in MR2.
>>>>>>>>>>>
>>>>>>>>>>> Does anyone know what is wrong here? Or do I need some special
>>>>>>>>>>> configurations for terasort to work better in MR2?
>>>>>>>>>>>
>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>
>>>>>>>>>>> John
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>>>>>>>>> michael_segel@hotmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Sam,
>>>>>>>>>>>> I think your cluster is too small for any meaningful
>>>>>>>>>>>> conclusions to be made.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>>>>>>
>>>>>>>>>>>> Mike Segel
>>>>>>>>>>>>
>>>>>>>>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Harsh,
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for your detailed response! Now, the efficiency of my
>>>>>>>>>>>> Yarn cluster improved a lot after increasing the reducer
>>>>>>>>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>>>>>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>>>>>>>>
>>>>>>>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and reduce
>>>>>>>>>>>> task together, with a typical process(map > shuffle > reduce). In Yarn, as
>>>>>>>>>>>> I know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>>>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but,
>>>>>>>>>>>> an MRv1 job has a special execution process (map > shuffle > reduce) in Hadoop
>>>>>>>>>>>> 1.x, so how does Yarn execute an MRv1 job? Does it still include the special MR
>>>>>>>>>>>> steps from Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>>>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>>>>>>> - What's the general process for ApplicationMaster of Yarn to
>>>>>>>>>>>> execute a job?
>>>>>>>>>>>>
>>>>>>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>>>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>>>>>>> - For Yarn, the above two parameters do not work any more, as Yarn
>>>>>>>>>>>> uses containers instead, right?
>>>>>>>>>>>> - For Yarn, we can set the whole physical mem for a NodeManager
>>>>>>>>>>>> using 'yarn.nodemanager.resource.memory-mb'. But how to set
>>>>>>>>>>>> the default size of physical mem of a container?
>>>>>>>>>>>> - How to set the maximum size of physical mem of a container?
>>>>>>>>>>>> By the parameter of 'mapred.child.java.opts'?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks as always!
>>>>>>>>>>>>
>>>>>>>>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Sam,
>>>>>>>>>>>>>
>>>>>>>>>>>>> > - How to know the container number? Why you say it will be
>>>>>>>>>>>>> 22 containers due to a 22 GB memory?
>>>>>>>>>>>>>
>>>>>>>>>>>>> The MR2's default configuration requests 1 GB resource each
>>>>>>>>>>>>> for Map
>>>>>>>>>>>>> and Reduce containers. It requests 1.5 GB for the AM container
>>>>>>>>>>>>> that
>>>>>>>>>>>>> runs the job, additionally. This is tunable using the
>>>>>>>>>>>>> properties
>>>>>>>>>>>>> Sandy's mentioned in his post.
>>>>>>>>>>>>>
>>>>>>>>>>>>> > - My machine has 32 GB memory, how much memory is proper to
>>>>>>>>>>>>> be assigned to containers?
>>>>>>>>>>>>>
>>>>>>>>>>>>> This is a general question. You may use the same process you
>>>>>>>>>>>>> took to
>>>>>>>>>>>>> decide optimal number of slots in MR1 to decide this here.
>>>>>>>>>>>>> Every
>>>>>>>>>>>>> container is a new JVM, and you're limited by the CPUs you
>>>>>>>>>>>>> have there
>>>>>>>>>>>>> (if not the memory). Either increase memory requests from
>>>>>>>>>>>>> jobs, to
>>>>>>>>>>>>> lower # of concurrent containers at a given time (runtime
>>>>>>>>>>>>> change), or
>>>>>>>>>>>>> lower NM's published memory resources to control the same
>>>>>>>>>>>>> (config
>>>>>>>>>>>>> change).
>>>>>>>>>>>>>
>>>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name'
>>>>>>>>>>>>> to be 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yes, all of these properties will still work. Old properties
>>>>>>>>>>>>> specific
>>>>>>>>>>>>> to JobTracker or TaskTracker (usually found as a keyword in
>>>>>>>>>>>>> the config
>>>>>>>>>>>>> name) will not apply anymore.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <
>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>> > Hi Harsh,
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > According to above suggestions, I removed the duplication of
>>>>>>>>>>>>> setting, and
>>>>>>>>>>>>> > reduce the value of 'yarn.nodemanager.resource.cpu-cores',
>>>>>>>>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>>>>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000.
>>>>>>>>>>>>> And then, the
>>>>>>>>>>>>> > efficiency improved about 18%.  I have questions:
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > - How to know the container number? Why you say it will be
>>>>>>>>>>>>> 22 containers due
>>>>>>>>>>>>> > to a 22 GB memory?
>>>>>>>>>>>>> > - My machine has 32 GB memory, how much memory is proper to
>>>>>>>>>>>>> be assigned to
>>>>>>>>>>>>> > containers?
>>>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name'
>>>>>>>>>>>>> to be 'yarn', will
>>>>>>>>>>>>> > other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>>>> framework? Like
>>>>>>>>>>>>> > 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > Thanks!
>>>>>>>>>>>>> >
>>>>>>>>>>>>> >
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>> >>
>>>>>>>>>>>>> >> Hey Sam,
>>>>>>>>>>>>> >>
>>>>>>>>>>>>> >> Did you get a chance to retry with Sandy's suggestions? The
>>>>>>>>>>>>> config
>>>>>>>>>>>>> >> appears to be asking NMs to use roughly 22 total containers
>>>>>>>>>>>>> (as
>>>>>>>>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB
>>>>>>>>>>>>> memory
>>>>>>>>>>>>> >> resource. This could impact much, given the CPU is still
>>>>>>>>>>>>> the same for
>>>>>>>>>>>>> >> both test runs.
>>>>>>>>>>>>> >>
>>>>>>>>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>>>>>>>>> sandy.ryza@cloudera.com>
>>>>>>>>>>>>> >> wrote:
>>>>>>>>>>>>> >> > Hey Sam,
>>>>>>>>>>>>> >> >
>>>>>>>>>>>>> >> > Thanks for sharing your results.  I'm definitely curious
>>>>>>>>>>>>> about what's
>>>>>>>>>>>>> >> > causing the difference.
>>>>>>>>>>>>> >> >
>>>>>>>>>>>>> >> > A couple observations:
>>>>>>>>>>>>> >> > It looks like you've got
>>>>>>>>>>>>> yarn.nodemanager.resource.memory-mb in there
>>>>>>>>>>>>> >> > twice
>>>>>>>>>>>>> >> > with two different values.
>>>>>>>>>>>>> >> >
>>>>>>>>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close to
>>>>>>>>>>>>> the default
>>>>>>>>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of
>>>>>>>>>>>>> your tasks getting
>>>>>>>>>>>>> >> > killed for running over resource limits?
>>>>>>>>>>>>> >> >
>>>>>>>>>>>>> >> > -Sandy
>>>>>>>>>>>>> >> >
>>>>>>>>>>>>> >> >
>>>>>>>>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <
>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >> The terasort execution log shows that reduce spent about
>>>>>>>>>>>>> 5.5 mins from
>>>>>>>>>>>>> >> >> 33%
>>>>>>>>>>>>> >> >> to 35% as below.
>>>>>>>>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>> 31%
>>>>>>>>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>> 32%
>>>>>>>>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>> 33%
>>>>>>>>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>> 35%
>>>>>>>>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>> 40%
>>>>>>>>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>> 43%
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >> Any way, below are my configurations for your reference.
>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>> >> >> (A) core-site.xml
>>>>>>>>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >> (B) hdfs-site.xml
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>dfs.replication</name>
>>>>>>>>>>>>> >> >>     <value>1</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>>>>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>>>>>>>>> >> >>     <value>10</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >> (C) mapred-site.xml
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>>>>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>>>>>>>>> >> >> </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>>>>>>>>> >> >>     <value>yarn</value>
>>>>>>>>>>>>> >> >>    </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>> >> >>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>>>>>>>>> >> >>     <value>8</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>>>>>>>>> >> >>     <value>4</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>>>>>>>>> >> >>     <value>true</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >> (D) yarn-site.xml
>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>>>>>>>>> >> >>     <value>node1:18025</value>
>>>>>>>>>>>>> >> >>     <description>host is the hostname of the resource
>>>>>>>>>>>>> manager and
>>>>>>>>>>>>> >> >>     port is the port on which the NodeManagers contact
>>>>>>>>>>>>> the Resource
>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <description>The address of the RM web
>>>>>>>>>>>>> application.</description>
>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>>>>>>>>> >> >>     <value>node1:18088</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>>>>>>>>>>>>> >> >>     <value>node1:18030</value>
>>>>>>>>>>>>> >> >>     <description>host is the hostname of the
>>>>>>>>>>>>> resourcemanager and port
>>>>>>>>>>>>> >> >> is
>>>>>>>>>>>>> >> >> the port
>>>>>>>>>>>>> >> >>     on which the Applications in the cluster talk to the
>>>>>>>>>>>>> Resource
>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>>>>>>>>> >> >>     <value>node1:18040</value>
>>>>>>>>>>>>> >> >>     <description>the host is the hostname of the
>>>>>>>>>>>>> ResourceManager and
>>>>>>>>>>>>> >> >> the
>>>>>>>>>>>>> >> >> port is the port on
>>>>>>>>>>>>> >> >>     which the clients can talk to the Resource Manager.
>>>>>>>>>>>>> </description>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>>>>>>>>> >> >>     <description>the local directories used by the
>>>>>>>>>>>>> >> >> nodemanager</description>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>>>>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>>>>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>>>>>>>>> port</description>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>> >> >>     <value>10240</value>
>>>>>>>>>>>>> >> >>     <description>the amount of memory on the NodeManager
>>>>>>>>>>>>> in
>>>>>>>>>>>>> >> >> MB</description>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>>>>>>>> >> >>     <description>directory on hdfs where the application
>>>>>>>>>>>>> logs are moved
>>>>>>>>>>>>> >> >> to
>>>>>>>>>>>>> >> >> </description>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>    <property>
>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>>>>>>>> >> >>     <description>the directories used by Nodemanagers as
>>>>>>>>>>>>> log
>>>>>>>>>>>>> >> >> directories</description>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>>>>>>>> >> >>     <description>shuffle service that needs to be set
>>>>>>>>>>>>> for Map Reduce to
>>>>>>>>>>>>> >> >> run </description>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>>>>>>>> >> >>     <value>24</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>>>>>>>> >> >>     <value>3</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>> >> >>     <value>22000</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>>>>>>>> >> >>     <value>2.1</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>> >> >>> Not tuning configurations at all is wrong. YARN uses
>>>>>>>>>>>>> memory resource
>>>>>>>>>>>>> >> >>> based scheduling and hence MR2 would be requesting 1 GB
>>>>>>>>>>>>> minimum by
>>>>>>>>>>>>> >> >>> default, causing, on base configs, to max out at 8 (due
>>>>>>>>>>>>> to 8 GB NM
>>>>>>>>>>>>> >> >>> memory resource config) total containers. Do share your
>>>>>>>>>>>>> configs as at
>>>>>>>>>>>>> >> >>> this point none of us can tell what it is.
>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for
>>>>>>>>>>>>> users and to not
>>>>>>>>>>>>> >> >>> care about such things :)
>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>>>>>>>> samliuhadoop@gmail.com>
>>>>>>>>>>>>> >> >>> wrote:
>>>>>>>>>>>>> >> >>> > At the beginning, I just want to do a fast comparison
>>>>>>>>>>>>> of MRv1 and
>>>>>>>>>>>>> >> >>> > Yarn.
>>>>>>>>>>>>> >> >>> > But
>>>>>>>>>>>>> >> >>> > they have many differences, and to be fair for
>>>>>>>>>>>>> comparison I did not
>>>>>>>>>>>>> >> >>> > tune
>>>>>>>>>>>>> >> >>> > their configurations at all.  So I got above test
>>>>>>>>>>>>> results. After
>>>>>>>>>>>>> >> >>> > analyzing
>>>>>>>>>>>>> >> >>> > the test result, no doubt, I will configure them and
>>>>>>>>>>>>> do comparison
>>>>>>>>>>>>> >> >>> > again.
>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>> >> >>> > Do you have any idea on current test result? I think,
>>>>>>>>>>>>> to compare
>>>>>>>>>>>>> >> >>> > with
>>>>>>>>>>>>> >> >>> > MRv1,
>>>>>>>>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but worse
>>>>>>>>>>>>> on Reduce
>>>>>>>>>>>>> >> >>> > phase(terasort test).
>>>>>>>>>>>>> >> >>> > And any detailed suggestions/comments/materials on
>>>>>>>>>>>>> Yarn performance
>>>>>>>>>>>>> >> >>> > tuning?
>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>> >> >>> > Thanks!
>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>>>>>>>> marcosluis2186@gmail.com>
>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>> >> >>> >> Why not to tune the configurations?
>>>>>>>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>>>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>> >> >>> >>> Hi Experts,
>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>> >> >>> >>> We are thinking about whether to use Yarn or not in
>>>>>>>>>>>>> the near
>>>>>>>>>>>>> >> >>> >>> future,
>>>>>>>>>>>>> >> >>> >>> and
>>>>>>>>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for
>>>>>>>>>>>>> comparison.
>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>> >> >>> >>> My env is three nodes cluster, and each node has
>>>>>>>>>>>>> similar hardware:
>>>>>>>>>>>>> >> >>> >>> 2
>>>>>>>>>>>>> >> >>> >>> cpu(4 core), 32 mem. Both Yarn and MRv1 cluster are
>>>>>>>>>>>>> set on the
>>>>>>>>>>>>> >> >>> >>> same
>>>>>>>>>>>>> >> >>> >>> env. To
>>>>>>>>>>>>> >> >>> >>> be fair, I did not make any performance tuning on
>>>>>>>>>>>>> their
>>>>>>>>>>>>> >> >>> >>> configurations, but
>>>>>>>>>>>>> >> >>> >>> use the default configuration values.
>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>> >> >>> >>> Before testing, I think Yarn will be much better
>>>>>>>>>>>>> than MRv1, if
>>>>>>>>>>>>> >> >>> >>> they
>>>>>>>>>>>>> >> >>> >>> all
>>>>>>>>>>>>> >> >>> >>> use default configuration, because Yarn is a better
>>>>>>>>>>>>> framework than
>>>>>>>>>>>>> >> >>> >>> MRv1.
>>>>>>>>>>>>> >> >>> >>> However, the test result shows some differences:
>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>> >> >>> >>> After a fast analysis, I think the direct cause
>>>>>>>>>>>>> might be that Yarn
>>>>>>>>>>>>> >> >>> >>> is
>>>>>>>>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much worse
>>>>>>>>>>>>> on Reduce
>>>>>>>>>>>>> >> >>> >>> phase.
>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>>>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1 for
>>>>>>>>>>>>> terasort?
>>>>>>>>>>>>> >> >>> >>> - What's the strategy for tuning Yarn performance?
>>>>>>>>>>>>> Is any
>>>>>>>>>>>>> >> >>> >>> materials?
>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>> >> >>> >>> Thanks!
>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>> >> >>> >> --
>>>>>>>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>> >> >>> --
>>>>>>>>>>>>> >> >>> Harsh J
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >
>>>>>>>>>>>>> >>
>>>>>>>>>>>>> >>
>>>>>>>>>>>>> >>
>>>>>>>>>>>>> >> --
>>>>>>>>>>>>> >> Harsh J
>>>>>>>>>>>>> >
>>>>>>>>>>>>> >
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Harsh J
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Jian Fang <ji...@gmail.com>.
Already started the tests with only 8 containers for map in MR2, still
running.

But shouldn't YARN make better use of the memory? If I map YARN containers
in MR2 to exactly 8 map and 3 reduce slots, as in MR1, what is the real advantage
of YARN then? I remember one goal of YARN is to solve the issue that map
slots cannot be used for reduce and reduce slots cannot be used for map.
Shouldn't YARN be smart enough to handle the concurrent tasks?
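For what it is worth, the per-node concurrency implied by the configuration quoted below works out roughly as follows (a sketch that ignores the AM container and any vcore-based limits):

        12288 MB (yarn.nodemanager.resource.memory-mb) / 768 MB  (mapreduce.map.memory.mb)    = 16 concurrent maps
        12288 MB (yarn.nodemanager.resource.memory-mb) / 2048 MB (mapreduce.reduce.memory.mb) =  6 concurrent reduces

One way to pin MR2 to roughly the MR1 slot count of 8 maps per node is to raise the per-map request instead of touching the NodeManager; the values below are illustrative only:

        mapreduce.map.memory.mb = "1536";       # 12288 / 1536 = 8 concurrent maps per node
        mapreduce.map.java.opts = "-Xmx1280m";  # keep the heap inside the container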



On Wed, Oct 23, 2013 at 1:17 PM, Sandy Ryza <sa...@cloudera.com> wrote:

> Increasing the slowstart is not meant to increase performance, but should
> make for a fairer comparison.  Have you tried making sure that in MR2 only
> 8 map tasks are running concurrently, or boosting MR1 up to 16?
>
> -Sandy
>
>
> On Wed, Oct 23, 2013 at 12:55 PM, Jian Fang <jian.fang.subscribe@gmail.com
> > wrote:
>
>> Changing mapreduce.job.reduce.slowstart.completedmaps to 0.99 does not look good. The map phase alone
>> took 48 minutes and total time seems to be even longer. Any way to let map
>> phase run faster?
>>
>> Thanks.
>>
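For a one-off benchmark run, this knob can also be passed on the command line instead of being baked into mapred-site.xml. A sketch, with an illustrative jar path and input/output directories:

        hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar terasort \
            -Dmapreduce.job.reduce.slowstart.completedmaps=0.99 \
            /terasort/input /terasort/output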
>>
>> On Wed, Oct 23, 2013 at 10:05 AM, Jian Fang <
>> jian.fang.subscribe@gmail.com> wrote:
>>
>>> Thanks Sandy.
>>>
>>> io.sort.record.percent is the default value 0.05 for both MR1 and MR2.
>>> mapreduce.job.reduce.slowstart.completedmaps in MR2 and mapred.reduce.slowstart.completed.maps
>>> in MR1 both use the default value 0.05.
>>>
>>> I tried to allocate 1536MB and 1024MB to map container some time ago,
>>> but the changes did not give me a better result, thus, I changed it back to
>>> 768MB.
>>>
>>> Will try mapred.reduce.slowstart.completed.maps=.99 to see what
>>> happens. BTW, I should use
>>> mapreduce.job.reduce.slowstart.completedmaps in MR2, right?
>>>
>>> Also, in MR1 I can specify tasktracker.http.threads, but I could not
>>> find the counterpart for MR2. Which one should I tune for the http threads?
>>>
>>> Thanks again.
>>>
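On the http-threads question: in MR2 the map output is served by the ShuffleHandler auxiliary service inside each NodeManager, so the closest counterpart is mapreduce.shuffle.max.threads, read from the NodeManager's configuration; it is worth double-checking the name and default against mapred-default.xml for the exact version in use. A sketch:

        mapreduce.shuffle.max.threads = "0";   # 0 means 2 x the number of available processors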
>>>
>>> On Wed, Oct 23, 2013 at 9:40 AM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>
>>>> Based on SLOTS_MILLIS_MAPS, it looks like your map tasks are taking
>>>> about three times as long in MR2 as they are in MR1.  This is probably
>>>> because you allow twice as many map tasks to run at a time in MR2 (12288/768
>>>> = 16).  Being able to use all the containers isn't necessarily a good thing
>>>> if you are oversubscribing your node's resources.  Because of the different
>>>> way that MR1 and MR2 view resources, I think it's better to test with
>>>> mapred.reduce.slowstart.completed.maps=.99 so that the map and reduce
>>>> phases will run separately.
>>>>
>>>> On the other side, it looks like your MR1 has more spilled records than
>>>> MR2.  For a fairer comparison, you should set io.sort.record.percent in MR1
>>>> to .13, which should improve MR1 performance, but will provide a fairer
>>>> comparison (MR2 automatically does this tuning for you).
>>>>
>>>> -Sandy
>>>>
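The .13 figure can be derived rather than guessed: in MR1 each record in the map-side sort buffer costs 16 bytes of accounting metadata, and io.sort.record.percent is the fraction of io.sort.mb reserved for that metadata. Terasort records are a fixed 100 bytes, so:

        16 / (16 + 100) = 0.138  ->  io.sort.record.percent = 0.13   (MR1 only; MR2 sizes this automatically)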
>>>>
>>>> On Wed, Oct 23, 2013 at 9:22 AM, Jian Fang <
>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>
>>>>> The number of map slots and reduce slots on each data node for MR1 are
>>>>> 8 and 3, respectively. Since MR2 could use all containers for either map or
>>>>> reduce, I would expect that MR2 is faster.
>>>>>
>>>>>
>>>>> On Wed, Oct 23, 2013 at 8:17 AM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>>>
>>>>>> How many map and reduce slots are you using per tasktracker in MR1?
>>>>>>  How do the average map times compare? (MR2 reports this directly on the
>>>>>> web UI, but you can also get a sense in MR1 by scrolling through the map
>>>>>> tasks page).  Can you share the counters for MR1?
>>>>>>
>>>>>> -Sandy
>>>>>>
>>>>>>
>>>>>> On Wed, Oct 23, 2013 at 12:23 AM, Jian Fang <
>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>
>>>>>>> Unfortunately, turning off JVM reuse still got the same result,
>>>>>>> i.e., about 90 minutes for MR2. I don't think the killed reduces could
>>>>>>> contribute to 2 times slowness. There should be something very wrong either
>>>>>>> in configuration or code. Any hints?
>>>>>>>
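A note for anyone reproducing this comparison: the JVM-reuse knob exists only on the MR1 side, where a value of 1 disables reuse and -1 means unlimited; MR2 simply ignores it, so the mapreduce.job.jvm.numtasks = "20" in the Hadoop 2 configuration quoted elsewhere in this thread is a no-op. Illustrative settings:

        # MR1 (mapred-site.xml): disable JVM reuse for a like-for-like comparison
        mapred.job.reuse.jvm.num.tasks = "1";
        # MR2: no JVM reuse; uber mode only helps very small jobs, not terasort
        mapreduce.job.ubertask.enable = "false";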
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <
>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>
>>>>>>>> Thanks Sandy. I will try to turn JVM reuse off and see what happens.
>>>>>>>>
>>>>>>>> Yes, I saw quite some exceptions in the task attempts. For instance.
>>>>>>>>
>>>>>>>>
>>>>>>>> 2013-10-20 03:13:58,751 ERROR [main]
>>>>>>>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>>>>>>>> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
>>>>>>>> 2013-10-20 03:13:58,752 ERROR [Thread-6]
>>>>>>>> org.apache.hadoop.hdfs.DFSClient: Failed to close file
>>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
>>>>>>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>>>>>>>> No lease on
>>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
>>>>>>>> File does not exist. Holder
>>>>>>>> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
>>>>>>>> any open files.
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>>>>>>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>>>>>> --
>>>>>>>>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>>>>>>>> 2013-10-20 03:13:58,753 WARN [main]
>>>>>>>> org.apache.hadoop.mapred.YarnChild: Exception running child :
>>>>>>>> java.nio.channels.ClosedChannelException
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>>>>>>>>         at java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>>>>>>>>
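A note on the stack trace above: the attempt id ends in _1, i.e. a second attempt, and speculative execution is enabled in the configuration quoted elsewhere in this thread, so a LeaseExpiredException on an attempt's _temporary output path is most likely the normal cleanup of an attempt that lost the race to a duplicate, not the cause of the slowdown. One way to confirm that is to rerun with speculation off (illustrative):

        mapreduce.map.speculative = "false";
        mapreduce.reduce.speculative = "false";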
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <
>>>>>>>> sandy.ryza@cloudera.com> wrote:
>>>>>>>>
>>>>>>>>> It looks like many of your reduce tasks were killed.  Do you know
>>>>>>>>> why?  Also, MR2 doesn't have JVM reuse, so it might make sense to compare
>>>>>>>>> it to MR1 with JVM reuse turned off.
>>>>>>>>>
>>>>>>>>> -Sandy
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <
>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> The Terasort output for MR2 is as follows.
>>>>>>>>>>
>>>>>>>>>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job
>>>>>>>>>> (main): Counters: 46
>>>>>>>>>>         File System Counters
>>>>>>>>>>                 FILE: Number of bytes read=456102049355
>>>>>>>>>>                 FILE: Number of bytes written=897246250517
>>>>>>>>>>                 FILE: Number of read operations=0
>>>>>>>>>>                 FILE: Number of large read operations=0
>>>>>>>>>>                 FILE: Number of write operations=0
>>>>>>>>>>                 HDFS: Number of bytes read=1000000851200
>>>>>>>>>>                 HDFS: Number of bytes written=1000000000000
>>>>>>>>>>                 HDFS: Number of read operations=32131
>>>>>>>>>>                 HDFS: Number of large read operations=0
>>>>>>>>>>                 HDFS: Number of write operations=224
>>>>>>>>>>         Job Counters
>>>>>>>>>>                 Killed map tasks=1
>>>>>>>>>>                 Killed reduce tasks=20
>>>>>>>>>>                 Launched map tasks=7601
>>>>>>>>>>                 Launched reduce tasks=132
>>>>>>>>>>                 Data-local map tasks=7591
>>>>>>>>>>                 Rack-local map tasks=10
>>>>>>>>>>                 Total time spent by all maps in occupied slots
>>>>>>>>>> (ms)=1696141311
>>>>>>>>>>                 Total time spent by all reduces in occupied slots
>>>>>>>>>> (ms)=2664045096
>>>>>>>>>>         Map-Reduce Framework
>>>>>>>>>>                 Map input records=10000000000
>>>>>>>>>>                 Map output records=10000000000
>>>>>>>>>>                 Map output bytes=1020000000000
>>>>>>>>>>                 Map output materialized bytes=440486356802
>>>>>>>>>>                 Input split bytes=851200
>>>>>>>>>>                 Combine input records=0
>>>>>>>>>>                 Combine output records=0
>>>>>>>>>>                 Reduce input groups=10000000000
>>>>>>>>>>                 Reduce shuffle bytes=440486356802
>>>>>>>>>>                 Reduce input records=10000000000
>>>>>>>>>>                 Reduce output records=10000000000
>>>>>>>>>>                 Spilled Records=20000000000
>>>>>>>>>>                 Shuffled Maps =851200
>>>>>>>>>>                 Failed Shuffles=61
>>>>>>>>>>                 Merged Map outputs=851200
>>>>>>>>>>                 GC time elapsed (ms)=4215666
>>>>>>>>>>                 CPU time spent (ms)=192433000
>>>>>>>>>>                 Physical memory (bytes) snapshot=3349356380160
>>>>>>>>>>                 Virtual memory (bytes) snapshot=9665208745984
>>>>>>>>>>                 Total committed heap usage (bytes)=3636854259712
>>>>>>>>>>         Shuffle Errors
>>>>>>>>>>                 BAD_ID=0
>>>>>>>>>>                 CONNECTION=0
>>>>>>>>>>                 IO_ERROR=4
>>>>>>>>>>                 WRONG_LENGTH=0
>>>>>>>>>>                 WRONG_MAP=0
>>>>>>>>>>                 WRONG_REDUCE=0
>>>>>>>>>>         File Input Format Counters
>>>>>>>>>>                 Bytes Read=1000000000000
>>>>>>>>>>         File Output Format Counters
>>>>>>>>>>                 Bytes Written=1000000000000
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> John
>>>>>>>>>>
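A quick cross-check of those counters:

        7,601 launched - 1 killed maps    = 7,600 successful maps
          132 launched - 20 killed reduces =  112 successful reduces
        7,600 maps x 112 reduces          = 851,200 = "Shuffled Maps", i.e. every reduce fetched from every map
        1,020,000,000,000 map output bytes -> 440,486,356,802 materialized bytes, roughly 2.3:1 from the
        Snappy map-output compression in the configuration quoted below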
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop
>>>>>>>>>>> 1.0.3 and it turned out that the terasort for MR2 is 2 times slower than
>>>>>>>>>>> that in MR1. I cannot really believe it.
>>>>>>>>>>>
>>>>>>>>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0
>>>>>>>>>>> cluster configurations are as follows.
>>>>>>>>>>>
>>>>>>>>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>>>>>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>>>>>>>>         mapreduce.map.memory.mb = "768";
>>>>>>>>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>>>>>>>>
>>>>>>>>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>>>>>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>>>>>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>>>>>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>>>>>>>>
>>>>>>>>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>>>>>>>>         mapreduce.task.io.sort.factor = "48";
>>>>>>>>>>>         mapreduce.task.io.sort.mb = "200";
>>>>>>>>>>>         mapreduce.map.speculative = "true";
>>>>>>>>>>>         mapreduce.reduce.speculative = "true";
>>>>>>>>>>>         mapreduce.framework.name = "yarn";
>>>>>>>>>>>         yarn.app.mapreduce.am.job.task.listener.thread-count =
>>>>>>>>>>> "60";
>>>>>>>>>>>         mapreduce.map.cpu.vcores = "1";
>>>>>>>>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>>>>>>>>
>>>>>>>>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>>>>>>>>         mapreduce.map.output.compress = "true";
>>>>>>>>>>>         mapreduce.map.output.compress.codec =
>>>>>>>>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>>>>>>>>
>>>>>>>>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>>>>>>>>         yarn.resourcemanager.scheduler.client.thread-count =
>>>>>>>>>>> "64";
>>>>>>>>>>>
>>>>>>>>>>> yarn.resourcemanager.resource-tracker.client.thread-count = "64";
>>>>>>>>>>>         yarn.resourcemanager.scheduler.class =
>>>>>>>>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>>>>>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>>>>>>>>         yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>>>>>>>>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>>>>>>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>>>>>>>>         yarn.nodemanager.container-executor.class =
>>>>>>>>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>>>>>>>>         yarn.nodemanager.container-manager.thread-count = "64";
>>>>>>>>>>>         yarn.nodemanager.localizer.client.thread-count = "20";
>>>>>>>>>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>>>>>>>>>>
>>>>>>>>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost the
>>>>>>>>>>> same other configurations. In MR1, the 1TB terasort took about 45 minutes,
>>>>>>>>>>> but it took around 90 minutes in MR2.
>>>>>>>>>>>
>>>>>>>>>>> Does anyone know what is wrong here? Or do I need some special
>>>>>>>>>>> configurations for terasort to work better in MR2?
>>>>>>>>>>>
>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>
>>>>>>>>>>> John
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>>>>>>>>> michael_segel@hotmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Sam,
>>>>>>>>>>>> I think your cluster is too small for any meaningful
>>>>>>>>>>>> conclusions to be made.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>>>>>>
>>>>>>>>>>>> Mike Segel
>>>>>>>>>>>>
>>>>>>>>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Harsh,
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for your detailed response! Now, the efficiency of my
>>>>>>>>>>>> Yarn cluster improved a lot after increasing the reducer
>>>>>>>>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>>>>>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>>>>>>>>
>>>>>>>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and reduce
>>>>>>>>>>>> task together, with a typical process(map > shuffle > reduce). In Yarn, as
>>>>>>>>>>>> I know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>>>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but,
>>>>>>>>>>>> MRv1 job has special execution process(map > shuffle > reduce) in Hadoop
>>>>>>>>>>>> 1.x, and how Yarn execute a MRv1 job? still include some special MR steps
>>>>>>>>>>>> in Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>>>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>>>>>>> - What's the general process for ApplicationMaster of Yarn to
>>>>>>>>>>>> execute a job?
>>>>>>>>>>>>
>>>>>>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>>>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>>>>>>> - For Yarn, above tow parameter do not work any more, as yarn
>>>>>>>>>>>> uses container instead, right?
>>>>>>>>>>>> - For Yarn, we can set the whole physical mem for a NodeManager
>>>>>>>>>>>> using 'yarn.nodemanager.resource.memory-mb'. But how to set
>>>>>>>>>>>> the default size of physical mem of a container?
>>>>>>>>>>>> - How to set the maximum size of physical mem of a container?
>>>>>>>>>>>> By the parameter of 'mapred.child.java.opts'?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks as always!
>>>>>>>>>>>>
>>>>>>>>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Sam,
>>>>>>>>>>>>>
>>>>>>>>>>>>> > - How to know the container number? Why you say it will be
>>>>>>>>>>>>> 22 containers due to a 22 GB memory?
>>>>>>>>>>>>>
>>>>>>>>>>>>> The MR2's default configuration requests 1 GB resource each
>>>>>>>>>>>>> for Map
>>>>>>>>>>>>> and Reduce containers. It requests 1.5 GB for the AM container
>>>>>>>>>>>>> that
>>>>>>>>>>>>> runs the job, additionally. This is tunable using the
>>>>>>>>>>>>> properties
>>>>>>>>>>>>> Sandy's mentioned in his post.
>>>>>>>>>>>>>
>>>>>>>>>>>>> > - My machine has 32 GB memory, how many memory is proper to
>>>>>>>>>>>>> be assigned to containers?
>>>>>>>>>>>>>
>>>>>>>>>>>>> This is a general question. You may use the same process you
>>>>>>>>>>>>> took to
>>>>>>>>>>>>> decide optimal number of slots in MR1 to decide this here.
>>>>>>>>>>>>> Every
>>>>>>>>>>>>> container is a new JVM, and you're limited by the CPUs you
>>>>>>>>>>>>> have there
>>>>>>>>>>>>> (if not the memory). Either increase memory requests from
>>>>>>>>>>>>> jobs, to
>>>>>>>>>>>>> lower # of concurrent containers at a given time (runtime
>>>>>>>>>>>>> change), or
>>>>>>>>>>>>> lower NM's published memory resources to control the same
>>>>>>>>>>>>> (config
>>>>>>>>>>>>> change).
>>>>>>>>>>>>>
>>>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name'
>>>>>>>>>>>>> to be 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yes, all of these properties will still work. Old properties
>>>>>>>>>>>>> specific
>>>>>>>>>>>>> to JobTracker or TaskTracker (usually found as a keyword in
>>>>>>>>>>>>> the config
>>>>>>>>>>>>> name) will not apply anymore.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <
>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>> > Hi Harsh,
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > According to above suggestions, I removed the duplication of
>>>>>>>>>>>>> setting, and
>>>>>>>>>>>>> > reduce the value of 'yarn.nodemanager.resource.cpu-cores',
>>>>>>>>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>>>>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000.
>>>>>>>>>>>>> And then, the
>>>>>>>>>>>>> > efficiency improved about 18%.  I have questions:
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > - How to know the container number? Why you say it will be
>>>>>>>>>>>>> 22 containers due
>>>>>>>>>>>>> > to a 22 GB memory?
>>>>>>>>>>>>> > - My machine has 32 GB memory, how many memory is proper to
>>>>>>>>>>>>> be assigned to
>>>>>>>>>>>>> > containers?
>>>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name'
>>>>>>>>>>>>> to be 'yarn', will
>>>>>>>>>>>>> > other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>>>> framework? Like
>>>>>>>>>>>>> > 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > Thanks!
>>>>>>>>>>>>> >
>>>>>>>>>>>>> >
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>> >>
>>>>>>>>>>>>> >> Hey Sam,
>>>>>>>>>>>>> >>
>>>>>>>>>>>>> >> Did you get a chance to retry with Sandy's suggestions? The
>>>>>>>>>>>>> config
>>>>>>>>>>>>> >> appears to be asking NMs to use roughly 22 total containers
>>>>>>>>>>>>> (as
>>>>>>>>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB
>>>>>>>>>>>>> memory
>>>>>>>>>>>>> >> resource. This could impact much, given the CPU is still
>>>>>>>>>>>>> the same for
>>>>>>>>>>>>> >> both test runs.
>>>>>>>>>>>>> >>
>>>>>>>>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>>>>>>>>> sandy.ryza@cloudera.com>
>>>>>>>>>>>>> >> wrote:
>>>>>>>>>>>>> >> > Hey Sam,
>>>>>>>>>>>>> >> >
>>>>>>>>>>>>> >> > Thanks for sharing your results.  I'm definitely curious
>>>>>>>>>>>>> about what's
>>>>>>>>>>>>> >> > causing the difference.
>>>>>>>>>>>>> >> >
>>>>>>>>>>>>> >> > A couple observations:
>>>>>>>>>>>>> >> > It looks like you've got
>>>>>>>>>>>>> yarn.nodemanager.resource.memory-mb in there
>>>>>>>>>>>>> >> > twice
>>>>>>>>>>>>> >> > with two different values.
>>>>>>>>>>>>> >> >
>>>>>>>>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close to
>>>>>>>>>>>>> the default
>>>>>>>>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of
>>>>>>>>>>>>> your tasks getting
>>>>>>>>>>>>> >> > killed for running over resource limits?
>>>>>>>>>>>>> >> >
>>>>>>>>>>>>> >> > -Sandy
>>>>>>>>>>>>> >> >
>>>>>>>>>>>>> >> >
>>>>>>>>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <
>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >> The terasort execution log shows that reduce spent about
>>>>>>>>>>>>> 5.5 mins from
>>>>>>>>>>>>> >> >> 33%
>>>>>>>>>>>>> >> >> to 35% as below.
>>>>>>>>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>> 31%
>>>>>>>>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>> 32%
>>>>>>>>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>> 33%
>>>>>>>>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>> 35%
>>>>>>>>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>> 40%
>>>>>>>>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>> 43%
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >> Any way, below are my configurations for your reference.
>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>> >> >> (A) core-site.xml
>>>>>>>>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >> (B) hdfs-site.xml
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>dfs.replication</name>
>>>>>>>>>>>>> >> >>     <value>1</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>>>>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>>>>>>>>> >> >>     <value>10</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >> (C) mapred-site.xml
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>>>>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>>>>>>>>> >> >> </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>>>>>>>>> >> >>     <value>yarn</value>
>>>>>>>>>>>>> >> >>    </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>> >> >>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>>>>>>>>> >> >>     <value>8</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>>>>>>>>> >> >>     <value>4</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>>>>>>>>> >> >>     <value>true</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >> (D) yarn-site.xml
>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>>>>>>>>> >> >>     <value>node1:18025</value>
>>>>>>>>>>>>> >> >>     <description>host is the hostname of the resource
>>>>>>>>>>>>> manager and
>>>>>>>>>>>>> >> >>     port is the port on which the NodeManagers contact
>>>>>>>>>>>>> the Resource
>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <description>The address of the RM web
>>>>>>>>>>>>> application.</description>
>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>>>>>>>>> >> >>     <value>node1:18088</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>>>>>>>>>>>>> >> >>     <value>node1:18030</value>
>>>>>>>>>>>>> >> >>     <description>host is the hostname of the
>>>>>>>>>>>>> resourcemanager and port
>>>>>>>>>>>>> >> >> is
>>>>>>>>>>>>> >> >> the port
>>>>>>>>>>>>> >> >>     on which the Applications in the cluster talk to the
>>>>>>>>>>>>> Resource
>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>>>>>>>>> >> >>     <value>node1:18040</value>
>>>>>>>>>>>>> >> >>     <description>the host is the hostname of the
>>>>>>>>>>>>> ResourceManager and
>>>>>>>>>>>>> >> >> the
>>>>>>>>>>>>> >> >> port is the port on
>>>>>>>>>>>>> >> >>     which the clients can talk to the Resource Manager.
>>>>>>>>>>>>> </description>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>>>>>>>>> >> >>     <description>the local directories used by the
>>>>>>>>>>>>> >> >> nodemanager</description>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>>>>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>>>>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>>>>>>>>> port</description>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>> >> >>     <value>10240</value>
>>>>>>>>>>>>> >> >>     <description>the amount of memory on the NodeManager
>>>>>>>>>>>>> in
>>>>>>>>>>>>> >> >> MB</description>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>>>>>>>> >> >>     <description>directory on hdfs where the application
>>>>>>>>>>>>> logs are moved
>>>>>>>>>>>>> >> >> to
>>>>>>>>>>>>> >> >> </description>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>    <property>
>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>>>>>>>> >> >>     <description>the directories used by Nodemanagers as
>>>>>>>>>>>>> log
>>>>>>>>>>>>> >> >> directories</description>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>>>>>>>> >> >>     <description>shuffle service that needs to be set
>>>>>>>>>>>>> for Map Reduce to
>>>>>>>>>>>>> >> >> run </description>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>>>>>>>> >> >>     <value>24</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>>>>>>>> >> >>     <value>3</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>> >> >>     <value>22000</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>>>>>>>> >> >>     <value>2.1</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>> >> >>> Not tuning configurations at all is wrong. YARN uses
>>>>>>>>>>>>> memory resource
>>>>>>>>>>>>> >> >>> based scheduling and hence MR2 would be requesting 1 GB
>>>>>>>>>>>>> minimum by
>>>>>>>>>>>>> >> >>> default, causing, on base configs, to max out at 8 (due
>>>>>>>>>>>>> to 8 GB NM
>>>>>>>>>>>>> >> >>> memory resource config) total containers. Do share your
>>>>>>>>>>>>> configs as at
>>>>>>>>>>>>> >> >>> this point none of us can tell what it is.
>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for
>>>>>>>>>>>>> users and to not
>>>>>>>>>>>>> >> >>> care about such things :)
>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>>>>>>>> samliuhadoop@gmail.com>
>>>>>>>>>>>>> >> >>> wrote:
>>>>>>>>>>>>> >> >>> > At the beginning, I just want to do a fast comparison
>>>>>>>>>>>>> of MRv1 and
>>>>>>>>>>>>> >> >>> > Yarn.
>>>>>>>>>>>>> >> >>> > But
>>>>>>>>>>>>> >> >>> > they have many differences, and to be fair for
>>>>>>>>>>>>> comparison I did not
>>>>>>>>>>>>> >> >>> > tune
>>>>>>>>>>>>> >> >>> > their configurations at all.  So I got above test
>>>>>>>>>>>>> results. After
>>>>>>>>>>>>> >> >>> > analyzing
>>>>>>>>>>>>> >> >>> > the test result, no doubt, I will configure them and
>>>>>>>>>>>>> do comparison
>>>>>>>>>>>>> >> >>> > again.
>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>> >> >>> > Do you have any idea on current test result? I think,
>>>>>>>>>>>>> to compare
>>>>>>>>>>>>> >> >>> > with
>>>>>>>>>>>>> >> >>> > MRv1,
>>>>>>>>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but worse
>>>>>>>>>>>>> on Reduce
>>>>>>>>>>>>> >> >>> > phase(terasort test).
>>>>>>>>>>>>> >> >>> > And any detailed suggestions/comments/materials on
>>>>>>>>>>>>> Yarn performance
>>>>>>>>>>>>> >> >>> > tuning?
>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>> >> >>> > Thanks!
>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>>>>>>>> marcosluis2186@gmail.com>
>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>> >> >>> >> Why not to tune the configurations?
>>>>>>>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>>>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>> >> >>> >>> Hi Experts,
>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>> >> >>> >>> We are thinking about whether to use Yarn or not in
>>>>>>>>>>>>> the near
>>>>>>>>>>>>> >> >>> >>> future,
>>>>>>>>>>>>> >> >>> >>> and
>>>>>>>>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for
>>>>>>>>>>>>> comparison.
>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>> >> >>> >>> My env is three nodes cluster, and each node has
>>>>>>>>>>>>> similar hardware:
>>>>>>>>>>>>> >> >>> >>> 2
>>>>>>>>>>>>> >> >>> >>> cpu(4 core), 32 mem. Both Yarn and MRv1 cluster are
>>>>>>>>>>>>> set on the
>>>>>>>>>>>>> >> >>> >>> same
>>>>>>>>>>>>> >> >>> >>> env. To
>>>>>>>>>>>>> >> >>> >>> be fair, I did not make any performance tuning on
>>>>>>>>>>>>> their
>>>>>>>>>>>>> >> >>> >>> configurations, but
>>>>>>>>>>>>> >> >>> >>> use the default configuration values.
>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>> >> >>> >>> Before testing, I think Yarn will be much better
>>>>>>>>>>>>> than MRv1, if
>>>>>>>>>>>>> >> >>> >>> they
>>>>>>>>>>>>> >> >>> >>> all
>>>>>>>>>>>>> >> >>> >>> use default configuration, because Yarn is a better
>>>>>>>>>>>>> framework than
>>>>>>>>>>>>> >> >>> >>> MRv1.
>>>>>>>>>>>>> >> >>> >>> However, the test result shows some differences:
>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>> >> >>> >>> After a fast analysis, I think the direct cause
>>>>>>>>>>>>> might be that Yarn
>>>>>>>>>>>>> >> >>> >>> is
>>>>>>>>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much worse
>>>>>>>>>>>>> on Reduce
>>>>>>>>>>>>> >> >>> >>> phase.
>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>>>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1 for
>>>>>>>>>>>>> terasort?
>>>>>>>>>>>>> >> >>> >>> - What's the strategy for tuning Yarn performance?
>>>>>>>>>>>>> Is any
>>>>>>>>>>>>> >> >>> >>> materials?
>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>> >> >>> >>> Thanks!
>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>> >> >>> >> --
>>>>>>>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>> >> >>> --
>>>>>>>>>>>>> >> >>> Harsh J
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >
>>>>>>>>>>>>> >>
>>>>>>>>>>>>> >>
>>>>>>>>>>>>> >>
>>>>>>>>>>>>> >> --
>>>>>>>>>>>>> >> Harsh J
>>>>>>>>>>>>> >
>>>>>>>>>>>>> >
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Harsh J
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Jian Fang <ji...@gmail.com>.
Already started the tests with only 8 containers for map in MR2, still
running.

But shouldn't YARN make better use of the memory? If I map YARN containers
in MR2 to exact 8 map and 3 reduce slots in MR1, what is the real advantage
of YARN then? I remember one goal of YARN is to solve the issue that map
slots cannot be used for reduce and reduce slots cannot be used for map.
Shouldn't YARN be smart enough to handle the concurrent tasks?



On Wed, Oct 23, 2013 at 1:17 PM, Sandy Ryza <sa...@cloudera.com> wrote:

> Increasing the slowstart is not meant to increase performance, but should
> make for a fairer comparison.  Have you tried making sure that in MR2 only
> 8 map tasks are running concurrently, or boosting MR1 up to 16?
>
> -Sandy
>
>
> On Wed, Oct 23, 2013 at 12:55 PM, Jian Fang <jian.fang.subscribe@gmail.com
> > wrote:
>
>> Changing mapreduce.job.reduce.
>> slowstart.completedmaps to 0.99 does not look good. The map phase alone
>> took 48 minutes and total time seems to be even longer. Any way to let map
>> phase run faster?
>>
>> Thanks.
>>
>>
>> On Wed, Oct 23, 2013 at 10:05 AM, Jian Fang <
>> jian.fang.subscribe@gmail.com> wrote:
>>
>>> Thanks Sandy.
>>>
>>> io.sort.record.percent is the default value 0.05 for both MR1 and MR2.
>>> mapreduce.job.reduce.slowstart.completedmaps in MR2 and mapred.reduce.slowstart.completed.maps
>>> in MR1 both use the default value 0.05.
>>>
>>> I tried to allocate 1536MB and 1024MB to map container some time ago,
>>> but the changes did not give me a better result, thus, I changed it back to
>>> 768MB.
>>>
>>> Will try mapred.reduce.slowstart.completed.maps=.99 to see what
>>> happens. BTW, I should use
>>> mapreduce.job.reduce.slowstart.completedmaps in MR2, right?
>>>
>>> Also, in MR1 I can specify tasktracker.http.threads, but I could not
>>> find the counterpart for MR2. Which one I should tune for the http thread?
>>>
>>> Thanks again.
>>>
>>>
>>> On Wed, Oct 23, 2013 at 9:40 AM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>
>>>> Based on SLOTS_MILLIS_MAPS, it looks like your map tasks are taking
>>>> about three times as long in MR2 as they are in MR1.  This is probably
>>>> because you allow twice as many map tasks to run at a time in MR2 (12288/768
>>>> = 16).  Being able to use all the containers isn't necessarily a good thing
>>>> if you are oversubscribing your node's resources.  Because of the different
>>>> way that MR1 and MR2 view resources, I think it's better to test with
>>>> mapred.reduce.slowstart.completed.maps=.99 so that the map and reduce
>>>> phases will run separately.
>>>>
>>>> On the other side, it looks like your MR1 has more spilled records than
>>>> MR2.  For a fairer comparison, you should set io.sort.record.percent in MR1
>>>> to .13, which should improve MR1 performance, but will provide a fairer
>>>> comparison (MR2 automatically does this tuning for you).
>>>>
>>>> -Sandy
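
A minimal sketch of the settings Sandy names, assuming they go into each
cluster's mapred-site.xml (note that the MR2 and MR1 property names differ, as
discussed above):

  <!-- MR2: hold the reduce phase back until the maps are done -->
  <property>
    <name>mapreduce.job.reduce.slowstart.completedmaps</name>
    <value>0.99</value>
  </property>

  <!-- MR1: the equivalent slowstart knob, plus the spill-record tuning -->
  <property>
    <name>mapred.reduce.slowstart.completed.maps</name>
    <value>0.99</value>
  </property>
  <property>
    <name>io.sort.record.percent</name>
    <value>0.13</value>
  </property>
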
>>>>
>>>>
>>>> On Wed, Oct 23, 2013 at 9:22 AM, Jian Fang <
>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>
>>>>> The number of map slots and reduce slots on each data node for MR1 are
>>>>> 8 and 3, respectively. Since MR2 could use all containers for either map or
>>>>> reduce, I would expect that MR2 is faster.
>>>>>
>>>>>
>>>>> On Wed, Oct 23, 2013 at 8:17 AM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>>>
>>>>>> How many map and reduce slots are you using per tasktracker in MR1?
>>>>>>  How do the average map times compare? (MR2 reports this directly on the
>>>>>> web UI, but you can also get a sense in MR1 by scrolling through the map
>>>>>> tasks page).  Can you share the counters for MR1?
>>>>>>
>>>>>> -Sandy
>>>>>>
>>>>>>
>>>>>> On Wed, Oct 23, 2013 at 12:23 AM, Jian Fang <
>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>
>>>>>>> Unfortunately, turning off JVM reuse still got the same result,
>>>>>>> i.e., about 90 minutes for MR2. I don't think the killed reduces could
>>>>>>> contribute to 2 times slowness. There should be something very wrong either
>>>>>>> in configuration or code. Any hints?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <
>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>
>>>>>>>> Thanks Sandy. I will try to turn JVM reuse off and see what happens.
>>>>>>>>
>>>>>>>> Yes, I saw quite some exceptions in the task attempts. For instance.
>>>>>>>>
>>>>>>>>
>>>>>>>> 2013-10-20 03:13:58,751 ERROR [main]
>>>>>>>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>>>>>>>> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
>>>>>>>> 2013-10-20 03:13:58,752 ERROR [Thread-6]
>>>>>>>> org.apache.hadoop.hdfs.DFSClient: Failed to close file
>>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
>>>>>>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>>>>>>>> No lease on
>>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
>>>>>>>> File does not exist. Holder
>>>>>>>> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
>>>>>>>> any open files.
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>>>>>>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>>>>>> --
>>>>>>>>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>>>>>>>> 2013-10-20 03:13:58,753 WARN [main]
>>>>>>>> org.apache.hadoop.mapred.YarnChild: Exception running child :
>>>>>>>> java.nio.channels.ClosedChannelException
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>>>>>>>>         at java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <
>>>>>>>> sandy.ryza@cloudera.com> wrote:
>>>>>>>>
>>>>>>>>> It looks like many of your reduce tasks were killed.  Do you know
>>>>>>>>> why?  Also, MR2 doesn't have JVM reuse, so it might make sense to compare
>>>>>>>>> it to MR1 with JVM reuse turned off.
>>>>>>>>>
>>>>>>>>> -Sandy
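
On the MR1 side of that comparison, JVM reuse is controlled by
mapred.job.reuse.jvm.num.tasks; a minimal sketch, assuming it goes into the MR1
cluster's mapred-site.xml (the mapreduce.job.jvm.numtasks = 20 in the MR2 config
further down has no effect, for the reason Sandy gives):

  <property>
    <name>mapred.job.reuse.jvm.num.tasks</name>
    <!-- 1 JVM per task, i.e. reuse off; -1 would reuse a JVM indefinitely -->
    <value>1</value>
  </property>
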
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <
>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> The Terasort output for MR2 is as follows.
>>>>>>>>>>
>>>>>>>>>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job
>>>>>>>>>> (main): Counters: 46
>>>>>>>>>>         File System Counters
>>>>>>>>>>                 FILE: Number of bytes read=456102049355
>>>>>>>>>>                 FILE: Number of bytes written=897246250517
>>>>>>>>>>                 FILE: Number of read operations=0
>>>>>>>>>>                 FILE: Number of large read operations=0
>>>>>>>>>>                 FILE: Number of write operations=0
>>>>>>>>>>                 HDFS: Number of bytes read=1000000851200
>>>>>>>>>>                 HDFS: Number of bytes written=1000000000000
>>>>>>>>>>                 HDFS: Number of read operations=32131
>>>>>>>>>>                 HDFS: Number of large read operations=0
>>>>>>>>>>                 HDFS: Number of write operations=224
>>>>>>>>>>         Job Counters
>>>>>>>>>>                 Killed map tasks=1
>>>>>>>>>>                 Killed reduce tasks=20
>>>>>>>>>>                 Launched map tasks=7601
>>>>>>>>>>                 Launched reduce tasks=132
>>>>>>>>>>                 Data-local map tasks=7591
>>>>>>>>>>                 Rack-local map tasks=10
>>>>>>>>>>                 Total time spent by all maps in occupied slots
>>>>>>>>>> (ms)=1696141311
>>>>>>>>>>                 Total time spent by all reduces in occupied slots
>>>>>>>>>> (ms)=2664045096
>>>>>>>>>>         Map-Reduce Framework
>>>>>>>>>>                 Map input records=10000000000
>>>>>>>>>>                 Map output records=10000000000
>>>>>>>>>>                 Map output bytes=1020000000000
>>>>>>>>>>                 Map output materialized bytes=440486356802
>>>>>>>>>>                 Input split bytes=851200
>>>>>>>>>>                 Combine input records=0
>>>>>>>>>>                 Combine output records=0
>>>>>>>>>>                 Reduce input groups=10000000000
>>>>>>>>>>                 Reduce shuffle bytes=440486356802
>>>>>>>>>>                 Reduce input records=10000000000
>>>>>>>>>>                 Reduce output records=10000000000
>>>>>>>>>>                 Spilled Records=20000000000
>>>>>>>>>>                 Shuffled Maps =851200
>>>>>>>>>>                 Failed Shuffles=61
>>>>>>>>>>                 Merged Map outputs=851200
>>>>>>>>>>                 GC time elapsed (ms)=4215666
>>>>>>>>>>                 CPU time spent (ms)=192433000
>>>>>>>>>>                 Physical memory (bytes) snapshot=3349356380160
>>>>>>>>>>                 Virtual memory (bytes) snapshot=9665208745984
>>>>>>>>>>                 Total committed heap usage (bytes)=3636854259712
>>>>>>>>>>         Shuffle Errors
>>>>>>>>>>                 BAD_ID=0
>>>>>>>>>>                 CONNECTION=0
>>>>>>>>>>                 IO_ERROR=4
>>>>>>>>>>                 WRONG_LENGTH=0
>>>>>>>>>>                 WRONG_MAP=0
>>>>>>>>>>                 WRONG_REDUCE=0
>>>>>>>>>>         File Input Format Counters
>>>>>>>>>>                 Bytes Read=1000000000000
>>>>>>>>>>         File Output Format Counters
>>>>>>>>>>                 Bytes Written=1000000000000
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> John
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop
>>>>>>>>>>> 1.0.3 and it turned out that the terasort for MR2 is 2 times slower than
>>>>>>>>>>> that in MR1. I cannot really believe it.
>>>>>>>>>>>
>>>>>>>>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0
>>>>>>>>>>> cluster configurations are as follows.
>>>>>>>>>>>
>>>>>>>>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>>>>>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>>>>>>>>         mapreduce.map.memory.mb = "768";
>>>>>>>>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>>>>>>>>
>>>>>>>>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>>>>>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>>>>>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>>>>>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>>>>>>>>
>>>>>>>>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>>>>>>>>         mapreduce.task.io.sort.factor = "48";
>>>>>>>>>>>         mapreduce.task.io.sort.mb = "200";
>>>>>>>>>>>         mapreduce.map.speculative = "true";
>>>>>>>>>>>         mapreduce.reduce.speculative = "true";
>>>>>>>>>>>         mapreduce.framework.name = "yarn";
>>>>>>>>>>>         yarn.app.mapreduce.am.job.task.listener.thread-count =
>>>>>>>>>>> "60";
>>>>>>>>>>>         mapreduce.map.cpu.vcores = "1";
>>>>>>>>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>>>>>>>>
>>>>>>>>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>>>>>>>>         mapreduce.map.output.compress = "true";
>>>>>>>>>>>         mapreduce.map.output.compress.codec =
>>>>>>>>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>>>>>>>>
>>>>>>>>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>>>>>>>>         yarn.resourcemanager.scheduler.client.thread-count =
>>>>>>>>>>> "64";
>>>>>>>>>>>
>>>>>>>>>>> yarn.resourcemanager.resource-tracker.client.thread-count = "64";
>>>>>>>>>>>         yarn.resourcemanager.scheduler.class =
>>>>>>>>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>>>>>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>>>>>>>>         yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>>>>>>>>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>>>>>>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>>>>>>>>         yarn.nodemanager.container-executor.class =
>>>>>>>>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>>>>>>>>         yarn.nodemanager.container-manager.thread-count = "64";
>>>>>>>>>>>         yarn.nodemanager.localizer.client.thread-count = "20";
>>>>>>>>>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>>>>>>>>>>
>>>>>>>>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost the
>>>>>>>>>>> same other configurations. In MR1, the 1TB terasort took about 45 minutes,
>>>>>>>>>>> but it took around 90 minutes in MR2.
>>>>>>>>>>>
>>>>>>>>>>> Does anyone know what is wrong here? Or do I need some special
>>>>>>>>>>> configurations for terasort to work better in MR2?
>>>>>>>>>>>
>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>
>>>>>>>>>>> John
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>>>>>>>>> michael_segel@hotmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Sam,
>>>>>>>>>>>> I think your cluster is too small for any meaningful
>>>>>>>>>>>> conclusions to be made.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>>>>>>
>>>>>>>>>>>> Mike Segel
>>>>>>>>>>>>
>>>>>>>>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Harsh,
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for your detailed response! Now, the efficiency of my
>>>>>>>>>>>> Yarn cluster improved a lot after increasing the reducer
>>>>>>>>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>>>>>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>>>>>>>>
>>>>>>>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and reduce
>>>>>>>>>>>> task together, with a typical process(map > shuffle > reduce). In Yarn, as
>>>>>>>>>>>> I know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>>>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but,
>>>>>>>>>>>> MRv1 job has special execution process(map > shuffle > reduce) in Hadoop
>>>>>>>>>>>> 1.x, and how Yarn execute a MRv1 job? still include some special MR steps
>>>>>>>>>>>> in Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>>>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>>>>>>> - What's the general process for ApplicationMaster of Yarn to
>>>>>>>>>>>> execute a job?
>>>>>>>>>>>>
>>>>>>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>>>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>>>>>>> - For Yarn, the above two parameters do not work any more, as yarn
>>>>>>>>>>>> uses containers instead, right?
>>>>>>>>>>>> - For Yarn, we can set the whole physical mem for a NodeManager
>>>>>>>>>>>> using 'yarn.nodemanager.resource.memory-mb'. But how to set
>>>>>>>>>>>> the default size of physical mem of a container?
>>>>>>>>>>>> - How to set the maximum size of physical mem of a container?
>>>>>>>>>>>> By the parameter of 'mapred.child.java.opts'?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks as always!
>>>>>>>>>>>>
>>>>>>>>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Sam,
>>>>>>>>>>>>>
>>>>>>>>>>>>> > - How to know the container number? Why you say it will be
>>>>>>>>>>>>> 22 containers due to a 22 GB memory?
>>>>>>>>>>>>>
>>>>>>>>>>>>> The MR2's default configuration requests 1 GB resource each
>>>>>>>>>>>>> for Map
>>>>>>>>>>>>> and Reduce containers. It requests 1.5 GB for the AM container
>>>>>>>>>>>>> that
>>>>>>>>>>>>> runs the job, additionally. This is tunable using the
>>>>>>>>>>>>> properties
>>>>>>>>>>>>> Sandy's mentioned in his post.
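
Worked through with the numbers quoted further down in this message, that is
roughly: floor(22000 MB per NodeManager / 1024 MB per task container) = 21 map
or reduce containers at once, plus the roughly 1.5 GB ApplicationMaster, which
is where the figure of about 22 containers (versus the 8 + 4 = 12 task slots in
the MR1 config) comes from.
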
>>>>>>>>>>>>>
>>>>>>>>>>>>> > - My machine has 32 GB memory, how much memory is proper to
>>>>>>>>>>>>> be assigned to containers?
>>>>>>>>>>>>>
>>>>>>>>>>>>> This is a general question. You may use the same process you
>>>>>>>>>>>>> took to
>>>>>>>>>>>>> decide optimal number of slots in MR1 to decide this here.
>>>>>>>>>>>>> Every
>>>>>>>>>>>>> container is a new JVM, and you're limited by the CPUs you
>>>>>>>>>>>>> have there
>>>>>>>>>>>>> (if not the memory). Either increase memory requests from
>>>>>>>>>>>>> jobs, to
>>>>>>>>>>>>> lower # of concurrent containers at a given time (runtime
>>>>>>>>>>>>> change), or
>>>>>>>>>>>>> lower NM's published memory resources to control the same
>>>>>>>>>>>>> (config
>>>>>>>>>>>>> change).
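
A minimal sketch of the second option (the config change), assuming it goes into
yarn-site.xml on each NodeManager; 12000 is the value Sam reports switching to
further down, and it caps a node at roughly 11 concurrent 1 GB task containers:

  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <!-- 12000 / 1024 = roughly 11 concurrent 1 GB containers instead of ~21 -->
    <value>12000</value>
  </property>
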
>>>>>>>>>>>>>
>>>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name'
>>>>>>>>>>>>> to be 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yes, all of these properties will still work. Old properties
>>>>>>>>>>>>> specific
>>>>>>>>>>>>> to JobTracker or TaskTracker (usually found as a keyword in
>>>>>>>>>>>>> the config
>>>>>>>>>>>>> name) will not apply anymore.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <
>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>> > Hi Harsh,
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > According to the above suggestions, I removed the duplicated
>>>>>>>>>>>>> setting, and
>>>>>>>>>>>>> > reduced the value of 'yarn.nodemanager.resource.cpu-cores',
>>>>>>>>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>>>>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000.
>>>>>>>>>>>>> And then, the
>>>>>>>>>>>>> > efficiency improved about 18%.  I have questions:
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > - How to know the container number? Why you say it will be
>>>>>>>>>>>>> 22 containers due
>>>>>>>>>>>>> > to a 22 GB memory?
>>>>>>>>>>>>> > - My machine has 32 GB memory, how much memory is proper to
>>>>>>>>>>>>> be assigned to
>>>>>>>>>>>>> > containers?
>>>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name'
>>>>>>>>>>>>> to be 'yarn', will
>>>>>>>>>>>>> > other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>>>> framework? Like
>>>>>>>>>>>>> > 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > Thanks!
>>>>>>>>>>>>> >
>>>>>>>>>>>>> >
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>> >>
>>>>>>>>>>>>> >> Hey Sam,
>>>>>>>>>>>>> >>
>>>>>>>>>>>>> >> Did you get a chance to retry with Sandy's suggestions? The
>>>>>>>>>>>>> config
>>>>>>>>>>>>> >> appears to be asking NMs to use roughly 22 total containers
>>>>>>>>>>>>> (as
>>>>>>>>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB
>>>>>>>>>>>>> memory
>>>>>>>>>>>>> >> resource. This could impact much, given the CPU is still
>>>>>>>>>>>>> the same for
>>>>>>>>>>>>> >> both test runs.
>>>>>>>>>>>>> >>
>>>>>>>>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>>>>>>>>> sandy.ryza@cloudera.com>
>>>>>>>>>>>>> >> wrote:
>>>>>>>>>>>>> >> > Hey Sam,
>>>>>>>>>>>>> >> >
>>>>>>>>>>>>> >> > Thanks for sharing your results.  I'm definitely curious
>>>>>>>>>>>>> about what's
>>>>>>>>>>>>> >> > causing the difference.
>>>>>>>>>>>>> >> >
>>>>>>>>>>>>> >> > A couple observations:
>>>>>>>>>>>>> >> > It looks like you've got
>>>>>>>>>>>>> yarn.nodemanager.resource.memory-mb in there
>>>>>>>>>>>>> >> > twice
>>>>>>>>>>>>> >> > with two different values.
>>>>>>>>>>>>> >> >
>>>>>>>>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close to
>>>>>>>>>>>>> the default
>>>>>>>>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of
>>>>>>>>>>>>> your tasks getting
>>>>>>>>>>>>> >> > killed for running over resource limits?
>>>>>>>>>>>>> >> >
>>>>>>>>>>>>> >> > -Sandy
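
A sketch of one way to restore that headroom, assuming the 2.0.4-alpha
mapred-site.xml quoted below and keeping mapreduce.child.java.opts at -Xmx1000m;
the 1536 MB figure is illustrative rather than taken from the thread:

  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1536</value>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <!-- leaves ~500 MB of non-heap headroom above the 1000 MB child heap -->
    <value>1536</value>
  </property>
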
>>>>>>>>>>>>> >> >
>>>>>>>>>>>>> >> >
>>>>>>>>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <
>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >> The terasort execution log shows that reduce spent about
>>>>>>>>>>>>> 5.5 mins from
>>>>>>>>>>>>> >> >> 33%
>>>>>>>>>>>>> >> >> to 35% as below.
>>>>>>>>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>> 31%
>>>>>>>>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>> 32%
>>>>>>>>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>> 33%
>>>>>>>>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>> 35%
>>>>>>>>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>> 40%
>>>>>>>>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>> 43%
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >> Anyway, below are my configurations for your reference.
>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>> >> >> (A) core-site.xml
>>>>>>>>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >> (B) hdfs-site.xml
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>dfs.replication</name>
>>>>>>>>>>>>> >> >>     <value>1</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>>>>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>>>>>>>>> >> >>     <value>10</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >> (C) mapred-site.xml
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>>>>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>>>>>>>>> >> >> </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>>>>>>>>> >> >>     <value>yarn</value>
>>>>>>>>>>>>> >> >>    </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>> >> >>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>>>>>>>>> >> >>     <value>8</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>>>>>>>>> >> >>     <value>4</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>>>>>>>>> >> >>     <value>true</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >> (D) yarn-site.xml
>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>>>>>>>>> >> >>     <value>node1:18025</value>
>>>>>>>>>>>>> >> >>     <description>host is the hostname of the resource
>>>>>>>>>>>>> manager and
>>>>>>>>>>>>> >> >>     port is the port on which the NodeManagers contact
>>>>>>>>>>>>> the Resource
>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <description>The address of the RM web
>>>>>>>>>>>>> application.</description>
>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>>>>>>>>> >> >>     <value>node1:18088</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>>>>>>>>>>>>> >> >>     <value>node1:18030</value>
>>>>>>>>>>>>> >> >>     <description>host is the hostname of the
>>>>>>>>>>>>> resourcemanager and port
>>>>>>>>>>>>> >> >> is
>>>>>>>>>>>>> >> >> the port
>>>>>>>>>>>>> >> >>     on which the Applications in the cluster talk to the
>>>>>>>>>>>>> Resource
>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>>>>>>>>> >> >>     <value>node1:18040</value>
>>>>>>>>>>>>> >> >>     <description>the host is the hostname of the
>>>>>>>>>>>>> ResourceManager and
>>>>>>>>>>>>> >> >> the
>>>>>>>>>>>>> >> >> port is the port on
>>>>>>>>>>>>> >> >>     which the clients can talk to the Resource Manager.
>>>>>>>>>>>>> </description>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>>>>>>>>> >> >>     <description>the local directories used by the
>>>>>>>>>>>>> >> >> nodemanager</description>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>>>>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>>>>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>>>>>>>>> port</description>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>> >> >>     <value>10240</value>
>>>>>>>>>>>>> >> >>     <description>the amount of memory on the NodeManager
>>>>>>>>>>>>> in
>>>>>>>>>>>>> >> >> GB</description>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>>>>>>>> >> >>     <description>directory on hdfs where the application
>>>>>>>>>>>>> logs are moved
>>>>>>>>>>>>> >> >> to
>>>>>>>>>>>>> >> >> </description>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>    <property>
>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>>>>>>>> >> >>     <description>the directories used by Nodemanagers as
>>>>>>>>>>>>> log
>>>>>>>>>>>>> >> >> directories</description>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>>>>>>>> >> >>     <description>shuffle service that needs to be set
>>>>>>>>>>>>> for Map Reduce to
>>>>>>>>>>>>> >> >> run </description>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>>>>>>>> >> >>     <value>24</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>>>>>>>> >> >>     <value>3</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>> >> >>     <value>22000</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>>>>>>>> >> >>     <value>2.1</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>> >> >>> Not tuning configurations at all is wrong. YARN uses
>>>>>>>>>>>>> memory resource
>>>>>>>>>>>>> >> >>> based scheduling and hence MR2 would be requesting 1 GB
>>>>>>>>>>>>> minimum by
>>>>>>>>>>>>> >> >>> default, causing, on base configs, to max out at 8 (due
>>>>>>>>>>>>> to 8 GB NM
>>>>>>>>>>>>> >> >>> memory resource config) total containers. Do share your
>>>>>>>>>>>>> configs as at
>>>>>>>>>>>>> >> >>> this point none of us can tell what it is.
>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for
>>>>>>>>>>>>> users and to not
>>>>>>>>>>>>> >> >>> care about such things :)
>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>>>>>>>> samliuhadoop@gmail.com>
>>>>>>>>>>>>> >> >>> wrote:
>>>>>>>>>>>>> >> >>> > At the beginning, I just want to do a fast comparison
>>>>>>>>>>>>> of MRv1 and
>>>>>>>>>>>>> >> >>> > Yarn.
>>>>>>>>>>>>> >> >>> > But
>>>>>>>>>>>>> >> >>> > they have many differences, and to be fair for
>>>>>>>>>>>>> comparison I did not
>>>>>>>>>>>>> >> >>> > tune
>>>>>>>>>>>>> >> >>> > their configurations at all.  So I got above test
>>>>>>>>>>>>> results. After
>>>>>>>>>>>>> >> >>> > analyzing
>>>>>>>>>>>>> >> >>> > the test result, no doubt, I will configure them and
>>>>>>>>>>>>> do comparison
>>>>>>>>>>>>> >> >>> > again.
>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>> >> >>> > Do you have any idea on current test result? I think,
>>>>>>>>>>>>> to compare
>>>>>>>>>>>>> >> >>> > with
>>>>>>>>>>>>> >> >>> > MRv1,
>>>>>>>>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but worse
>>>>>>>>>>>>> on Reduce
>>>>>>>>>>>>> >> >>> > phase(terasort test).
>>>>>>>>>>>>> >> >>> > And any detailed suggestions/comments/materials on
>>>>>>>>>>>>> Yarn performance
>>>>>>>>>>>>> >> >>> > tuning?
>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>> >> >>> > Thanks!
>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>>>>>>>> marcosluis2186@gmail.com>
>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>> >> >>> >> Why not tune the configurations?
>>>>>>>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>>>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>> >> >>> >>> Hi Experts,
>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>> >> >>> >>> We are thinking about whether to use Yarn or not in
>>>>>>>>>>>>> the near
>>>>>>>>>>>>> >> >>> >>> future,
>>>>>>>>>>>>> >> >>> >>> and
>>>>>>>>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for
>>>>>>>>>>>>> comparison.
>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>> >> >>> >>> My env is a three-node cluster, and each node has
>>>>>>>>>>>>> similar hardware:
>>>>>>>>>>>>> >> >>> >>> 2
>>>>>>>>>>>>> >> >>> >>> cpu (4 cores), 32 GB mem. Both the Yarn and MRv1 clusters are
>>>>>>>>>>>>> set on the
>>>>>>>>>>>>> >> >>> >>> same
>>>>>>>>>>>>> >> >>> >>> env. To
>>>>>>>>>>>>> >> >>> >>> be fair, I did not make any performance tuning on
>>>>>>>>>>>>> their
>>>>>>>>>>>>> >> >>> >>> configurations, but
>>>>>>>>>>>>> >> >>> >>> use the default configuration values.
>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>> >> >>> >>> Before testing, I think Yarn will be much better
>>>>>>>>>>>>> than MRv1, if
>>>>>>>>>>>>> >> >>> >>> they
>>>>>>>>>>>>> >> >>> >>> all
>>>>>>>>>>>>> >> >>> >>> use default configuration, because Yarn is a better
>>>>>>>>>>>>> framework than
>>>>>>>>>>>>> >> >>> >>> MRv1.
>>>>>>>>>>>>> >> >>> >>> However, the test result shows some differences:
>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>> >> >>> >>> After a fast analysis, I think the direct cause
>>>>>>>>>>>>> might be that Yarn
>>>>>>>>>>>>> >> >>> >>> is
>>>>>>>>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much worse
>>>>>>>>>>>>> on Reduce
>>>>>>>>>>>>> >> >>> >>> phase.
>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>>>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1 for
>>>>>>>>>>>>> terasort?
>>>>>>>>>>>>> >> >>> >>> - What's the strategy for tuning Yarn performance?
>>>>>>>>>>>>> Are there any
>>>>>>>>>>>>> >> >>> >>> materials?
>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>> >> >>> >>> Thanks!
>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>> >> >>> >> --
>>>>>>>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>> >> >>> --
>>>>>>>>>>>>> >> >>> Harsh J
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >
>>>>>>>>>>>>> >>
>>>>>>>>>>>>> >>
>>>>>>>>>>>>> >>
>>>>>>>>>>>>> >> --
>>>>>>>>>>>>> >> Harsh J
>>>>>>>>>>>>> >
>>>>>>>>>>>>> >
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Harsh J
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Jian Fang <ji...@gmail.com>.
Already started the tests with only 8 containers for map in MR2, still
running.

But shouldn't YARN make better use of the memory? If I map YARN containers
in MR2 to exactly 8 map and 3 reduce slots as in MR1, what is the real advantage
of YARN then? I remember one goal of YARN is to solve the issue that map
slots cannot be used for reduce and reduce slots cannot be used for map.
Shouldn't YARN be smart enough to handle the concurrent tasks?



On Wed, Oct 23, 2013 at 1:17 PM, Sandy Ryza <sa...@cloudera.com> wrote:

> Increasing the slowstart is not meant to increase performance, but should
> make for a fairer comparison.  Have you tried making sure that in MR2 only
> 8 map tasks are running concurrently, or boosting MR1 up to 16?
>
> -Sandy
>
>
> On Wed, Oct 23, 2013 at 12:55 PM, Jian Fang <jian.fang.subscribe@gmail.com
> > wrote:
>
>> Changing mapreduce.job.reduce.slowstart.completedmaps to 0.99 does not
>> look good. The map phase alone
>> took 48 minutes and total time seems to be even longer. Any way to let map
>> phase run faster?
>>
>> Thanks.
>>
>>
>> On Wed, Oct 23, 2013 at 10:05 AM, Jian Fang <
>> jian.fang.subscribe@gmail.com> wrote:
>>
>>> Thanks Sandy.
>>>
>>> io.sort.record.percent is the default value 0.05 for both MR1 and MR2.
>>> mapreduce.job.reduce.slowstart.completedmaps in MR2 and mapred.reduce.slowstart.completed.maps
>>> in MR1 both use the default value 0.05.
>>>
>>> I tried to allocate 1536MB and 1024MB to map container some time ago,
>>> but the changes did not give me a better result, thus, I changed it back to
>>> 768MB.
>>>
>>> Will try mapred.reduce.slowstart.completed.maps=.99 to see what
>>> happens. BTW, I should use
>>> mapreduce.job.reduce.slowstart.completedmaps in MR2, right?
>>>
>>> Also, in MR1 I can specify tasktracker.http.threads, but I could not
>>> find the counterpart for MR2. Which one should I tune for the http threads?
>>>
>>> Thanks again.
>>>
>>>
>>> On Wed, Oct 23, 2013 at 9:40 AM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>
>>>> Based on SLOTS_MILLIS_MAPS, it looks like your map tasks are taking
>>>> about three times as long in MR2 as they are in MR1.  This is probably
>>>> because you allow twice as many map tasks to run at a time in MR2 (12288/768
>>>> = 16).  Being able to use all the containers isn't necessarily a good thing
>>>> if you are oversubscribing your node's resources.  Because of the different
>>>> way that MR1 and MR2 view resources, I think it's better to test with
>>>> mapred.reduce.slowstart.completed.maps=.99 so that the map and reduce
>>>> phases will run separately.
>>>>
>>>> On the other side, it looks like your MR1 has more spilled records than
>>>> MR2.  For a fairer comparison, you should set io.sort.record.percent in MR1
>>>> to .13, which should improve MR1 performance, but will provide a fairer
>>>> comparison (MR2 automatically does this tuning for you).
>>>>
>>>> -Sandy
>>>>
>>>>
>>>> On Wed, Oct 23, 2013 at 9:22 AM, Jian Fang <
>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>
>>>>> The number of map slots and reduce slots on each data node for MR1 are
>>>>> 8 and 3, respectively. Since MR2 could use all containers for either map or
>>>>> reduce, I would expect that MR2 is faster.
>>>>>
>>>>>
>>>>> On Wed, Oct 23, 2013 at 8:17 AM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>>>
>>>>>> How many map and reduce slots are you using per tasktracker in MR1?
>>>>>>  How do the average map times compare? (MR2 reports this directly on the
>>>>>> web UI, but you can also get a sense in MR1 by scrolling through the map
>>>>>> tasks page).  Can you share the counters for MR1?
>>>>>>
>>>>>> -Sandy
>>>>>>
>>>>>>
>>>>>> On Wed, Oct 23, 2013 at 12:23 AM, Jian Fang <
>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>
>>>>>>> Unfortunately, turning off JVM reuse still got the same result,
>>>>>>> i.e., about 90 minutes for MR2. I don't think the killed reduces could
>>>>>>> contribute to 2 times slowness. There should be something very wrong either
>>>>>>> in configuration or code. Any hints?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <
>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>
>>>>>>>> Thanks Sandy. I will try to turn JVM reuse off and see what happens.
>>>>>>>>
>>>>>>>> Yes, I saw quite some exceptions in the task attempts. For instance.
>>>>>>>>
>>>>>>>>
>>>>>>>> 2013-10-20 03:13:58,751 ERROR [main]
>>>>>>>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>>>>>>>> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
>>>>>>>> 2013-10-20 03:13:58,752 ERROR [Thread-6]
>>>>>>>> org.apache.hadoop.hdfs.DFSClient: Failed to close file
>>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
>>>>>>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>>>>>>>> No lease on
>>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
>>>>>>>> File does not exist. Holder
>>>>>>>> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
>>>>>>>> any open files.
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>>>>>>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>>>>>> --
>>>>>>>>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>>>>>>>> 2013-10-20 03:13:58,753 WARN [main]
>>>>>>>> org.apache.hadoop.mapred.YarnChild: Exception running child :
>>>>>>>> java.nio.channels.ClosedChannelException
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>>>>>>>>         at java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>>>>>>>         at
>>>>>>>> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <
>>>>>>>> sandy.ryza@cloudera.com> wrote:
>>>>>>>>
>>>>>>>>> It looks like many of your reduce tasks were killed.  Do you know
>>>>>>>>> why?  Also, MR2 doesn't have JVM reuse, so it might make sense to compare
>>>>>>>>> it to MR1 with JVM reuse turned off.
>>>>>>>>>
>>>>>>>>> -Sandy
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <
>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> The Terasort output for MR2 is as follows.
>>>>>>>>>>
>>>>>>>>>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job
>>>>>>>>>> (main): Counters: 46
>>>>>>>>>>         File System Counters
>>>>>>>>>>                 FILE: Number of bytes read=456102049355
>>>>>>>>>>                 FILE: Number of bytes written=897246250517
>>>>>>>>>>                 FILE: Number of read operations=0
>>>>>>>>>>                 FILE: Number of large read operations=0
>>>>>>>>>>                 FILE: Number of write operations=0
>>>>>>>>>>                 HDFS: Number of bytes read=1000000851200
>>>>>>>>>>                 HDFS: Number of bytes written=1000000000000
>>>>>>>>>>                 HDFS: Number of read operations=32131
>>>>>>>>>>                 HDFS: Number of large read operations=0
>>>>>>>>>>                 HDFS: Number of write operations=224
>>>>>>>>>>         Job Counters
>>>>>>>>>>                 Killed map tasks=1
>>>>>>>>>>                 Killed reduce tasks=20
>>>>>>>>>>                 Launched map tasks=7601
>>>>>>>>>>                 Launched reduce tasks=132
>>>>>>>>>>                 Data-local map tasks=7591
>>>>>>>>>>                 Rack-local map tasks=10
>>>>>>>>>>                 Total time spent by all maps in occupied slots
>>>>>>>>>> (ms)=1696141311
>>>>>>>>>>                 Total time spent by all reduces in occupied slots
>>>>>>>>>> (ms)=2664045096
>>>>>>>>>>         Map-Reduce Framework
>>>>>>>>>>                 Map input records=10000000000
>>>>>>>>>>                 Map output records=10000000000
>>>>>>>>>>                 Map output bytes=1020000000000
>>>>>>>>>>                 Map output materialized bytes=440486356802
>>>>>>>>>>                 Input split bytes=851200
>>>>>>>>>>                 Combine input records=0
>>>>>>>>>>                 Combine output records=0
>>>>>>>>>>                 Reduce input groups=10000000000
>>>>>>>>>>                 Reduce shuffle bytes=440486356802
>>>>>>>>>>                 Reduce input records=10000000000
>>>>>>>>>>                 Reduce output records=10000000000
>>>>>>>>>>                 Spilled Records=20000000000
>>>>>>>>>>                 Shuffled Maps =851200
>>>>>>>>>>                 Failed Shuffles=61
>>>>>>>>>>                 Merged Map outputs=851200
>>>>>>>>>>                 GC time elapsed (ms)=4215666
>>>>>>>>>>                 CPU time spent (ms)=192433000
>>>>>>>>>>                 Physical memory (bytes) snapshot=3349356380160
>>>>>>>>>>                 Virtual memory (bytes) snapshot=9665208745984
>>>>>>>>>>                 Total committed heap usage (bytes)=3636854259712
>>>>>>>>>>         Shuffle Errors
>>>>>>>>>>                 BAD_ID=0
>>>>>>>>>>                 CONNECTION=0
>>>>>>>>>>                 IO_ERROR=4
>>>>>>>>>>                 WRONG_LENGTH=0
>>>>>>>>>>                 WRONG_MAP=0
>>>>>>>>>>                 WRONG_REDUCE=0
>>>>>>>>>>         File Input Format Counters
>>>>>>>>>>                 Bytes Read=1000000000000
>>>>>>>>>>         File Output Format Counters
>>>>>>>>>>                 Bytes Written=1000000000000
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> John
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop
>>>>>>>>>>> 1.0.3 and it turned out that the terasort for MR2 is 2 times slower than
>>>>>>>>>>> that in MR1. I cannot really believe it.
>>>>>>>>>>>
>>>>>>>>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0
>>>>>>>>>>> cluster configurations are as follows.
>>>>>>>>>>>
>>>>>>>>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>>>>>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>>>>>>>>         mapreduce.map.memory.mb = "768";
>>>>>>>>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>>>>>>>>
>>>>>>>>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>>>>>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>>>>>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>>>>>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>>>>>>>>
>>>>>>>>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>>>>>>>>         mapreduce.task.io.sort.factor = "48";
>>>>>>>>>>>         mapreduce.task.io.sort.mb = "200";
>>>>>>>>>>>         mapreduce.map.speculative = "true";
>>>>>>>>>>>         mapreduce.reduce.speculative = "true";
>>>>>>>>>>>         mapreduce.framework.name = "yarn";
>>>>>>>>>>>         yarn.app.mapreduce.am.job.task.listener.thread-count =
>>>>>>>>>>> "60";
>>>>>>>>>>>         mapreduce.map.cpu.vcores = "1";
>>>>>>>>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>>>>>>>>
>>>>>>>>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>>>>>>>>         mapreduce.map.output.compress = "true";
>>>>>>>>>>>         mapreduce.map.output.compress.codec =
>>>>>>>>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>>>>>>>>
>>>>>>>>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>>>>>>>>         yarn.resourcemanager.scheduler.client.thread-count =
>>>>>>>>>>> "64";
>>>>>>>>>>>
>>>>>>>>>>> yarn.resourcemanager.resource-tracker.client.thread-count = "64";
>>>>>>>>>>>         yarn.resourcemanager.scheduler.class =
>>>>>>>>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>>>>>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>>>>>>>>         yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>>>>>>>>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>>>>>>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>>>>>>>>         yarn.nodemanager.container-executor.class =
>>>>>>>>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>>>>>>>>         yarn.nodemanager.container-manager.thread-count = "64";
>>>>>>>>>>>         yarn.nodemanager.localizer.client.thread-count = "20";
>>>>>>>>>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>>>>>>>>>>
>>>>>>>>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost the
>>>>>>>>>>> same other configurations. In MR1, the 1TB terasort took about 45 minutes,
>>>>>>>>>>> but it took around 90 minutes in MR2.
>>>>>>>>>>>
>>>>>>>>>>> Does anyone know what is wrong here? Or do I need some special
>>>>>>>>>>> configurations for terasort to work better in MR2?
>>>>>>>>>>>
>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>
>>>>>>>>>>> John
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>>>>>>>>> michael_segel@hotmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Sam,
>>>>>>>>>>>> I think your cluster is too small for any meaningful
>>>>>>>>>>>> conclusions to be made.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>>>>>>
>>>>>>>>>>>> Mike Segel
>>>>>>>>>>>>
>>>>>>>>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Harsh,
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for your detailed response! Now, the efficiency of my
>>>>>>>>>>>> Yarn cluster improved a lot after increasing the reducer
>>>>>>>>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>>>>>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>>>>>>>>
>>>>>>>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and reduce
>>>>>>>>>>>> task together, with a typical process(map > shuffle > reduce). In Yarn, as
>>>>>>>>>>>> I know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>>>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but,
>>>>>>>>>>>> MRv1 job has special execution process(map > shuffle > reduce) in Hadoop
>>>>>>>>>>>> 1.x, and how Yarn execute a MRv1 job? still include some special MR steps
>>>>>>>>>>>> in Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>>>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>>>>>>> - What's the general process for ApplicationMaster of Yarn to
>>>>>>>>>>>> execute a job?
>>>>>>>>>>>>
>>>>>>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>>>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>>>>>>> - For Yarn, the above two parameters do not work any more, as yarn
>>>>>>>>>>>> uses containers instead, right?
>>>>>>>>>>>> - For Yarn, we can set the whole physical mem for a NodeManager
>>>>>>>>>>>> using 'yarn.nodemanager.resource.memory-mb'. But how to set
>>>>>>>>>>>> the default size of physical mem of a container?
>>>>>>>>>>>> - How to set the maximum size of physical mem of a container?
>>>>>>>>>>>> By the parameter of 'mapred.child.java.opts'?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks as always!
>>>>>>>>>>>>
>>>>>>>>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Sam,
>>>>>>>>>>>>>
>>>>>>>>>>>>> > - How to know the container number? Why you say it will be
>>>>>>>>>>>>> 22 containers due to a 22 GB memory?
>>>>>>>>>>>>>
>>>>>>>>>>>>> The MR2's default configuration requests 1 GB resource each
>>>>>>>>>>>>> for Map
>>>>>>>>>>>>> and Reduce containers. It requests 1.5 GB for the AM container
>>>>>>>>>>>>> that
>>>>>>>>>>>>> runs the job, additionally. This is tunable using the
>>>>>>>>>>>>> properties
>>>>>>>>>>>>> Sandy's mentioned in his post.
>>>>>>>>>>>>>
>>>>>>>>>>>>> > - My machine has 32 GB memory, how much memory is proper to
>>>>>>>>>>>>> be assigned to containers?
>>>>>>>>>>>>>
>>>>>>>>>>>>> This is a general question. You may use the same process you
>>>>>>>>>>>>> took to
>>>>>>>>>>>>> decide optimal number of slots in MR1 to decide this here.
>>>>>>>>>>>>> Every
>>>>>>>>>>>>> container is a new JVM, and you're limited by the CPUs you
>>>>>>>>>>>>> have there
>>>>>>>>>>>>> (if not the memory). Either increase memory requests from
>>>>>>>>>>>>> jobs, to
>>>>>>>>>>>>> lower # of concurrent containers at a given time (runtime
>>>>>>>>>>>>> change), or
>>>>>>>>>>>>> lower NM's published memory resources to control the same
>>>>>>>>>>>>> (config
>>>>>>>>>>>>> change).
>>>>>>>>>>>>>
>>>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name'
>>>>>>>>>>>>> to be 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yes, all of these properties will still work. Old properties
>>>>>>>>>>>>> specific
>>>>>>>>>>>>> to JobTracker or TaskTracker (usually found as a keyword in
>>>>>>>>>>>>> the config
>>>>>>>>>>>>> name) will not apply anymore.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <
>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>> > Hi Harsh,
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > According to the above suggestions, I removed the duplicated
>>>>>>>>>>>>> setting, and
>>>>>>>>>>>>> > reduced the value of 'yarn.nodemanager.resource.cpu-cores',
>>>>>>>>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>>>>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000.
>>>>>>>>>>>>> And then, the
>>>>>>>>>>>>> > efficiency improved about 18%.  I have questions:
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > - How to know the container number? Why you say it will be
>>>>>>>>>>>>> 22 containers due
>>>>>>>>>>>>> > to a 22 GB memory?
>>>>>>>>>>>>> > - My machine has 32 GB memory, how much memory is proper to
>>>>>>>>>>>>> be assigned to
>>>>>>>>>>>>> > containers?
>>>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name'
>>>>>>>>>>>>> to be 'yarn', will
>>>>>>>>>>>>> > other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>>>> framework? Like
>>>>>>>>>>>>> > 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > Thanks!
>>>>>>>>>>>>> >
>>>>>>>>>>>>> >
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>> >>
>>>>>>>>>>>>> >> Hey Sam,
>>>>>>>>>>>>> >>
>>>>>>>>>>>>> >> Did you get a chance to retry with Sandy's suggestions? The
>>>>>>>>>>>>> config
>>>>>>>>>>>>> >> appears to be asking NMs to use roughly 22 total containers
>>>>>>>>>>>>> (as
>>>>>>>>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB
>>>>>>>>>>>>> memory
>>>>>>>>>>>>> >> resource. This could impact much, given the CPU is still
>>>>>>>>>>>>> the same for
>>>>>>>>>>>>> >> both test runs.
>>>>>>>>>>>>> >>
>>>>>>>>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>>>>>>>>> sandy.ryza@cloudera.com>
>>>>>>>>>>>>> >> wrote:
>>>>>>>>>>>>> >> > Hey Sam,
>>>>>>>>>>>>> >> >
>>>>>>>>>>>>> >> > Thanks for sharing your results.  I'm definitely curious
>>>>>>>>>>>>> about what's
>>>>>>>>>>>>> >> > causing the difference.
>>>>>>>>>>>>> >> >
>>>>>>>>>>>>> >> > A couple observations:
>>>>>>>>>>>>> >> > It looks like you've got
>>>>>>>>>>>>> yarn.nodemanager.resource.memory-mb in there
>>>>>>>>>>>>> >> > twice
>>>>>>>>>>>>> >> > with two different values.
>>>>>>>>>>>>> >> >
>>>>>>>>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close to
>>>>>>>>>>>>> the default
>>>>>>>>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of
>>>>>>>>>>>>> your tasks getting
>>>>>>>>>>>>> >> > killed for running over resource limits?
>>>>>>>>>>>>> >> >
>>>>>>>>>>>>> >> > -Sandy
>>>>>>>>>>>>> >> >
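A sketch of the headroom issue behind that last question, using the numbers
from the configuration further down: with -Xmx1000m task heaps and the default
mapreduce.map.memory.mb / mapreduce.reduce.memory.mb of 1024 MB, JVM non-heap
overhead can push a task over its 1024 MB container and get it killed by the
NodeManager's memory check. One way to add headroom is to raise the container
request so the heap is only ~75-80% of it, for example:

  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1280</value>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>1280</value>
  </property>
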
>>>>>>>>>>>>> >> >
>>>>>>>>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <
>>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >> The terasort execution log shows that reduce spent about
>>>>>>>>>>>>> 5.5 mins from
>>>>>>>>>>>>> >> >> 33%
>>>>>>>>>>>>> >> >> to 35% as below.
>>>>>>>>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>> 31%
>>>>>>>>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>> 32%
>>>>>>>>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>> 33%
>>>>>>>>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>> 35%
>>>>>>>>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>> 40%
>>>>>>>>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce
>>>>>>>>>>>>> 43%
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >> Any way, below are my configurations for your reference.
>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>> >> >> (A) core-site.xml
>>>>>>>>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >> (B) hdfs-site.xml
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>dfs.replication</name>
>>>>>>>>>>>>> >> >>     <value>1</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>>>>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>>>>>>>>> >> >>     <value>10</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >> (C) mapred-site.xml
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>>>>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>>>>>>>>> >> >> </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>>>>>>>>> >> >>     <value>yarn</value>
>>>>>>>>>>>>> >> >>    </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>> >> >>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>>>>>>>>> >> >>     <value>8</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>>>>>>>>> >> >>     <value>4</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>>>>>>>>> >> >>     <value>true</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >> (D) yarn-site.xml
>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>>>>>>>>> >> >>     <value>node1:18025</value>
>>>>>>>>>>>>> >> >>     <description>host is the hostname of the resource
>>>>>>>>>>>>> manager and
>>>>>>>>>>>>> >> >>     port is the port on which the NodeManagers contact
>>>>>>>>>>>>> the Resource
>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <description>The address of the RM web
>>>>>>>>>>>>> application.</description>
>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>>>>>>>>> >> >>     <value>node1:18088</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>>>>>>>>>>>>> >> >>     <value>node1:18030</value>
>>>>>>>>>>>>> >> >>     <description>host is the hostname of the
>>>>>>>>>>>>> resourcemanager and port
>>>>>>>>>>>>> >> >> is
>>>>>>>>>>>>> >> >> the port
>>>>>>>>>>>>> >> >>     on which the Applications in the cluster talk to the
>>>>>>>>>>>>> Resource
>>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>>>>>>>>> >> >>     <value>node1:18040</value>
>>>>>>>>>>>>> >> >>     <description>the host is the hostname of the
>>>>>>>>>>>>> ResourceManager and
>>>>>>>>>>>>> >> >> the
>>>>>>>>>>>>> >> >> port is the port on
>>>>>>>>>>>>> >> >>     which the clients can talk to the Resource Manager.
>>>>>>>>>>>>> </description>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>>>>>>>>> >> >>     <description>the local directories used by the
>>>>>>>>>>>>> >> >> nodemanager</description>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>>>>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>>>>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>>>>>>>>> port</description>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>> >> >>     <value>10240</value>
>>>>>>>>>>>>> >> >>     <description>the amount of memory on the NodeManager
>>>>>>>>>>>>> in
>>>>>>>>>>>>> >> >> MB</description>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>>>>>>>> >> >>     <description>directory on hdfs where the application
>>>>>>>>>>>>> logs are moved
>>>>>>>>>>>>> >> >> to
>>>>>>>>>>>>> >> >> </description>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>    <property>
>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>>>>>>>> >> >>     <description>the directories used by Nodemanagers as
>>>>>>>>>>>>> log
>>>>>>>>>>>>> >> >> directories</description>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>>>>>>>> >> >>     <description>shuffle service that needs to be set
>>>>>>>>>>>>> for Map Reduce to
>>>>>>>>>>>>> >> >> run </description>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>>>>>>>> >> >>     <value>24</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>>>>>>>> >> >>     <value>3</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>> >> >>     <value>22000</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>>>>>>>> >> >>     <value>2.1</value>
>>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>> >> >>> Not tuning configurations at all is wrong. YARN uses
>>>>>>>>>>>>> memory resource
>>>>>>>>>>>>> >> >>> based scheduling and hence MR2 would be requesting 1 GB
>>>>>>>>>>>>> minimum by
>>>>>>>>>>>>> >> >>> default, causing, on base configs, to max out at 8 (due
>>>>>>>>>>>>> to 8 GB NM
>>>>>>>>>>>>> >> >>> memory resource config) total containers. Do share your
>>>>>>>>>>>>> configs as at
>>>>>>>>>>>>> >> >>> this point none of us can tell what it is.
>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for
>>>>>>>>>>>>> users and to not
>>>>>>>>>>>>> >> >>> care about such things :)
>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>>>>>>>> samliuhadoop@gmail.com>
>>>>>>>>>>>>> >> >>> wrote:
>>>>>>>>>>>>> >> >>> > At the beginning, I just want to do a fast comparison
>>>>>>>>>>>>> of MRv1 and
>>>>>>>>>>>>> >> >>> > Yarn.
>>>>>>>>>>>>> >> >>> > But
>>>>>>>>>>>>> >> >>> > they have many differences, and to be fair for
>>>>>>>>>>>>> comparison I did not
>>>>>>>>>>>>> >> >>> > tune
>>>>>>>>>>>>> >> >>> > their configurations at all.  So I got above test
>>>>>>>>>>>>> results. After
>>>>>>>>>>>>> >> >>> > analyzing
>>>>>>>>>>>>> >> >>> > the test result, no doubt, I will configure them and
>>>>>>>>>>>>> do comparison
>>>>>>>>>>>>> >> >>> > again.
>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>> >> >>> > Do you have any idea on current test result? I think,
>>>>>>>>>>>>> to compare
>>>>>>>>>>>>> >> >>> > with
>>>>>>>>>>>>> >> >>> > MRv1,
>>>>>>>>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but worse
>>>>>>>>>>>>> on Reduce
>>>>>>>>>>>>> >> >>> > phase(terasort test).
>>>>>>>>>>>>> >> >>> > And any detailed suggestions/comments/materials on
>>>>>>>>>>>>> Yarn performance
>>>>>>>>>>>>> >> >>> > tuning?
>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>> >> >>> > Thanks!
>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>>>>>>>> marcosluis2186@gmail.com>
>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>> >> >>> >> Why not tune the configurations?
>>>>>>>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>>>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>> >> >>> >>> Hi Experts,
>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>> >> >>> >>> We are thinking about whether to use Yarn or not in
>>>>>>>>>>>>> the near
>>>>>>>>>>>>> >> >>> >>> future,
>>>>>>>>>>>>> >> >>> >>> and
>>>>>>>>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for
>>>>>>>>>>>>> comprison.
>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>> >> >>> >>> My env is a three-node cluster, and each node has
>>>>>>>>>>>>> similar hardware:
>>>>>>>>>>>>> >> >>> >>> 2
>>>>>>>>>>>>> >> >>> >>> cpu(4 core), 32 GB mem. Both Yarn and MRv1 clusters are
>>>>>>>>>>>>> set on the
>>>>>>>>>>>>> >> >>> >>> same
>>>>>>>>>>>>> >> >>> >>> env. To
>>>>>>>>>>>>> >> >>> >>> be fair, I did not make any performance tuning on
>>>>>>>>>>>>> their
>>>>>>>>>>>>> >> >>> >>> configurations, but
>>>>>>>>>>>>> >> >>> >>> use the default configuration values.
>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>> >> >>> >>> Before testing, I think Yarn will be much better
>>>>>>>>>>>>> than MRv1, if
>>>>>>>>>>>>> >> >>> >>> they
>>>>>>>>>>>>> >> >>> >>> all
>>>>>>>>>>>>> >> >>> >>> use default configuration, because Yarn is a better
>>>>>>>>>>>>> framework than
>>>>>>>>>>>>> >> >>> >>> MRv1.
>>>>>>>>>>>>> >> >>> >>> However, the test result shows some differences:
>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>> >> >>> >>> After a fast analysis, I think the direct cause
>>>>>>>>>>>>> might be that Yarn
>>>>>>>>>>>>> >> >>> >>> is
>>>>>>>>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much worse
>>>>>>>>>>>>> on Reduce
>>>>>>>>>>>>> >> >>> >>> phase.
>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>>>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1 for
>>>>>>>>>>>>> terasort?
>>>>>>>>>>>>> >> >>> >>> - What's the strategy for tuning Yarn performance?
>>>>>>>>>>>>> Are there any
>>>>>>>>>>>>> >> >>> >>> materials?
>>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>>> >> >>> >>> Thanks!
>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>> >> >>> >> --
>>>>>>>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>> >> >>>
>>>>>>>>>>>>> >> >>> --
>>>>>>>>>>>>> >> >>> Harsh J
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >>
>>>>>>>>>>>>> >> >
>>>>>>>>>>>>> >>
>>>>>>>>>>>>> >>
>>>>>>>>>>>>> >>
>>>>>>>>>>>>> >> --
>>>>>>>>>>>>> >> Harsh J
>>>>>>>>>>>>> >
>>>>>>>>>>>>> >
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Harsh J
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Sandy Ryza <sa...@cloudera.com>.
Increasing the slowstart is not meant to increase performance, but should
make for a fairer comparison.  Have you tried making sure that in MR2 only
8 map tasks are running concurrently, or boosting MR1 up to 16?

-Sandy
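
A minimal sketch of what this suggestion could look like in mapred-site.xml,
assuming the Hadoop 2.2.0 property names used elsewhere in this thread (values
are examples, not recommendations):

  <!-- Hold reducers back until the maps are done, so the two phases can be
       timed separately (the MR2 counterpart of MR1's
       mapred.reduce.slowstart.completed.maps). -->
  <property>
    <name>mapreduce.job.reduce.slowstart.completedmaps</name>
    <value>0.99</value>
  </property>

  <!-- With yarn.nodemanager.resource.memory-mb = 12288 and 768 MB map
       containers, each node runs 12288 / 768 = 16 concurrent maps. Raising
       the map container to 1536 MB would cap that at 12288 / 1536 = 8,
       matching the 8 map slots per node in the MR1 setup. -->
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1536</value>
  </property>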


On Wed, Oct 23, 2013 at 12:55 PM, Jian Fang
<ji...@gmail.com>wrote:

> Changing mapreduce.job.reduce.slowstart.completedmaps to 0.99 does not look
> good. The map phase alone
> took 48 minutes and total time seems to be even longer. Any way to let map
> phase run faster?
>
> Thanks.
>
>
> On Wed, Oct 23, 2013 at 10:05 AM, Jian Fang <jian.fang.subscribe@gmail.com
> > wrote:
>
>> Thanks Sandy.
>>
>> io.sort.record.percent is the default value 0.05 for both MR1 and MR2.
>> mapreduce.job.reduce.slowstart.completedmaps in MR2 and mapred.reduce.slowstart.completed.maps
>> in MR1 both use the default value 0.05.
>>
>> I tried to allocate 1536MB and 1024MB to map container some time ago, but
>> the changes did not give me a better result, thus, I changed it back to
>> 768MB.
>>
>> Will try mapred.reduce.slowstart.completed.maps=.99 to see what happens.
>> BTW, I should use
>> mapreduce.job.reduce.slowstart.completedmaps in MR2, right?
>>
>> Also, in MR1 I can specify tasktracker.http.threads, but I could not find
>> the counterpart for MR2. Which one should I tune for the http threads?
>>
>> Thanks again.
>>
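A note on that http thread question: in MR2 the map output is served by the
ShuffleHandler auxiliary service running inside each NodeManager rather than
by the TaskTracker's HTTP server, so the rough counterpart of
tasktracker.http.threads is mapreduce.shuffle.max.threads, read from the
NodeManager's configuration. A sketch, not a recommended value:

  <property>
    <name>mapreduce.shuffle.max.threads</name>
    <value>0</value>   <!-- 0 means 2 * number of available processors -->
  </property>
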
>>
>> On Wed, Oct 23, 2013 at 9:40 AM, Sandy Ryza <sa...@cloudera.com>wrote:
>>
>>> Based on SLOTS_MILLIS_MAPS, it looks like your map tasks are taking
>>> about three times as long in MR2 as they are in MR1.  This is probably
>>> because you allow twice as many map tasks to run at a time in MR2 (12288/768
>>> = 16).  Being able to use all the containers isn't necessarily a good thing
>>> if you are oversubscribing your node's resources.  Because of the different
>>> way that MR1 and MR2 view resources, I think it's better to test with
>>> mapred.reduce.slowstart.completed.maps=.99 so that the map and reduce
>>> phases will run separately.
>>>
>>> On the other side, it looks like your MR1 has more spilled records than
>>> MR2.  For a fairer comparison, you should set io.sort.record.percent in MR1
>>> to .13, which should improve MR1 performance, but will provide a fairer
>>> comparison (MR2 automatically does this tuning for you).
>>>
>>> -Sandy
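
A rough sketch of the MR1-side settings suggested there, using the classic
Hadoop 1.x property names (the values come from the advice above, not from
tuning):

  <!-- MR1 mapred-site.xml -->
  <property>
    <name>io.sort.record.percent</name>
    <value>0.13</value>   <!-- fraction of io.sort.mb reserved for record metadata -->
  </property>
  <property>
    <name>mapred.reduce.slowstart.completed.maps</name>
    <value>0.99</value>   <!-- start reducers only after the maps are essentially done -->
  </property>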
>>>
>>>
>>> On Wed, Oct 23, 2013 at 9:22 AM, Jian Fang <
>>> jian.fang.subscribe@gmail.com> wrote:
>>>
>>>> The number of map slots and reduce slots on each data node for MR1 are
>>>> 8 and 3, respectively. Since MR2 could use all containers for either map or
>>>> reduce, I would expect that MR2 is faster.
>>>>
>>>>
>>>> On Wed, Oct 23, 2013 at 8:17 AM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>>
>>>>> How many map and reduce slots are you using per tasktracker in MR1?
>>>>>  How do the average map times compare? (MR2 reports this directly on the
>>>>> web UI, but you can also get a sense in MR1 by scrolling through the map
>>>>> tasks page).  Can you share the counters for MR1?
>>>>>
>>>>> -Sandy
>>>>>
>>>>>
>>>>> On Wed, Oct 23, 2013 at 12:23 AM, Jian Fang <
>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>
>>>>>> Unfortunately, turning off JVM reuse still got the same result, i.e.,
>>>>>> about 90 minutes for MR2. I don't think the killed reduces could contribute
>>>>>> to a 2x slowdown. There should be something very wrong either in
>>>>>> configuration or code. Any hints?
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <
>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks Sandy. I will try to turn JVM reuse off and see what happens.
>>>>>>>
>>>>>>> Yes, I saw quite some exceptions in the task attempts. For instance.
>>>>>>>
>>>>>>>
>>>>>>> 2013-10-20 03:13:58,751 ERROR [main]
>>>>>>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>>>>>>> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
>>>>>>> 2013-10-20 03:13:58,752 ERROR [Thread-6]
>>>>>>> org.apache.hadoop.hdfs.DFSClient: Failed to close file
>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
>>>>>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>>>>>>> No lease on
>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
>>>>>>> File does not exist. Holder
>>>>>>> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
>>>>>>> any open files.
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>>>>>>>         at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>>>>>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>>>         at
>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>>>>         at
>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>>>>> --
>>>>>>>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>>>>>>>         at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>>>>>>>         at
>>>>>>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>>>>>>>         at
>>>>>>> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>>>>>>>         at
>>>>>>> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>>>>>>> 2013-10-20 03:13:58,753 WARN [main]
>>>>>>> org.apache.hadoop.mapred.YarnChild: Exception running child :
>>>>>>> java.nio.channels.ClosedChannelException
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>>>>>>>         at
>>>>>>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>>>>>>>         at
>>>>>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>>>>>>>         at java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>>>>>         at
>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>>>>>>>         at
>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>>>>>>>         at
>>>>>>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>>>>>>>         at
>>>>>>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>>>>>>         at
>>>>>>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>>>>>>         at
>>>>>>> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <sandy.ryza@cloudera.com
>>>>>>> > wrote:
>>>>>>>
>>>>>>>> It looks like many of your reduce tasks were killed.  Do you know
>>>>>>>> why?  Also, MR2 doesn't have JVM reuse, so it might make sense to compare
>>>>>>>> it to MR1 with JVM reuse turned off.
>>>>>>>>
>>>>>>>> -Sandy
>>>>>>>>
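A sketch of what "JVM reuse turned off" means on the MR1 side, assuming the
classic Hadoop 1.x property name (MR2 effectively ignores its
mapreduce.job.jvm.numtasks equivalent, since every container is a fresh JVM):

  <!-- MR1 mapred-site.xml: one task per JVM, i.e. no reuse -->
  <property>
    <name>mapred.job.reuse.jvm.num.tasks</name>
    <value>1</value>
  </property>
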
>>>>>>>>
>>>>>>>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <
>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> The Terasort output for MR2 is as follows.
>>>>>>>>>
>>>>>>>>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job
>>>>>>>>> (main): Counters: 46
>>>>>>>>>         File System Counters
>>>>>>>>>                 FILE: Number of bytes read=456102049355
>>>>>>>>>                 FILE: Number of bytes written=897246250517
>>>>>>>>>                 FILE: Number of read operations=0
>>>>>>>>>                 FILE: Number of large read operations=0
>>>>>>>>>                 FILE: Number of write operations=0
>>>>>>>>>                 HDFS: Number of bytes read=1000000851200
>>>>>>>>>                 HDFS: Number of bytes written=1000000000000
>>>>>>>>>                 HDFS: Number of read operations=32131
>>>>>>>>>                 HDFS: Number of large read operations=0
>>>>>>>>>                 HDFS: Number of write operations=224
>>>>>>>>>         Job Counters
>>>>>>>>>                 Killed map tasks=1
>>>>>>>>>                 Killed reduce tasks=20
>>>>>>>>>                 Launched map tasks=7601
>>>>>>>>>                 Launched reduce tasks=132
>>>>>>>>>                 Data-local map tasks=7591
>>>>>>>>>                 Rack-local map tasks=10
>>>>>>>>>                 Total time spent by all maps in occupied slots
>>>>>>>>> (ms)=1696141311
>>>>>>>>>                 Total time spent by all reduces in occupied slots
>>>>>>>>> (ms)=2664045096
>>>>>>>>>         Map-Reduce Framework
>>>>>>>>>                 Map input records=10000000000
>>>>>>>>>                 Map output records=10000000000
>>>>>>>>>                 Map output bytes=1020000000000
>>>>>>>>>                 Map output materialized bytes=440486356802
>>>>>>>>>                 Input split bytes=851200
>>>>>>>>>                 Combine input records=0
>>>>>>>>>                 Combine output records=0
>>>>>>>>>                 Reduce input groups=10000000000
>>>>>>>>>                 Reduce shuffle bytes=440486356802
>>>>>>>>>                 Reduce input records=10000000000
>>>>>>>>>                 Reduce output records=10000000000
>>>>>>>>>                 Spilled Records=20000000000
>>>>>>>>>                 Shuffled Maps =851200
>>>>>>>>>                 Failed Shuffles=61
>>>>>>>>>                 Merged Map outputs=851200
>>>>>>>>>                 GC time elapsed (ms)=4215666
>>>>>>>>>                 CPU time spent (ms)=192433000
>>>>>>>>>                 Physical memory (bytes) snapshot=3349356380160
>>>>>>>>>                 Virtual memory (bytes) snapshot=9665208745984
>>>>>>>>>                 Total committed heap usage (bytes)=3636854259712
>>>>>>>>>         Shuffle Errors
>>>>>>>>>                 BAD_ID=0
>>>>>>>>>                 CONNECTION=0
>>>>>>>>>                 IO_ERROR=4
>>>>>>>>>                 WRONG_LENGTH=0
>>>>>>>>>                 WRONG_MAP=0
>>>>>>>>>                 WRONG_REDUCE=0
>>>>>>>>>         File Input Format Counters
>>>>>>>>>                 Bytes Read=1000000000000
>>>>>>>>>         File Output Format Counters
>>>>>>>>>                 Bytes Written=1000000000000
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> John
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop
>>>>>>>>>> 1.0.3 and it turned out that the terasort for MR2 is 2 times slower than
>>>>>>>>>> that in MR1. I cannot really believe it.
>>>>>>>>>>
>>>>>>>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0
>>>>>>>>>> cluster configurations are as follows.
>>>>>>>>>>
>>>>>>>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>>>>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>>>>>>>         mapreduce.map.memory.mb = "768";
>>>>>>>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>>>>>>>
>>>>>>>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>>>>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>>>>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>>>>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>>>>>>>
>>>>>>>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>>>>>>>         mapreduce.task.io.sort.factor = "48";
>>>>>>>>>>         mapreduce.task.io.sort.mb = "200";
>>>>>>>>>>         mapreduce.map.speculative = "true";
>>>>>>>>>>         mapreduce.reduce.speculative = "true";
>>>>>>>>>>         mapreduce.framework.name = "yarn";
>>>>>>>>>>         yarn.app.mapreduce.am.job.task.listener.thread-count =
>>>>>>>>>> "60";
>>>>>>>>>>         mapreduce.map.cpu.vcores = "1";
>>>>>>>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>>>>>>>
>>>>>>>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>>>>>>>         mapreduce.map.output.compress = "true";
>>>>>>>>>>         mapreduce.map.output.compress.codec =
>>>>>>>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>>>>>>>
>>>>>>>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>>>>>>>         yarn.resourcemanager.scheduler.client.thread-count = "64";
>>>>>>>>>>         yarn.resourcemanager.resource-tracker.client.thread-count
>>>>>>>>>> = "64";
>>>>>>>>>>         yarn.resourcemanager.scheduler.class =
>>>>>>>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>>>>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>>>>>>>         yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>>>>>>>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>>>>>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>>>>>>>         yarn.nodemanager.container-executor.class =
>>>>>>>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>>>>>>>         yarn.nodemanager.container-manager.thread-count = "64";
>>>>>>>>>>         yarn.nodemanager.localizer.client.thread-count = "20";
>>>>>>>>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>>>>>>>>>
>>>>>>>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost the
>>>>>>>>>> same other configurations. In MR1, the 1TB terasort took about 45 minutes,
>>>>>>>>>> but it took around 90 minutes in MR2.
>>>>>>>>>>
>>>>>>>>>> Does anyone know what is wrong here? Or do I need some special
>>>>>>>>>> configurations for terasort to work better in MR2?
>>>>>>>>>>
>>>>>>>>>> Thanks in advance,
>>>>>>>>>>
>>>>>>>>>> John
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>>>>>>>> michael_segel@hotmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Sam,
>>>>>>>>>>> I think your cluster is too small for any meaningful conclusions
>>>>>>>>>>> to be made.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>>>>>
>>>>>>>>>>> Mike Segel
>>>>>>>>>>>
>>>>>>>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Harsh,
>>>>>>>>>>>
>>>>>>>>>>> Thanks for your detailed response! Now, the efficiency of my
>>>>>>>>>>> Yarn cluster improved a lot after increasing the reducer
>>>>>>>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>>>>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>>>>>>>
>>>>>>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and reduce
>>>>>>>>>>> task together, with a typical process(map > shuffle > reduce). In Yarn, as
>>>>>>>>>>> I know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1
>>>>>>>>>>> job has special execution process(map > shuffle > reduce) in Hadoop 1.x,
>>>>>>>>>>> and how does Yarn execute an MRv1 job? Does it still include some special MR steps in
>>>>>>>>>>> Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>>>>>> - What's the general process for ApplicationMaster of Yarn to
>>>>>>>>>>> execute a job?
>>>>>>>>>>>
>>>>>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>>>>>> - For Yarn, the above two parameters do not work any more, as yarn
>>>>>>>>>>> uses container instead, right?
>>>>>>>>>>> - For Yarn, we can set the whole physical mem for a NodeManager
>>>>>>>>>>> using 'yarn.nodemanager.resource.memory-mb'. But how to set the
>>>>>>>>>>> default size of physical mem of a container?
>>>>>>>>>>> - How to set the maximum size of physical mem of a container? By
>>>>>>>>>>> the parameter of 'mapred.child.java.opts'?
>>>>>>>>>>>
>>>>>>>>>>> Thanks as always!
>>>>>>>>>>>
>>>>>>>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>
>>>>>>>>>>>> Hi Sam,
>>>>>>>>>>>>
>>>>>>>>>>>> > - How to know the container number? Why you say it will be 22
>>>>>>>>>>>> containers due to a 22 GB memory?
>>>>>>>>>>>>
>>>>>>>>>>>> The MR2's default configuration requests 1 GB resource each for
>>>>>>>>>>>> Map
>>>>>>>>>>>> and Reduce containers. It requests 1.5 GB for the AM container
>>>>>>>>>>>> that
>>>>>>>>>>>> runs the job, additionally. This is tunable using the properties
>>>>>>>>>>>> Sandy's mentioned in his post.
>>>>>>>>>>>>
>>>>>>>>>>>> > - My machine has 32 GB memory, how much memory is proper to
>>>>>>>>>>>> be assigned to containers?
>>>>>>>>>>>>
>>>>>>>>>>>> This is a general question. You may use the same process you
>>>>>>>>>>>> took to
>>>>>>>>>>>> decide optimal number of slots in MR1 to decide this here. Every
>>>>>>>>>>>> container is a new JVM, and you're limited by the CPUs you have
>>>>>>>>>>>> there
>>>>>>>>>>>> (if not the memory). Either increase memory requests from jobs,
>>>>>>>>>>>> to
>>>>>>>>>>>> lower # of concurrent containers at a given time (runtime
>>>>>>>>>>>> change), or
>>>>>>>>>>>> lower NM's published memory resources to control the same
>>>>>>>>>>>> (config
>>>>>>>>>>>> change).
>>>>>>>>>>>>
>>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to
>>>>>>>>>>>> be 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, all of these properties will still work. Old properties
>>>>>>>>>>>> specific
>>>>>>>>>>>> to JobTracker or TaskTracker (usually found as a keyword in the
>>>>>>>>>>>> config
>>>>>>>>>>>> name) will not apply anymore.
>>>>>>>>>>>>
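For the per-container size questions above, a minimal sketch using MR2 property
names that appear elsewhere in this thread (values are placeholders):

  <!-- mapred-site.xml: requested container size and the JVM heap inside it -->
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx800m</value>   <!-- leave headroom below the container limit -->
  </property>

  <!-- yarn-site.xml: the largest container any single request may ask for -->
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>8192</value>
  </property>
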
>>>>>>>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <sa...@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>> > Hi Harsh,
>>>>>>>>>>>> >
>>>>>>>>>>>> > According to above suggestions, I removed the duplication of
>>>>>>>>>>>> setting, and
>>>>>>>>>>>> > reduce the value of 'yarn.nodemanager.resource.cpu-cores',
>>>>>>>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>>>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000. And
>>>>>>>>>>>> then, the
>>>>>>>>>>>> > efficiency improved about 18%.  I have questions:
>>>>>>>>>>>> >
>>>>>>>>>>>> > - How to know the container number? Why you say it will be 22
>>>>>>>>>>>> containers due
>>>>>>>>>>>> > to a 22 GB memory?
>>>>>>>>>>>> > - My machine has 32 GB memory, how much memory is proper to
>>>>>>>>>>>> be assigned to
>>>>>>>>>>>> > containers?
>>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to
>>>>>>>>>>>> be 'yarn', will
>>>>>>>>>>>> > other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>>> framework? Like
>>>>>>>>>>>> > 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>> >
>>>>>>>>>>>> > Thanks!
>>>>>>>>>>>> >
>>>>>>>>>>>> >
>>>>>>>>>>>> >
>>>>>>>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> Hey Sam,
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> Did you get a chance to retry with Sandy's suggestions? The
>>>>>>>>>>>> config
>>>>>>>>>>>> >> appears to be asking NMs to use roughly 22 total containers
>>>>>>>>>>>> (as
>>>>>>>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB
>>>>>>>>>>>> memory
>>>>>>>>>>>> >> resource. This could impact much, given the CPU is still the
>>>>>>>>>>>> same for
>>>>>>>>>>>> >> both test runs.
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>>>>>>>> sandy.ryza@cloudera.com>
>>>>>>>>>>>> >> wrote:
>>>>>>>>>>>> >> > Hey Sam,
>>>>>>>>>>>> >> >
>>>>>>>>>>>> >> > Thanks for sharing your results.  I'm definitely curious
>>>>>>>>>>>> about what's
>>>>>>>>>>>> >> > causing the difference.
>>>>>>>>>>>> >> >
>>>>>>>>>>>> >> > A couple observations:
>>>>>>>>>>>> >> > It looks like you've got
>>>>>>>>>>>> yarn.nodemanager.resource.memory-mb in there
>>>>>>>>>>>> >> > twice
>>>>>>>>>>>> >> > with two different values.
>>>>>>>>>>>> >> >
>>>>>>>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close to
>>>>>>>>>>>> the default
>>>>>>>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of your
>>>>>>>>>>>> tasks getting
>>>>>>>>>>>> >> > killed for running over resource limits?
>>>>>>>>>>>> >> >
>>>>>>>>>>>> >> > -Sandy
>>>>>>>>>>>> >> >
>>>>>>>>>>>> >> >
>>>>>>>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <
>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >> The terasort execution log shows that reduce spent about
>>>>>>>>>>>> 5.5 mins from
>>>>>>>>>>>> >> >> 33%
>>>>>>>>>>>> >> >> to 35% as below.
>>>>>>>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce 31%
>>>>>>>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce 32%
>>>>>>>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce 33%
>>>>>>>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce 35%
>>>>>>>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce 40%
>>>>>>>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce 43%
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >> Any way, below are my configurations for your reference.
>>>>>>>>>>>> Thanks!
>>>>>>>>>>>> >> >> (A) core-site.xml
>>>>>>>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >> (B) hdfs-site.xml
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>dfs.replication</name>
>>>>>>>>>>>> >> >>     <value>1</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>>>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>>>>>>>> >> >>     <value>10</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >> (C) mapred-site.xml
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>>>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>>>>>>>> >> >> </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>>>>>>>> >> >>     <value>yarn</value>
>>>>>>>>>>>> >> >>    </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>> >> >>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>>>>>>>> >> >>     <value>8</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>>>>>>>> >> >>     <value>4</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>>>>>>>> >> >>     <value>true</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >> (D) yarn-site.xml
>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>>>>>>>> >> >>     <value>node1:18025</value>
>>>>>>>>>>>> >> >>     <description>host is the hostname of the resource
>>>>>>>>>>>> manager and
>>>>>>>>>>>> >> >>     port is the port on which the NodeManagers contact
>>>>>>>>>>>> the Resource
>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <description>The address of the RM web
>>>>>>>>>>>> application.</description>
>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>>>>>>>> >> >>     <value>node1:18088</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>>>>>>>>>>>> >> >>     <value>node1:18030</value>
>>>>>>>>>>>> >> >>     <description>host is the hostname of the
>>>>>>>>>>>> resourcemanager and port
>>>>>>>>>>>> >> >> is
>>>>>>>>>>>> >> >> the port
>>>>>>>>>>>> >> >>     on which the Applications in the cluster talk to the
>>>>>>>>>>>> Resource
>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>>>>>>>> >> >>     <value>node1:18040</value>
>>>>>>>>>>>> >> >>     <description>the host is the hostname of the
>>>>>>>>>>>> ResourceManager and
>>>>>>>>>>>> >> >> the
>>>>>>>>>>>> >> >> port is the port on
>>>>>>>>>>>> >> >>     which the clients can talk to the Resource Manager.
>>>>>>>>>>>> </description>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>>>>>>>> >> >>     <description>the local directories used by the
>>>>>>>>>>>> >> >> nodemanager</description>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>>>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>>>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>>>>>>>> port</description>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>> >> >>     <value>10240</value>
>>>>>>>>>>>> >> >>     <description>the amount of memory on the NodeManager
>>>>>>>>>>>> in
>>>>>>>>>>>> >> >> MB</description>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>>>>>>> >> >>     <description>directory on hdfs where the application
>>>>>>>>>>>> logs are moved
>>>>>>>>>>>> >> >> to
>>>>>>>>>>>> >> >> </description>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>    <property>
>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>>>>>>> >> >>     <description>the directories used by Nodemanagers as
>>>>>>>>>>>> log
>>>>>>>>>>>> >> >> directories</description>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>>>>>>> >> >>     <description>shuffle service that needs to be set for
>>>>>>>>>>>> Map Reduce to
>>>>>>>>>>>> >> >> run </description>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>>>>>>> >> >>     <value>24</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>>>>>>> >> >>     <value>3</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>> >> >>     <value>22000</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>>>>>>> >> >>     <value>2.1</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>> >> >>>
>>>>>>>>>>>> >> >>> Not tuning configurations at all is wrong. YARN uses
>>>>>>>>>>>> memory resource
>>>>>>>>>>>> >> >>> based scheduling and hence MR2 would be requesting 1 GB
>>>>>>>>>>>> minimum by
>>>>>>>>>>>> >> >>> default, causing, on base configs, to max out at 8 (due
>>>>>>>>>>>> to 8 GB NM
>>>>>>>>>>>> >> >>> memory resource config) total containers. Do share your
>>>>>>>>>>>> configs as at
>>>>>>>>>>>> >> >>> this point none of us can tell what it is.
>>>>>>>>>>>> >> >>>
>>>>>>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for
>>>>>>>>>>>> users and to not
>>>>>>>>>>>> >> >>> care about such things :)
>>>>>>>>>>>> >> >>>
>>>>>>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>>>>>>> samliuhadoop@gmail.com>
>>>>>>>>>>>> >> >>> wrote:
>>>>>>>>>>>> >> >>> > At the beginning, I just want to do a fast comparison
>>>>>>>>>>>> of MRv1 and
>>>>>>>>>>>> >> >>> > Yarn.
>>>>>>>>>>>> >> >>> > But
>>>>>>>>>>>> >> >>> > they have many differences, and to be fair for
>>>>>>>>>>>> comparison I did not
>>>>>>>>>>>> >> >>> > tune
>>>>>>>>>>>> >> >>> > their configurations at all.  So I got above test
>>>>>>>>>>>> results. After
>>>>>>>>>>>> >> >>> > analyzing
>>>>>>>>>>>> >> >>> > the test result, no doubt, I will configure them and
>>>>>>>>>>>> do comparison
>>>>>>>>>>>> >> >>> > again.
>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>> >> >>> > Do you have any idea on current test result? I think,
>>>>>>>>>>>> to compare
>>>>>>>>>>>> >> >>> > with
>>>>>>>>>>>> >> >>> > MRv1,
>>>>>>>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but worse
>>>>>>>>>>>> on Reduce
>>>>>>>>>>>> >> >>> > phase(terasort test).
>>>>>>>>>>>> >> >>> > And any detailed suggestions/comments/materials on
>>>>>>>>>>>> Yarn performance
>>>>>>>>>>>> >> >>> > tuning?
>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>> >> >>> > Thanks!
>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>>>>>>> marcosluis2186@gmail.com>
>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>> >> >>> >> Why not tune the configurations?
>>>>>>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>> >> >>> >>> Hi Experts,
>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>> >> >>> >>> We are thinking about whether to use Yarn or not in
>>>>>>>>>>>> the near
>>>>>>>>>>>> >> >>> >>> future,
>>>>>>>>>>>> >> >>> >>> and
>>>>>>>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for
>>>>>>>>>>>> comparison.
>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>> >> >>> >>> My env is a three-node cluster, and each node has
>>>>>>>>>>>> similar hardware:
>>>>>>>>>>>> >> >>> >>> 2
>>>>>>>>>>>> >> >>> >>> cpu(4 core), 32 GB mem. Both Yarn and MRv1 clusters are
>>>>>>>>>>>> set on the
>>>>>>>>>>>> >> >>> >>> same
>>>>>>>>>>>> >> >>> >>> env. To
>>>>>>>>>>>> >> >>> >>> be fair, I did not make any performance tuning on
>>>>>>>>>>>> their
>>>>>>>>>>>> >> >>> >>> configurations, but
>>>>>>>>>>>> >> >>> >>> use the default configuration values.
>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>> >> >>> >>> Before testing, I think Yarn will be much better
>>>>>>>>>>>> than MRv1, if
>>>>>>>>>>>> >> >>> >>> they
>>>>>>>>>>>> >> >>> >>> all
>>>>>>>>>>>> >> >>> >>> use default configuration, because Yarn is a better
>>>>>>>>>>>> framework than
>>>>>>>>>>>> >> >>> >>> MRv1.
>>>>>>>>>>>> >> >>> >>> However, the test result shows some differences:
>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>> >> >>> >>> After a fast analysis, I think the direct cause
>>>>>>>>>>>> might be that Yarn
>>>>>>>>>>>> >> >>> >>> is
>>>>>>>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much worse
>>>>>>>>>>>> on Reduce
>>>>>>>>>>>> >> >>> >>> phase.
>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1 for
>>>>>>>>>>>> terasort?
>>>>>>>>>>>> >> >>> >>> - What's the strategy for tuning Yarn performance?
>>>>>>>>>>>> Are there any
>>>>>>>>>>>> >> >>> >>> materials?
>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>> >> >>> >>> Thanks!
>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>> >> >>> >> --
>>>>>>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>> >> >>>
>>>>>>>>>>>> >> >>>
>>>>>>>>>>>> >> >>>
>>>>>>>>>>>> >> >>> --
>>>>>>>>>>>> >> >>> Harsh J
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >
>>>>>>>>>>>> >>
>>>>>>>>>>>> >>
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> --
>>>>>>>>>>>> >> Harsh J
>>>>>>>>>>>> >
>>>>>>>>>>>> >
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Harsh J
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Sandy Ryza <sa...@cloudera.com>.
Increasing the slowstart is not meant to increase performance, but should
make for a fairer comparison.  Have you tried making sure that in MR2 only
8 map tasks are running concurrently, or boosting MR1 up to 16?

-Sandy


On Wed, Oct 23, 2013 at 12:55 PM, Jian Fang
<ji...@gmail.com>wrote:

> Changing mapreduce.job.reduce.slowstart.completedmaps to 0.99 does not look
> good. The map phase alone
> took 48 minutes and total time seems to be even longer. Any way to let map
> phase run faster?
>
> Thanks.
>
>
> On Wed, Oct 23, 2013 at 10:05 AM, Jian Fang <jian.fang.subscribe@gmail.com
> > wrote:
>
>> Thanks Sandy.
>>
>> io.sort.record.percent is the default value 0.05 for both MR1 and MR2.
>> mapreduce.job.reduce.slowstart.completedmaps in MR2 and mapred.reduce.slowstart.completed.maps
>> in MR1 both use the default value 0.05.
>>
>> I tried to allocate 1536MB and 1024MB to map container some time ago, but
>> the changes did not give me a better result, thus, I changed it back to
>> 768MB.
>>
>> Will try mapred.reduce.slowstart.completed.maps=.99 to see what happens.
>> BTW, I should use
>> mapreduce.job.reduce.slowstart.completedmaps in MR2, right?
>>
>> Also, in MR1 I can specify tasktracker.http.threads, but I could not find
>> the counterpart for MR2. Which one should I tune for the http threads?
>>
>> Thanks again.
>>
>>
>> On Wed, Oct 23, 2013 at 9:40 AM, Sandy Ryza <sa...@cloudera.com>wrote:
>>
>>> Based on SLOTS_MILLIS_MAPS, it looks like your map tasks are taking
>>> about three times as long in MR2 as they are in MR1.  This is probably
>>> because you allow twice as many map tasks to run at a time in MR2 (12288/768
>>> = 16).  Being able to use all the containers isn't necessarily a good thing
>>> if you are oversubscribing your node's resources.  Because of the different
>>> way that MR1 and MR2 view resources, I think it's better to test with
>>> mapred.reduce.slowstart.completed.maps=.99 so that the map and reduce
>>> phases will run separately.
>>>
>>> On the other side, it looks like your MR1 has more spilled records than
>>> MR2.  For a fairer comparison, you should set io.sort.record.percent in MR1
>>> to .13, which should improve MR1 performance, but will provide a fairer
>>> comparison (MR2 automatically does this tuning for you).
>>>
>>> -Sandy
>>>
>>>
>>> On Wed, Oct 23, 2013 at 9:22 AM, Jian Fang <
>>> jian.fang.subscribe@gmail.com> wrote:
>>>
>>>> The number of map slots and reduce slots on each data node for MR1 are
>>>> 8 and 3, respectively. Since MR2 could use all containers for either map or
>>>> reduce, I would expect that MR2 is faster.
>>>>
>>>>
>>>> On Wed, Oct 23, 2013 at 8:17 AM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>>
>>>>> How many map and reduce slots are you using per tasktracker in MR1?
>>>>>  How do the average map times compare? (MR2 reports this directly on the
>>>>> web UI, but you can also get a sense in MR1 by scrolling through the map
>>>>> tasks page).  Can you share the counters for MR1?
>>>>>
>>>>> -Sandy
>>>>>
>>>>>
>>>>> On Wed, Oct 23, 2013 at 12:23 AM, Jian Fang <
>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>
>>>>>> Unfortunately, turning off JVM reuse still got the same result, i.e.,
>>>>>> about 90 minutes for MR2. I don't think the killed reduces could contribute
>>>>>> to a 2x slowdown. There should be something very wrong either in
>>>>>> configuration or code. Any hints?
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <
>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks Sandy. I will try to turn JVM reuse off and see what happens.
>>>>>>>
>>>>>>> Yes, I saw quite some exceptions in the task attempts. For instance.
>>>>>>>
>>>>>>>
>>>>>>> 2013-10-20 03:13:58,751 ERROR [main]
>>>>>>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>>>>>>> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
>>>>>>> 2013-10-20 03:13:58,752 ERROR [Thread-6]
>>>>>>> org.apache.hadoop.hdfs.DFSClient: Failed to close file
>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
>>>>>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>>>>>>> No lease on
>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
>>>>>>> File does not exist. Holder
>>>>>>> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
>>>>>>> any open files.
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>>>>>>>         at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>>>>>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>>>         at
>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>>>>         at
>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>>>>> --
>>>>>>>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>>>>>>>         at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>>>>>>>         at
>>>>>>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>>>>>>>         at
>>>>>>> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>>>>>>>         at
>>>>>>> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>>>>>>> 2013-10-20 03:13:58,753 WARN [main]
>>>>>>> org.apache.hadoop.mapred.YarnChild: Exception running child :
>>>>>>> java.nio.channels.ClosedChannelException
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>>>>>>>         at
>>>>>>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>>>>>>>         at
>>>>>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>>>>>>>         at java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>>>>>         at
>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>>>>>>>         at
>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>>>>>>>         at
>>>>>>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>>>>>>>         at
>>>>>>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>>>>>>         at
>>>>>>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>>>>>>         at
>>>>>>> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <sandy.ryza@cloudera.com
>>>>>>> > wrote:
>>>>>>>
>>>>>>>> It looks like many of your reduce tasks were killed.  Do you know
>>>>>>>> why?  Also, MR2 doesn't have JVM reuse, so it might make sense to compare
>>>>>>>> it to MR1 with JVM reuse turned off.
>>>>>>>>
>>>>>>>> -Sandy
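
For the MR1 side of that comparison, JVM reuse is governed by a single property; a minimal sketch, assuming the classic Hadoop 1.x name (in MR2 the equivalently named mapreduce.job.jvm.numtasks, set to 20 in the configuration quoted below, has no effect, since MR2 dropped JVM reuse):

  <!-- MR1 mapred-site.xml sketch: one task per JVM, i.e. reuse off -->
  <property>
    <name>mapred.job.reuse.jvm.num.tasks</name>
    <value>1</value>   <!-- 1 = no reuse; -1 would mean unlimited reuse -->
  </property>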
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <
>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> The Terasort output for MR2 is as follows.
>>>>>>>>>
>>>>>>>>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job
>>>>>>>>> (main): Counters: 46
>>>>>>>>>         File System Counters
>>>>>>>>>                 FILE: Number of bytes read=456102049355
>>>>>>>>>                 FILE: Number of bytes written=897246250517
>>>>>>>>>                 FILE: Number of read operations=0
>>>>>>>>>                 FILE: Number of large read operations=0
>>>>>>>>>                 FILE: Number of write operations=0
>>>>>>>>>                 HDFS: Number of bytes read=1000000851200
>>>>>>>>>                 HDFS: Number of bytes written=1000000000000
>>>>>>>>>                 HDFS: Number of read operations=32131
>>>>>>>>>                 HDFS: Number of large read operations=0
>>>>>>>>>                 HDFS: Number of write operations=224
>>>>>>>>>         Job Counters
>>>>>>>>>                 Killed map tasks=1
>>>>>>>>>                 Killed reduce tasks=20
>>>>>>>>>                 Launched map tasks=7601
>>>>>>>>>                 Launched reduce tasks=132
>>>>>>>>>                 Data-local map tasks=7591
>>>>>>>>>                 Rack-local map tasks=10
>>>>>>>>>                 Total time spent by all maps in occupied slots
>>>>>>>>> (ms)=1696141311
>>>>>>>>>                 Total time spent by all reduces in occupied slots
>>>>>>>>> (ms)=2664045096
>>>>>>>>>         Map-Reduce Framework
>>>>>>>>>                 Map input records=10000000000
>>>>>>>>>                 Map output records=10000000000
>>>>>>>>>                 Map output bytes=1020000000000
>>>>>>>>>                 Map output materialized bytes=440486356802
>>>>>>>>>                 Input split bytes=851200
>>>>>>>>>                 Combine input records=0
>>>>>>>>>                 Combine output records=0
>>>>>>>>>                 Reduce input groups=10000000000
>>>>>>>>>                 Reduce shuffle bytes=440486356802
>>>>>>>>>                 Reduce input records=10000000000
>>>>>>>>>                 Reduce output records=10000000000
>>>>>>>>>                 Spilled Records=20000000000
>>>>>>>>>                 Shuffled Maps =851200
>>>>>>>>>                 Failed Shuffles=61
>>>>>>>>>                 Merged Map outputs=851200
>>>>>>>>>                 GC time elapsed (ms)=4215666
>>>>>>>>>                 CPU time spent (ms)=192433000
>>>>>>>>>                 Physical memory (bytes) snapshot=3349356380160
>>>>>>>>>                 Virtual memory (bytes) snapshot=9665208745984
>>>>>>>>>                 Total committed heap usage (bytes)=3636854259712
>>>>>>>>>         Shuffle Errors
>>>>>>>>>                 BAD_ID=0
>>>>>>>>>                 CONNECTION=0
>>>>>>>>>                 IO_ERROR=4
>>>>>>>>>                 WRONG_LENGTH=0
>>>>>>>>>                 WRONG_MAP=0
>>>>>>>>>                 WRONG_REDUCE=0
>>>>>>>>>         File Input Format Counters
>>>>>>>>>                 Bytes Read=1000000000000
>>>>>>>>>         File Output Format Counters
>>>>>>>>>                 Bytes Written=1000000000000
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> John
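
One note on the counters above: the 1 killed map and 20 killed reduces are consistent with speculative execution, which is enabled in this cluster's settings (mapreduce.map.speculative and mapreduce.reduce.speculative are both "true" in the configuration quoted below); that is a plausible explanation, not a confirmed one. A sketch of how to take that noise out of a benchmark run:

  <!-- mapred-site.xml sketch: disable speculative execution for benchmark runs -->
  <property>
    <name>mapreduce.map.speculative</name>
    <value>false</value>
  </property>
  <property>
    <name>mapreduce.reduce.speculative</name>
    <value>false</value>
  </property>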
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop
>>>>>>>>>> 1.0.3 and it turned out that the terasort for MR2 is 2 times slower than
>>>>>>>>>> that in MR1. I cannot really believe it.
>>>>>>>>>>
>>>>>>>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0
>>>>>>>>>> cluster configurations are as follows.
>>>>>>>>>>
>>>>>>>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>>>>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>>>>>>>         mapreduce.map.memory.mb = "768";
>>>>>>>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>>>>>>>
>>>>>>>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>>>>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>>>>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>>>>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>>>>>>>
>>>>>>>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>>>>>>>         mapreduce.task.io.sort.factor = "48";
>>>>>>>>>>         mapreduce.task.io.sort.mb = "200";
>>>>>>>>>>         mapreduce.map.speculative = "true";
>>>>>>>>>>         mapreduce.reduce.speculative = "true";
>>>>>>>>>>         mapreduce.framework.name = "yarn";
>>>>>>>>>>         yarn.app.mapreduce.am.job.task.listener.thread-count =
>>>>>>>>>> "60";
>>>>>>>>>>         mapreduce.map.cpu.vcores = "1";
>>>>>>>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>>>>>>>
>>>>>>>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>>>>>>>         mapreduce.map.output.compress = "true";
>>>>>>>>>>         mapreduce.map.output.compress.codec =
>>>>>>>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>>>>>>>
>>>>>>>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>>>>>>>         yarn.resourcemanager.scheduler.client.thread-count = "64";
>>>>>>>>>>         yarn.resourcemanager.resource-tracker.client.thread-count
>>>>>>>>>> = "64";
>>>>>>>>>>         yarn.resourcemanager.scheduler.class =
>>>>>>>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>>>>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>>>>>>>         yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>>>>>>>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>>>>>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>>>>>>>         yarn.nodemanager.container-executor.class =
>>>>>>>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>>>>>>>         yarn.nodemanager.container-manager.thread-count = "64";
>>>>>>>>>>         yarn.nodemanager.localizer.client.thread-count = "20";
>>>>>>>>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>>>>>>>>>
>>>>>>>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost the
>>>>>>>>>> same other configurations. In MR1, the 1TB terasort took about 45 minutes,
>>>>>>>>>> but it took around 90 minutes in MR2.
>>>>>>>>>>
>>>>>>>>>> Does anyone know what is wrong here? Or do I need some special
>>>>>>>>>> configurations for terasort to work better in MR2?
>>>>>>>>>>
>>>>>>>>>> Thanks in advance,
>>>>>>>>>>
>>>>>>>>>> John
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>>>>>>>> michael_segel@hotmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Sam,
>>>>>>>>>>> I think your cluster is too small for any meaningful conclusions
>>>>>>>>>>> to be made.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>>>>>
>>>>>>>>>>> Mike Segel
>>>>>>>>>>>
>>>>>>>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Harsh,
>>>>>>>>>>>
>>>>>>>>>>> Thanks for your detailed response! Now, the efficiency of my
>>>>>>>>>>> Yarn cluster improved a lot after increasing the reducer
>>>>>>>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>>>>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>>>>>>>
>>>>>>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and reduce
>>>>>>>>>>> task together, with a typical process(map > shuffle > reduce). In Yarn, as
>>>>>>>>>>> I know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1
>>>>>>>>>>> job has special execution process(map > shuffle > reduce) in Hadoop 1.x,
>>>>>>>>>>> and how Yarn execute a MRv1 job? still include some special MR steps in
>>>>>>>>>>> Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>>>>>> - What's the general process for ApplicationMaster of Yarn to
>>>>>>>>>>> execute a job?
>>>>>>>>>>>
>>>>>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>>>>>> - For Yarn, the above two parameters do not work any more, as yarn
>>>>>>>>>>> uses containers instead, right?
>>>>>>>>>>> - For Yarn, we can set the whole physical mem for a NodeManager
>>>>>>>>>>> using 'yarn.nodemanager.resource.memory-mb'. But how to set the
>>>>>>>>>>> default size of physical mem of a container?
>>>>>>>>>>> - How to set the maximum size of physical mem of a container? By
>>>>>>>>>>> the parameter of 'mapred.child.java.opts'?
>>>>>>>>>>>
>>>>>>>>>>> Thanks as always!
>>>>>>>>>>>
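For question 2, a sketch of the knobs involved, using property names that already appear elsewhere in this thread: the per-job container size is requested with mapreduce.map.memory.mb and mapreduce.reduce.memory.mb (not mapred.child.java.opts, which only sizes the JVM heap inside the container), while the scheduler bounds below cap what any job may request. The values shown are illustrative only:

  <!-- yarn-site.xml sketch: cluster-wide bounds on container requests -->
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>256</value>     <!-- smallest container the ResourceManager will grant -->
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>8192</value>    <!-- largest container a single request may ask for -->
  </property>
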
>>>>>>>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>
>>>>>>>>>>>> Hi Sam,
>>>>>>>>>>>>
>>>>>>>>>>>> > - How to know the container number? Why you say it will be 22
>>>>>>>>>>>> containers due to a 22 GB memory?
>>>>>>>>>>>>
>>>>>>>>>>>> The MR2's default configuration requests 1 GB resource each for
>>>>>>>>>>>> Map
>>>>>>>>>>>> and Reduce containers. It requests 1.5 GB for the AM container
>>>>>>>>>>>> that
>>>>>>>>>>>> runs the job, additionally. This is tunable using the properties
>>>>>>>>>>>> Sandy's mentioned in his post.
>>>>>>>>>>>>
>>>>>>>>>>>> > - My machine has 32 GB memory, how much memory is proper to
>>>>>>>>>>>> be assigned to containers?
>>>>>>>>>>>>
>>>>>>>>>>>> This is a general question. You may use the same process you
>>>>>>>>>>>> took to
>>>>>>>>>>>> decide optimal number of slots in MR1 to decide this here. Every
>>>>>>>>>>>> container is a new JVM, and you're limited by the CPUs you have
>>>>>>>>>>>> there
>>>>>>>>>>>> (if not the memory). Either increase memory requests from jobs,
>>>>>>>>>>>> to
>>>>>>>>>>>> lower # of concurrent containers at a given time (runtime
>>>>>>>>>>>> change), or
>>>>>>>>>>>> lower NM's published memory resources to control the same
>>>>>>>>>>>> (config
>>>>>>>>>>>> change).
>>>>>>>>>>>>
>>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to
>>>>>>>>>>>> be 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, all of these properties will still work. Old properties
>>>>>>>>>>>> specific
>>>>>>>>>>>> to JobTracker or TaskTracker (usually found as a keyword in the
>>>>>>>>>>>> config
>>>>>>>>>>>> name) will not apply anymore.
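
Put as configuration, the requests Harsh describes map onto these properties (the values shown are the defaults he cites: 1 GB per map or reduce task and 1.5 GB for the AM, so a NodeManager advertising 22000 MB can fit roughly 21 task containers plus the AM); a sketch to be checked against the release in use:

  <!-- Defaults behind the container math above (mapred-site.xml names) -->
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1024</value>    <!-- per map-container request -->
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>1024</value>    <!-- per reduce-container request -->
  </property>
  <property>
    <name>yarn.app.mapreduce.am.resource.mb</name>
    <value>1536</value>    <!-- the MRAppMaster's own container request -->
  </property>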
>>>>>>>>>>>>
>>>>>>>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <sa...@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>> > Hi Harsh,
>>>>>>>>>>>> >
>>>>>>>>>>>> > According to above suggestions, I removed the duplication of
>>>>>>>>>>>> setting, and
>>>>>>>>>>>> > reduced the value of 'yarn.nodemanager.resource.cpu-cores',
>>>>>>>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>>>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000. And
>>>>>>>>>>>> > then, the
>>>>>>>>>>>> > efficiency improved by about 18%.  I have questions:
>>>>>>>>>>>> >
>>>>>>>>>>>> > - How to know the container number? Why you say it will be 22
>>>>>>>>>>>> containers due
>>>>>>>>>>>> > to a 22 GB memory?
>>>>>>>>>>>> > - My machine has 32 GB memory, how much memory is proper to
>>>>>>>>>>>> be assigned to
>>>>>>>>>>>> > containers?
>>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to
>>>>>>>>>>>> be 'yarn', will
>>>>>>>>>>>> > other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>>> framework? Like
>>>>>>>>>>>> > 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>> >
>>>>>>>>>>>> > Thanks!
>>>>>>>>>>>> >
>>>>>>>>>>>> >
>>>>>>>>>>>> >
>>>>>>>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> Hey Sam,
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> Did you get a chance to retry with Sandy's suggestions? The
>>>>>>>>>>>> config
>>>>>>>>>>>> >> appears to be asking NMs to use roughly 22 total containers
>>>>>>>>>>>> (as
>>>>>>>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB
>>>>>>>>>>>> memory
>>>>>>>>>>>> >> resource. This could impact much, given the CPU is still the
>>>>>>>>>>>> same for
>>>>>>>>>>>> >> both test runs.
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>>>>>>>> sandy.ryza@cloudera.com>
>>>>>>>>>>>> >> wrote:
>>>>>>>>>>>> >> > Hey Sam,
>>>>>>>>>>>> >> >
>>>>>>>>>>>> >> > Thanks for sharing your results.  I'm definitely curious
>>>>>>>>>>>> about what's
>>>>>>>>>>>> >> > causing the difference.
>>>>>>>>>>>> >> >
>>>>>>>>>>>> >> > A couple observations:
>>>>>>>>>>>> >> > It looks like you've got
>>>>>>>>>>>> yarn.nodemanager.resource.memory-mb in there
>>>>>>>>>>>> >> > twice
>>>>>>>>>>>> >> > with two different values.
>>>>>>>>>>>> >> >
>>>>>>>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close to
>>>>>>>>>>>> the default
>>>>>>>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of your
>>>>>>>>>>>> tasks getting
>>>>>>>>>>>> >> > killed for running over resource limits?
>>>>>>>>>>>> >> >
>>>>>>>>>>>> >> > -Sandy
>>>>>>>>>>>> >> >
>>>>>>>>>>>> >> >
>>>>>>>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <
>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >> The terasort execution log shows that reduce spent about
>>>>>>>>>>>> 5.5 mins from
>>>>>>>>>>>> >> >> 33%
>>>>>>>>>>>> >> >> to 35% as below.
>>>>>>>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce 31%
>>>>>>>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce 32%
>>>>>>>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce 33%
>>>>>>>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce 35%
>>>>>>>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce 40%
>>>>>>>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce 43%
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >> Any way, below are my configurations for your reference.
>>>>>>>>>>>> Thanks!
>>>>>>>>>>>> >> >> (A) core-site.xml
>>>>>>>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >> (B) hdfs-site.xml
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>dfs.replication</name>
>>>>>>>>>>>> >> >>     <value>1</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>>>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>>>>>>>> >> >>     <value>10</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >> (C) mapred-site.xml
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>>>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>>>>>>>> >> >> </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>>>>>>>> >> >>     <value>yarn</value>
>>>>>>>>>>>> >> >>    </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>> >> >>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>>>>>>>> >> >>     <value>8</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>>>>>>>> >> >>     <value>4</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>>>>>>>> >> >>     <value>true</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >> (D) yarn-site.xml
>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>>>>>>>> >> >>     <value>node1:18025</value>
>>>>>>>>>>>> >> >>     <description>host is the hostname of the resource
>>>>>>>>>>>> manager and
>>>>>>>>>>>> >> >>     port is the port on which the NodeManagers contact
>>>>>>>>>>>> the Resource
>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <description>The address of the RM web
>>>>>>>>>>>> application.</description>
>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>>>>>>>> >> >>     <value>node1:18088</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>>>>>>>>>>>> >> >>     <value>node1:18030</value>
>>>>>>>>>>>> >> >>     <description>host is the hostname of the
>>>>>>>>>>>> resourcemanager and port
>>>>>>>>>>>> >> >> is
>>>>>>>>>>>> >> >> the port
>>>>>>>>>>>> >> >>     on which the Applications in the cluster talk to the
>>>>>>>>>>>> Resource
>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>>>>>>>> >> >>     <value>node1:18040</value>
>>>>>>>>>>>> >> >>     <description>the host is the hostname of the
>>>>>>>>>>>> ResourceManager and
>>>>>>>>>>>> >> >> the
>>>>>>>>>>>> >> >> port is the port on
>>>>>>>>>>>> >> >>     which the clients can talk to the Resource Manager.
>>>>>>>>>>>> </description>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>>>>>>>> >> >>     <description>the local directories used by the
>>>>>>>>>>>> >> >> nodemanager</description>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>>>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>>>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>>>>>>>> port</description>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>> >> >>     <value>10240</value>
>>>>>>>>>>>> >> >>     <description>the amount of memory on the NodeManager
>>>>>>>>>>>> >> >> in
>>>>>>>>>>>> >> >> MB</description>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>>>>>>> >> >>     <description>directory on hdfs where the application
>>>>>>>>>>>> logs are moved
>>>>>>>>>>>> >> >> to
>>>>>>>>>>>> >> >> </description>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>    <property>
>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>>>>>>> >> >>     <description>the directories used by Nodemanagers as
>>>>>>>>>>>> log
>>>>>>>>>>>> >> >> directories</description>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>>>>>>> >> >>     <description>shuffle service that needs to be set for
>>>>>>>>>>>> Map Reduce to
>>>>>>>>>>>> >> >> run </description>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>>>>>>> >> >>     <value>24</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>>>>>>> >> >>     <value>3</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>> >> >>     <value>22000</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>>>>>>> >> >>     <value>2.1</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>> >> >>>
>>>>>>>>>>>> >> >>> Not tuning configurations at all is wrong. YARN uses
>>>>>>>>>>>> memory resource
>>>>>>>>>>>> >> >>> based scheduling and hence MR2 would be requesting 1 GB
>>>>>>>>>>>> minimum by
>>>>>>>>>>>> >> >>> default, causing, on base configs, to max out at 8 (due
>>>>>>>>>>>> to 8 GB NM
>>>>>>>>>>>> >> >>> memory resource config) total containers. Do share your
>>>>>>>>>>>> configs as at
>>>>>>>>>>>> >> >>> this point none of us can tell what it is.
>>>>>>>>>>>> >> >>>
>>>>>>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for
>>>>>>>>>>>> users and to not
>>>>>>>>>>>> >> >>> care about such things :)
>>>>>>>>>>>> >> >>>
>>>>>>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>>>>>>> samliuhadoop@gmail.com>
>>>>>>>>>>>> >> >>> wrote:
>>>>>>>>>>>> >> >>> > At the begining, I just want to do a fast comparision
>>>>>>>>>>>> of MRv1 and
>>>>>>>>>>>> >> >>> > Yarn.
>>>>>>>>>>>> >> >>> > But
>>>>>>>>>>>> >> >>> > they have many differences, and to be fair for
>>>>>>>>>>>> comparison I did not
>>>>>>>>>>>> >> >>> > tune
>>>>>>>>>>>> >> >>> > their configurations at all.  So I got above test
>>>>>>>>>>>> results. After
>>>>>>>>>>>> >> >>> > analyzing
>>>>>>>>>>>> >> >>> > the test result, no doubt, I will configure them and
>>>>>>>>>>>> do comparison
>>>>>>>>>>>> >> >>> > again.
>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>> >> >>> > Do you have any idea on current test result? I think,
>>>>>>>>>>>> to compare
>>>>>>>>>>>> >> >>> > with
>>>>>>>>>>>> >> >>> > MRv1,
>>>>>>>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but worse
>>>>>>>>>>>> on Reduce
>>>>>>>>>>>> >> >>> > phase(terasort test).
>>>>>>>>>>>> >> >>> > And any detailed suggestions/comments/materials on
>>>>>>>>>>>> Yarn performance
>>>>>>>>>>>> >> >>> > tunning?
>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>> >> >>> > Thanks!
>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>>>>>>> marcosluis2186@gmail.com>
>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>> >> >>> >> Why not to tune the configurations?
>>>>>>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>> >> >>> >>> Hi Experts,
>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>> >> >>> >>> We are thinking about whether to use Yarn or not in
>>>>>>>>>>>> the near
>>>>>>>>>>>> >> >>> >>> future,
>>>>>>>>>>>> >> >>> >>> and
>>>>>>>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for
>>>>>>>>>>>> >> >>> >>> comparison.
>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>> >> >>> >>> My env is a three-node cluster, and each node has
>>>>>>>>>>>> >> >>> >>> similar hardware:
>>>>>>>>>>>> >> >>> >>> 2
>>>>>>>>>>>> >> >>> >>> cpu(4 core), 32 GB mem. Both Yarn and MRv1 clusters are
>>>>>>>>>>>> set on the
>>>>>>>>>>>> >> >>> >>> same
>>>>>>>>>>>> >> >>> >>> env. To
>>>>>>>>>>>> >> >>> >>> be fair, I did not make any performance tuning on
>>>>>>>>>>>> their
>>>>>>>>>>>> >> >>> >>> configurations, but
>>>>>>>>>>>> >> >>> >>> use the default configuration values.
>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>> >> >>> >>> Before testing, I think Yarn will be much better
>>>>>>>>>>>> than MRv1, if
>>>>>>>>>>>> >> >>> >>> they
>>>>>>>>>>>> >> >>> >>> all
>>>>>>>>>>>> >> >>> >>> use default configuration, because Yarn is a better
>>>>>>>>>>>> framework than
>>>>>>>>>>>> >> >>> >>> MRv1.
>>>>>>>>>>>> >> >>> >>> However, the test result shows some differences:
>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>> >> >>> >>> After a fast analysis, I think the direct cause
>>>>>>>>>>>> might be that Yarn
>>>>>>>>>>>> >> >>> >>> is
>>>>>>>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much worse
>>>>>>>>>>>> on Reduce
>>>>>>>>>>>> >> >>> >>> phase.
>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1 for
>>>>>>>>>>>> terasort?
>>>>>>>>>>>> >> >>> >>> - What's the strategy for tuning Yarn performance?
>>>>>>>>>>>> >> >>> >>> Are there any
>>>>>>>>>>>> >> >>> >>> materials?
>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>> >> >>> >>> Thanks!
>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>> >> >>> >> --
>>>>>>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>> >> >>>
>>>>>>>>>>>> >> >>>
>>>>>>>>>>>> >> >>>
>>>>>>>>>>>> >> >>> --
>>>>>>>>>>>> >> >>> Harsh J
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >
>>>>>>>>>>>> >>
>>>>>>>>>>>> >>
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> --
>>>>>>>>>>>> >> Harsh J
>>>>>>>>>>>> >
>>>>>>>>>>>> >
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Harsh J
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Sandy Ryza <sa...@cloudera.com>.
Increasing the slowstart is not meant to increase performance, but should
make for a fairer comparison.  Have you tried making sure that in MR2 only
8 map tasks are running concurrently, or boosting MR1 up to 16?

-Sandy
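
On the cluster described in this thread (12288 MB per NodeManager, map containers currently at 768 MB), one way to run the first experiment, capping MR2 at 8 concurrent maps per node, is to double the map container request so that 12288 / 1536 = 8. A sketch; the heap figure is a judgment call (roughly 75% of the container) rather than a recommendation:

  <!-- mapred-site.xml sketch: 8 concurrent map containers per 12288 MB NodeManager -->
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1536</value>
  </property>
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx1152m</value>   <!-- leave headroom below the 1536 MB container limit -->
  </property>

The other direction of the experiment would be raising MR1's mapred.tasktracker.map.tasks.maximum from 8 to 16 on each TaskTracker.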


On Wed, Oct 23, 2013 at 12:55 PM, Jian Fang
<ji...@gmail.com>wrote:

> Changing mapreduce.job.reduce.slowstart.completedmaps to 0.99 does not
> look good. The map phase alone
> took 48 minutes and total time seems to be even longer. Any way to let map
> phase run faster?
>
> Thanks.
>
>
> On Wed, Oct 23, 2013 at 10:05 AM, Jian Fang <jian.fang.subscribe@gmail.com
> > wrote:
>
>> Thanks Sandy.
>>
>> io.sort.record.percent is at its default value of 0.05 for both MR1 and MR2.
>> mapreduce.job.reduce.slowstart.completedmaps in MR2 and mapred.reduce.slowstart.completed.maps
>> in MR1 both use the default value 0.05.
>>
>> I tried to allocate 1536MB and 1024MB to the map container some time ago, but
>> the changes did not give me a better result, so I changed it back to
>> 768MB.
>>
>> Will try mapred.reduce.slowstart.completed.maps=.99 to see what happens.
>> BTW, I should use
>> mapreduce.job.reduce.slowstart.completedmaps in MR2, right?
>>
>> Also, in MR1 I can specify tasktracker.http.threads, but I could not find
>> the counterpart for MR2. Which one should I tune for the http threads?
>>
>> Thanks again.
>>
>>
>> On Wed, Oct 23, 2013 at 9:40 AM, Sandy Ryza <sa...@cloudera.com>wrote:
>>
>>> Based on SLOTS_MILLIS_MAPS, it looks like your map tasks are taking
>>> about three times as long in MR2 as they are in MR1.  This is probably
>>> because you allow twice as many map tasks to run at a time in MR2 (12288/768
>>> = 16).  Being able to use all the containers isn't necessarily a good thing
>>> if you are oversubscribing your node's resources.  Because of the different
>>> way that MR1 and MR2 view resources, I think it's better to test with
>>> mapred.reduce.slowstart.completed.maps=.99 so that the map and reduce
>>> phases will run separately.
>>>
>>> On the other side, it looks like your MR1 has more spilled records than
>>> MR2.  You should set io.sort.record.percent in MR1 to .13, which should
>>> improve MR1 performance but will also make for a fairer comparison (MR2
>>> automatically does this tuning for you).
>>>
>>> -Sandy
>>>
>>>
>>> On Wed, Oct 23, 2013 at 9:22 AM, Jian Fang <
>>> jian.fang.subscribe@gmail.com> wrote:
>>>
>>>> The number of map slots and reduce slots on each data node for MR1 are
>>>> 8 and 3, respectively. Since MR2 could use all containers for either map or
>>>> reduce, I would expect that MR2 is faster.
>>>>
>>>>
>>>> On Wed, Oct 23, 2013 at 8:17 AM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>>
>>>>> How many map and reduce slots are you using per tasktracker in MR1?
>>>>>  How do the average map times compare? (MR2 reports this directly on the
>>>>> web UI, but you can also get a sense in MR1 by scrolling through the map
>>>>> tasks page).  Can you share the counters for MR1?
>>>>>
>>>>> -Sandy
>>>>>
>>>>>
>>>>> On Wed, Oct 23, 2013 at 12:23 AM, Jian Fang <
>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>
>>>>>> Unfortunately, turning off JVM reuse still got the same result, i.e.,
>>>>>> about 90 minutes for MR2. I don't think the killed reduces could contribute
>>>>>> to 2 times slowness. There should be something very wrong either in
>>>>>> configuration or code. Any hints?
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <
>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks Sandy. I will try to turn JVM reuse off and see what happens.
>>>>>>>
>>>>>>> Yes, I saw quite some exceptions in the task attempts. For instance.
>>>>>>>
>>>>>>>
>>>>>>> 2013-10-20 03:13:58,751 ERROR [main]
>>>>>>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>>>>>>> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
>>>>>>> 2013-10-20 03:13:58,752 ERROR [Thread-6]
>>>>>>> org.apache.hadoop.hdfs.DFSClient: Failed to close file
>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
>>>>>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>>>>>>> No lease on
>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
>>>>>>> File does not exist. Holder
>>>>>>> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
>>>>>>> any open files.
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>>>>>>>         at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>>>>>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>>>         at
>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>>>>         at
>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>>>>> --
>>>>>>>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>>>>>>>         at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>>>>>>>         at
>>>>>>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>>>>>>>         at
>>>>>>> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>>>>>>>         at
>>>>>>> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>>>>>>> 2013-10-20 03:13:58,753 WARN [main]
>>>>>>> org.apache.hadoop.mapred.YarnChild: Exception running child :
>>>>>>> java.nio.channels.ClosedChannelException
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>>>>>>>         at
>>>>>>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>>>>>>>         at
>>>>>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>>>>>>>         at java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>>>>>         at
>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>>>>>>>         at
>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>>>>>>>         at
>>>>>>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>>>>>>>         at
>>>>>>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>>>>>>         at
>>>>>>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>>>>>>         at
>>>>>>> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <sandy.ryza@cloudera.com
>>>>>>> > wrote:
>>>>>>>
>>>>>>>> It looks like many of your reduce tasks were killed.  Do you know
>>>>>>>> why?  Also, MR2 doesn't have JVM reuse, so it might make sense to compare
>>>>>>>> it to MR1 with JVM reuse turned off.
>>>>>>>>
>>>>>>>> -Sandy
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <
>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> The Terasort output for MR2 is as follows.
>>>>>>>>>
>>>>>>>>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job
>>>>>>>>> (main): Counters: 46
>>>>>>>>>         File System Counters
>>>>>>>>>                 FILE: Number of bytes read=456102049355
>>>>>>>>>                 FILE: Number of bytes written=897246250517
>>>>>>>>>                 FILE: Number of read operations=0
>>>>>>>>>                 FILE: Number of large read operations=0
>>>>>>>>>                 FILE: Number of write operations=0
>>>>>>>>>                 HDFS: Number of bytes read=1000000851200
>>>>>>>>>                 HDFS: Number of bytes written=1000000000000
>>>>>>>>>                 HDFS: Number of read operations=32131
>>>>>>>>>                 HDFS: Number of large read operations=0
>>>>>>>>>                 HDFS: Number of write operations=224
>>>>>>>>>         Job Counters
>>>>>>>>>                 Killed map tasks=1
>>>>>>>>>                 Killed reduce tasks=20
>>>>>>>>>                 Launched map tasks=7601
>>>>>>>>>                 Launched reduce tasks=132
>>>>>>>>>                 Data-local map tasks=7591
>>>>>>>>>                 Rack-local map tasks=10
>>>>>>>>>                 Total time spent by all maps in occupied slots
>>>>>>>>> (ms)=1696141311
>>>>>>>>>                 Total time spent by all reduces in occupied slots
>>>>>>>>> (ms)=2664045096
>>>>>>>>>         Map-Reduce Framework
>>>>>>>>>                 Map input records=10000000000
>>>>>>>>>                 Map output records=10000000000
>>>>>>>>>                 Map output bytes=1020000000000
>>>>>>>>>                 Map output materialized bytes=440486356802
>>>>>>>>>                 Input split bytes=851200
>>>>>>>>>                 Combine input records=0
>>>>>>>>>                 Combine output records=0
>>>>>>>>>                 Reduce input groups=10000000000
>>>>>>>>>                 Reduce shuffle bytes=440486356802
>>>>>>>>>                 Reduce input records=10000000000
>>>>>>>>>                 Reduce output records=10000000000
>>>>>>>>>                 Spilled Records=20000000000
>>>>>>>>>                 Shuffled Maps =851200
>>>>>>>>>                 Failed Shuffles=61
>>>>>>>>>                 Merged Map outputs=851200
>>>>>>>>>                 GC time elapsed (ms)=4215666
>>>>>>>>>                 CPU time spent (ms)=192433000
>>>>>>>>>                 Physical memory (bytes) snapshot=3349356380160
>>>>>>>>>                 Virtual memory (bytes) snapshot=9665208745984
>>>>>>>>>                 Total committed heap usage (bytes)=3636854259712
>>>>>>>>>         Shuffle Errors
>>>>>>>>>                 BAD_ID=0
>>>>>>>>>                 CONNECTION=0
>>>>>>>>>                 IO_ERROR=4
>>>>>>>>>                 WRONG_LENGTH=0
>>>>>>>>>                 WRONG_MAP=0
>>>>>>>>>                 WRONG_REDUCE=0
>>>>>>>>>         File Input Format Counters
>>>>>>>>>                 Bytes Read=1000000000000
>>>>>>>>>         File Output Format Counters
>>>>>>>>>                 Bytes Written=1000000000000
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> John
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop
>>>>>>>>>> 1.0.3 and it turned out that the terasort for MR2 is 2 times slower than
>>>>>>>>>> that in MR1. I cannot really believe it.
>>>>>>>>>>
>>>>>>>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0
>>>>>>>>>> cluster configurations are as follows.
>>>>>>>>>>
>>>>>>>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>>>>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>>>>>>>         mapreduce.map.memory.mb = "768";
>>>>>>>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>>>>>>>
>>>>>>>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>>>>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>>>>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>>>>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>>>>>>>
>>>>>>>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>>>>>>>         mapreduce.task.io.sort.factor = "48";
>>>>>>>>>>         mapreduce.task.io.sort.mb = "200";
>>>>>>>>>>         mapreduce.map.speculative = "true";
>>>>>>>>>>         mapreduce.reduce.speculative = "true";
>>>>>>>>>>         mapreduce.framework.name = "yarn";
>>>>>>>>>>         yarn.app.mapreduce.am.job.task.listener.thread-count =
>>>>>>>>>> "60";
>>>>>>>>>>         mapreduce.map.cpu.vcores = "1";
>>>>>>>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>>>>>>>
>>>>>>>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>>>>>>>         mapreduce.map.output.compress = "true";
>>>>>>>>>>         mapreduce.map.output.compress.codec =
>>>>>>>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>>>>>>>
>>>>>>>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>>>>>>>         yarn.resourcemanager.scheduler.client.thread-count = "64";
>>>>>>>>>>         yarn.resourcemanager.resource-tracker.client.thread-count
>>>>>>>>>> = "64";
>>>>>>>>>>         yarn.resourcemanager.scheduler.class =
>>>>>>>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>>>>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>>>>>>>         yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>>>>>>>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>>>>>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>>>>>>>         yarn.nodemanager.container-executor.class =
>>>>>>>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>>>>>>>         yarn.nodemanager.container-manager.thread-count = "64";
>>>>>>>>>>         yarn.nodemanager.localizer.client.thread-count = "20";
>>>>>>>>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>>>>>>>>>
>>>>>>>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost the
>>>>>>>>>> same other configurations. In MR1, the 1TB terasort took about 45 minutes,
>>>>>>>>>> but it took around 90 minutes in MR2.
>>>>>>>>>>
>>>>>>>>>> Does anyone know what is wrong here? Or do I need some special
>>>>>>>>>> configurations for terasort to work better in MR2?
>>>>>>>>>>
>>>>>>>>>> Thanks in advance,
>>>>>>>>>>
>>>>>>>>>> John
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>>>>>>>> michael_segel@hotmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Sam,
>>>>>>>>>>> I think your cluster is too small for any meaningful conclusions
>>>>>>>>>>> to be made.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>>>>>
>>>>>>>>>>> Mike Segel
>>>>>>>>>>>
>>>>>>>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Harsh,
>>>>>>>>>>>
>>>>>>>>>>> Thanks for your detailed response! Now, the efficiency of my
>>>>>>>>>>> Yarn cluster improved a lot after increasing the reducer
>>>>>>>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>>>>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>>>>>>>
>>>>>>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and reduce
>>>>>>>>>>> task together, with a typical process(map > shuffle > reduce). In Yarn, as
>>>>>>>>>>> I know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1
>>>>>>>>>>> job has special execution process(map > shuffle > reduce) in Hadoop 1.x,
>>>>>>>>>>> and how Yarn execute a MRv1 job? still include some special MR steps in
>>>>>>>>>>> Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>>>>>> - What's the general process for ApplicationMaster of Yarn to
>>>>>>>>>>> execute a job?
>>>>>>>>>>>
>>>>>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>>>>>> - For Yarn, the above two parameters do not work any more, as yarn
>>>>>>>>>>> uses containers instead, right?
>>>>>>>>>>> - For Yarn, we can set the whole physical mem for a NodeManager
>>>>>>>>>>> using 'yarn.nodemanager.resource.memory-mb'. But how to set the
>>>>>>>>>>> default size of physical mem of a container?
>>>>>>>>>>> - How to set the maximum size of physical mem of a container? By
>>>>>>>>>>> the parameter of 'mapred.child.java.opts'?
>>>>>>>>>>>
>>>>>>>>>>> Thanks as always!
>>>>>>>>>>>
>>>>>>>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>
>>>>>>>>>>>> Hi Sam,
>>>>>>>>>>>>
>>>>>>>>>>>> > - How to know the container number? Why you say it will be 22
>>>>>>>>>>>> containers due to a 22 GB memory?
>>>>>>>>>>>>
>>>>>>>>>>>> The MR2's default configuration requests 1 GB resource each for
>>>>>>>>>>>> Map
>>>>>>>>>>>> and Reduce containers. It requests 1.5 GB for the AM container
>>>>>>>>>>>> that
>>>>>>>>>>>> runs the job, additionally. This is tunable using the properties
>>>>>>>>>>>> Sandy's mentioned in his post.
>>>>>>>>>>>>
>>>>>>>>>>>> > - My machine has 32 GB memory, how much memory is proper to
>>>>>>>>>>>> be assigned to containers?
>>>>>>>>>>>>
>>>>>>>>>>>> This is a general question. You may use the same process you
>>>>>>>>>>>> took to
>>>>>>>>>>>> decide optimal number of slots in MR1 to decide this here. Every
>>>>>>>>>>>> container is a new JVM, and you're limited by the CPUs you have
>>>>>>>>>>>> there
>>>>>>>>>>>> (if not the memory). Either increase memory requests from jobs,
>>>>>>>>>>>> to
>>>>>>>>>>>> lower # of concurrent containers at a given time (runtime
>>>>>>>>>>>> change), or
>>>>>>>>>>>> lower NM's published memory resources to control the same
>>>>>>>>>>>> (config
>>>>>>>>>>>> change).
>>>>>>>>>>>>
>>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to
>>>>>>>>>>>> be 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, all of these properties will still work. Old properties
>>>>>>>>>>>> specific
>>>>>>>>>>>> to JobTracker or TaskTracker (usually found as a keyword in the
>>>>>>>>>>>> config
>>>>>>>>>>>> name) will not apply anymore.
>>>>>>>>>>>>
>>>>>>>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <sa...@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>> > Hi Harsh,
>>>>>>>>>>>> >
>>>>>>>>>>>> > According to above suggestions, I removed the duplication of
>>>>>>>>>>>> setting, and
>>>>>>>>>>>> > reduced the value of 'yarn.nodemanager.resource.cpu-cores',
>>>>>>>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>>>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000. And
>>>>>>>>>>>> > then, the
>>>>>>>>>>>> > efficiency improved by about 18%.  I have questions:
>>>>>>>>>>>> >
>>>>>>>>>>>> > - How to know the container number? Why you say it will be 22
>>>>>>>>>>>> containers due
>>>>>>>>>>>> > to a 22 GB memory?
>>>>>>>>>>>> > - My machine has 32 GB memory, how much memory is proper to
>>>>>>>>>>>> be assigned to
>>>>>>>>>>>> > containers?
>>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to
>>>>>>>>>>>> be 'yarn', will
>>>>>>>>>>>> > other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>>> framework? Like
>>>>>>>>>>>> > 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>> >
>>>>>>>>>>>> > Thanks!
>>>>>>>>>>>> >
>>>>>>>>>>>> >
>>>>>>>>>>>> >
>>>>>>>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> Hey Sam,
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> Did you get a chance to retry with Sandy's suggestions? The
>>>>>>>>>>>> config
>>>>>>>>>>>> >> appears to be asking NMs to use roughly 22 total containers
>>>>>>>>>>>> (as
>>>>>>>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB
>>>>>>>>>>>> memory
>>>>>>>>>>>> >> resource. This could impact much, given the CPU is still the
>>>>>>>>>>>> same for
>>>>>>>>>>>> >> both test runs.
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>>>>>>>> sandy.ryza@cloudera.com>
>>>>>>>>>>>> >> wrote:
>>>>>>>>>>>> >> > Hey Sam,
>>>>>>>>>>>> >> >
>>>>>>>>>>>> >> > Thanks for sharing your results.  I'm definitely curious
>>>>>>>>>>>> about what's
>>>>>>>>>>>> >> > causing the difference.
>>>>>>>>>>>> >> >
>>>>>>>>>>>> >> > A couple observations:
>>>>>>>>>>>> >> > It looks like you've got
>>>>>>>>>>>> yarn.nodemanager.resource.memory-mb in there
>>>>>>>>>>>> >> > twice
>>>>>>>>>>>> >> > with two different values.
>>>>>>>>>>>> >> >
>>>>>>>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close to
>>>>>>>>>>>> the default
>>>>>>>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of your
>>>>>>>>>>>> tasks getting
>>>>>>>>>>>> >> > killed for running over resource limits?
>>>>>>>>>>>> >> >
>>>>>>>>>>>> >> > -Sandy
>>>>>>>>>>>> >> >
>>>>>>>>>>>> >> >
>>>>>>>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <
>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >> The terasort execution log shows that reduce spent about
>>>>>>>>>>>> 5.5 mins from
>>>>>>>>>>>> >> >> 33%
>>>>>>>>>>>> >> >> to 35% as below.
>>>>>>>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce 31%
>>>>>>>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce 32%
>>>>>>>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce 33%
>>>>>>>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce 35%
>>>>>>>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce 40%
>>>>>>>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce 43%
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >> Any way, below are my configurations for your reference.
>>>>>>>>>>>> Thanks!
>>>>>>>>>>>> >> >> (A) core-site.xml
>>>>>>>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >> (B) hdfs-site.xml
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>dfs.replication</name>
>>>>>>>>>>>> >> >>     <value>1</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>>>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>>>>>>>> >> >>     <value>10</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >> (C) mapred-site.xml
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>>>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>>>>>>>> >> >> </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>>>>>>>> >> >>     <value>yarn</value>
>>>>>>>>>>>> >> >>    </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>> >> >>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>>>>>>>> >> >>     <value>8</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>>>>>>>> >> >>     <value>4</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>>>>>>>> >> >>     <value>true</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >> (D) yarn-site.xml
>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>>>>>>>> >> >>     <value>node1:18025</value>
>>>>>>>>>>>> >> >>     <description>host is the hostname of the resource
>>>>>>>>>>>> manager and
>>>>>>>>>>>> >> >>     port is the port on which the NodeManagers contact
>>>>>>>>>>>> the Resource
>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <description>The address of the RM web
>>>>>>>>>>>> application.</description>
>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>>>>>>>> >> >>     <value>node1:18088</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>>>>>>>>>>>> >> >>     <value>node1:18030</value>
>>>>>>>>>>>> >> >>     <description>host is the hostname of the
>>>>>>>>>>>> resourcemanager and port
>>>>>>>>>>>> >> >> is
>>>>>>>>>>>> >> >> the port
>>>>>>>>>>>> >> >>     on which the Applications in the cluster talk to the
>>>>>>>>>>>> Resource
>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>>>>>>>> >> >>     <value>node1:18040</value>
>>>>>>>>>>>> >> >>     <description>the host is the hostname of the
>>>>>>>>>>>> ResourceManager and
>>>>>>>>>>>> >> >> the
>>>>>>>>>>>> >> >> port is the port on
>>>>>>>>>>>> >> >>     which the clients can talk to the Resource Manager.
>>>>>>>>>>>> </description>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>>>>>>>> >> >>     <description>the local directories used by the
>>>>>>>>>>>> >> >> nodemanager</description>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>>>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>>>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>>>>>>>> port</description>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>> >> >>     <value>10240</value>
>>>>>>>>>>>> >> >>     <description>the amount of memory on the NodeManager
>>>>>>>>>>>> in
>>>>>>>>>>>> >> >> MB</description>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>>>>>>> >> >>     <description>directory on hdfs where the application
>>>>>>>>>>>> logs are moved
>>>>>>>>>>>> >> >> to
>>>>>>>>>>>> >> >> </description>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>    <property>
>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>>>>>>> >> >>     <description>the directories used by Nodemanagers as
>>>>>>>>>>>> log
>>>>>>>>>>>> >> >> directories</description>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>>>>>>> >> >>     <description>shuffle service that needs to be set for
>>>>>>>>>>>> Map Reduce to
>>>>>>>>>>>> >> >> run </description>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>>>>>>> >> >>     <value>24</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>>>>>>> >> >>     <value>3</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>> >> >>     <value>22000</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>>>>>>> >> >>     <value>2.1</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>> >> >>>
>>>>>>>>>>>> >> >>> Not tuning configurations at all is wrong. YARN uses
>>>>>>>>>>>> memory resource
>>>>>>>>>>>> >> >>> based scheduling and hence MR2 would be requesting 1 GB
>>>>>>>>>>>> minimum by
>>>>>>>>>>>> >> >>> default, causing, on base configs, to max out at 8 (due
>>>>>>>>>>>> to 8 GB NM
>>>>>>>>>>>> >> >>> memory resource config) total containers. Do share your
>>>>>>>>>>>> configs as at
>>>>>>>>>>>> >> >>> this point none of us can tell what it is.
>>>>>>>>>>>> >> >>>
>>>>>>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for
>>>>>>>>>>>> users and to not
>>>>>>>>>>>> >> >>> care about such things :)
>>>>>>>>>>>> >> >>>
>>>>>>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>>>>>>> samliuhadoop@gmail.com>
>>>>>>>>>>>> >> >>> wrote:
>>>>>>>>>>>> >> >>> > At the beginning, I just wanted to do a fast comparison
>>>>>>>>>>>> of MRv1 and
>>>>>>>>>>>> >> >>> > Yarn.
>>>>>>>>>>>> >> >>> > But
>>>>>>>>>>>> >> >>> > they have many differences, and to be fair for
>>>>>>>>>>>> comparison I did not
>>>>>>>>>>>> >> >>> > tune
>>>>>>>>>>>> >> >>> > their configurations at all.  So I got above test
>>>>>>>>>>>> results. After
>>>>>>>>>>>> >> >>> > analyzing
>>>>>>>>>>>> >> >>> > the test result, no doubt, I will configure them and
>>>>>>>>>>>> do comparison
>>>>>>>>>>>> >> >>> > again.
>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>> >> >>> > Do you have any idea on current test result? I think,
>>>>>>>>>>>> to compare
>>>>>>>>>>>> >> >>> > with
>>>>>>>>>>>> >> >>> > MRv1,
>>>>>>>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but worse
>>>>>>>>>>>> on Reduce
>>>>>>>>>>>> >> >>> > phase(terasort test).
>>>>>>>>>>>> >> >>> > And any detailed suggestions/comments/materials on
>>>>>>>>>>>> Yarn performance
>>>>>>>>>>>> >> >>> > tuning?
>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>> >> >>> > Thanks!
>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>>>>>>> marcosluis2186@gmail.com>
>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>> >> >>> >> Why not tune the configurations?
>>>>>>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>> >> >>> >>> Hi Experts,
>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>> >> >>> >>> We are thinking about whether to use Yarn or not in
>>>>>>>>>>>> the near
>>>>>>>>>>>> >> >>> >>> future,
>>>>>>>>>>>> >> >>> >>> and
>>>>>>>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for
>>>>>>>>>>>> comparison.
>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>> >> >>> >>> My env is a three-node cluster, and each node has
>>>>>>>>>>>> similar hardware:
>>>>>>>>>>>> >> >>> >>> 2
>>>>>>>>>>>> >> >>> >>> cpu(4 core), 32 GB mem. Both Yarn and MRv1 clusters are
>>>>>>>>>>>> set on the
>>>>>>>>>>>> >> >>> >>> same
>>>>>>>>>>>> >> >>> >>> env. To
>>>>>>>>>>>> >> >>> >>> be fair, I did not make any performance tuning on
>>>>>>>>>>>> their
>>>>>>>>>>>> >> >>> >>> configurations, but
>>>>>>>>>>>> >> >>> >>> use the default configuration values.
>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>> >> >>> >>> Before testing, I think Yarn will be much better
>>>>>>>>>>>> than MRv1, if
>>>>>>>>>>>> >> >>> >>> they
>>>>>>>>>>>> >> >>> >>> all
>>>>>>>>>>>> >> >>> >>> use default configuration, because Yarn is a better
>>>>>>>>>>>> framework than
>>>>>>>>>>>> >> >>> >>> MRv1.
>>>>>>>>>>>> >> >>> >>> However, the test result shows some differences:
>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>> >> >>> >>> After a fast analysis, I think the direct cause
>>>>>>>>>>>> might be that Yarn
>>>>>>>>>>>> >> >>> >>> is
>>>>>>>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much worse
>>>>>>>>>>>> on Reduce
>>>>>>>>>>>> >> >>> >>> phase.
>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1 for
>>>>>>>>>>>> terasort?
>>>>>>>>>>>> >> >>> >>> - What's the strategy for tuning Yarn performance?
>>>>>>>>>>>> Are there any
>>>>>>>>>>>> >> >>> >>> materials?
>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>> >> >>> >>> Thanks!
>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>> >> >>> >> --
>>>>>>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>> >> >>>
>>>>>>>>>>>> >> >>>
>>>>>>>>>>>> >> >>>
>>>>>>>>>>>> >> >>> --
>>>>>>>>>>>> >> >>> Harsh J
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >
>>>>>>>>>>>> >>
>>>>>>>>>>>> >>
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> --
>>>>>>>>>>>> >> Harsh J
>>>>>>>>>>>> >
>>>>>>>>>>>> >
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Harsh J
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Sandy Ryza <sa...@cloudera.com>.
Increasing the slowstart is not meant to increase performance, but should
make for a fairer comparison.  Have you tried making sure that in MR2 only
8 map tasks are running concurrently, or boosting MR1 up to 16?
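
As a rough sketch of the first option, based only on the 12288 MB
yarn.nodemanager.resource.memory-mb and 768 MB mapreduce.map.memory.mb values
quoted earlier in this thread (treat the numbers as an assumption, not a
recommendation), raising the per-map request so that 12288 / 1536 = 8 would cap
each node at 8 concurrent map containers, matching the 8 map slots per
tasktracker in MR1:

  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1536</value>
  </property>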

-Sandy


On Wed, Oct 23, 2013 at 12:55 PM, Jian Fang
<ji...@gmail.com>wrote:

> Changing mapreduce.job.reduce.slowstart.completedmaps to 0.99 does not look good.
> The map phase alone took 48 minutes and the total time seems to be even longer.
> Is there any way to make the map phase run faster?
>
> Thanks.
>
>
> On Wed, Oct 23, 2013 at 10:05 AM, Jian Fang <jian.fang.subscribe@gmail.com
> > wrote:
>
>> Thanks Sandy.
>>
>> io.sort.record.percent is the default value 0.05 for both MR1 and MR2.
>> mapreduce.job.reduce.slowstart.completedmaps in MR2 and mapred.reduce.slowstart.completed.maps
>> in MR1 both use the default value 0.05.
>>
>> I tried to allocate 1536MB and 1024MB to map container some time ago, but
>> the changes did not give me a better result, thus, I changed it back to
>> 768MB.
>>
>> Will try mapred.reduce.slowstart.completed.maps=.99 to see what happens.
>> BTW, I should use
>> mapreduce.job.reduce.slowstart.completedmaps in MR2, right?
>>
>> Also, in MR1 I can specify tasktracker.http.threads, but I could not find
>> the counterpart for MR2. Which one should I tune for the http thread?
>>
>> Thanks again.
>>
>>
>> On Wed, Oct 23, 2013 at 9:40 AM, Sandy Ryza <sa...@cloudera.com>wrote:
>>
>>> Based on SLOTS_MILLIS_MAPS, it looks like your map tasks are taking
>>> about three times as long in MR2 as they are in MR1.  This is probably
>>> because you allow twice as many map tasks to run at a time in MR2 (12288/768
>>> = 16).  Being able to use all the containers isn't necessarily a good thing
>>> if you are oversubscribing your node's resources.  Because of the different
>>> way that MR1 and MR2 view resources, I think it's better to test with
>>> mapred.reduce.slowstart.completed.maps=.99 so that the map and reduce
>>> phases will run separately.
>>>
>>> On the other side, it looks like your MR1 has more spilled records than
>>> MR2.  For a fairer comparison, you should set io.sort.record.percent in MR1
>>> to .13, which should improve MR1 performance, but will provide a fairer
>>> comparison (MR2 automatically does this tuning for you).
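
A minimal mapred-site.xml fragment for the MR1 side of such a comparison,
shown here only as a sketch that collects the two settings mentioned above:

  <property>
    <name>io.sort.record.percent</name>
    <value>0.13</value>
  </property>
  <property>
    <name>mapred.reduce.slowstart.completed.maps</name>
    <value>0.99</value>
  </property>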
>>>
>>> -Sandy
>>>
>>>
>>> On Wed, Oct 23, 2013 at 9:22 AM, Jian Fang <
>>> jian.fang.subscribe@gmail.com> wrote:
>>>
>>>> The number of map slots and reduce slots on each data node for MR1 are
>>>> 8 and 3, respectively. Since MR2 could use all containers for either map or
>>>> reduce, I would expect that MR2 is faster.
>>>>
>>>>
>>>> On Wed, Oct 23, 2013 at 8:17 AM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>>
>>>>> How many map and reduce slots are you using per tasktracker in MR1?
>>>>>  How do the average map times compare? (MR2 reports this directly on the
>>>>> web UI, but you can also get a sense in MR1 by scrolling through the map
>>>>> tasks page).  Can you share the counters for MR1?
>>>>>
>>>>> -Sandy
>>>>>
>>>>>
>>>>> On Wed, Oct 23, 2013 at 12:23 AM, Jian Fang <
>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>
>>>>>> Unfortunately, turning off JVM reuse still got the same result, i.e.,
>>>>>> about 90 minutes for MR2. I don't think the killed reduces could account
>>>>>> for a 2x slowdown. There should be something very wrong either in
>>>>>> configuration or code. Any hints?
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <
>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks Sandy. I will try to turn JVM reuse off and see what happens.
>>>>>>>
>>>>>>> Yes, I saw quite some exceptions in the task attempts. For instance.
>>>>>>>
>>>>>>>
>>>>>>> 2013-10-20 03:13:58,751 ERROR [main]
>>>>>>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>>>>>>> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
>>>>>>> 2013-10-20 03:13:58,752 ERROR [Thread-6]
>>>>>>> org.apache.hadoop.hdfs.DFSClient: Failed to close file
>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
>>>>>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>>>>>>> No lease on
>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
>>>>>>> File does not exist. Holder
>>>>>>> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
>>>>>>> any open files.
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>>>>>>>         at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>>>>>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>>>         at
>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>>>>         at
>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>>>>> --
>>>>>>>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>>>>>>>         at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>>>>>>>         at
>>>>>>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>>>>>>>         at
>>>>>>> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>>>>>>>         at
>>>>>>> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>>>>>>> 2013-10-20 03:13:58,753 WARN [main]
>>>>>>> org.apache.hadoop.mapred.YarnChild: Exception running child :
>>>>>>> java.nio.channels.ClosedChannelException
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>>>>>>>         at
>>>>>>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>>>>>>>         at
>>>>>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>>>>>>>         at java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>>>>>         at
>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>>>>>>>         at
>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>>>>>>>         at
>>>>>>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>>>>>>>         at
>>>>>>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>>>>>>         at
>>>>>>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>>>>>>         at
>>>>>>> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <sandy.ryza@cloudera.com
>>>>>>> > wrote:
>>>>>>>
>>>>>>>> It looks like many of your reduce tasks were killed.  Do you know
>>>>>>>> why?  Also, MR2 doesn't have JVM reuse, so it might make sense to compare
>>>>>>>> it to MR1 with JVM reuse turned off.
>>>>>>>>
>>>>>>>> -Sandy
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <
>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> The Terasort output for MR2 is as follows.
>>>>>>>>>
>>>>>>>>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job
>>>>>>>>> (main): Counters: 46
>>>>>>>>>         File System Counters
>>>>>>>>>                 FILE: Number of bytes read=456102049355
>>>>>>>>>                 FILE: Number of bytes written=897246250517
>>>>>>>>>                 FILE: Number of read operations=0
>>>>>>>>>                 FILE: Number of large read operations=0
>>>>>>>>>                 FILE: Number of write operations=0
>>>>>>>>>                 HDFS: Number of bytes read=1000000851200
>>>>>>>>>                 HDFS: Number of bytes written=1000000000000
>>>>>>>>>                 HDFS: Number of read operations=32131
>>>>>>>>>                 HDFS: Number of large read operations=0
>>>>>>>>>                 HDFS: Number of write operations=224
>>>>>>>>>         Job Counters
>>>>>>>>>                 Killed map tasks=1
>>>>>>>>>                 Killed reduce tasks=20
>>>>>>>>>                 Launched map tasks=7601
>>>>>>>>>                 Launched reduce tasks=132
>>>>>>>>>                 Data-local map tasks=7591
>>>>>>>>>                 Rack-local map tasks=10
>>>>>>>>>                 Total time spent by all maps in occupied slots
>>>>>>>>> (ms)=1696141311
>>>>>>>>>                 Total time spent by all reduces in occupied slots
>>>>>>>>> (ms)=2664045096
>>>>>>>>>         Map-Reduce Framework
>>>>>>>>>                 Map input records=10000000000
>>>>>>>>>                 Map output records=10000000000
>>>>>>>>>                 Map output bytes=1020000000000
>>>>>>>>>                 Map output materialized bytes=440486356802
>>>>>>>>>                 Input split bytes=851200
>>>>>>>>>                 Combine input records=0
>>>>>>>>>                 Combine output records=0
>>>>>>>>>                 Reduce input groups=10000000000
>>>>>>>>>                 Reduce shuffle bytes=440486356802
>>>>>>>>>                 Reduce input records=10000000000
>>>>>>>>>                 Reduce output records=10000000000
>>>>>>>>>                 Spilled Records=20000000000
>>>>>>>>>                 Shuffled Maps =851200
>>>>>>>>>                 Failed Shuffles=61
>>>>>>>>>                 Merged Map outputs=851200
>>>>>>>>>                 GC time elapsed (ms)=4215666
>>>>>>>>>                 CPU time spent (ms)=192433000
>>>>>>>>>                 Physical memory (bytes) snapshot=3349356380160
>>>>>>>>>                 Virtual memory (bytes) snapshot=9665208745984
>>>>>>>>>                 Total committed heap usage (bytes)=3636854259712
>>>>>>>>>         Shuffle Errors
>>>>>>>>>                 BAD_ID=0
>>>>>>>>>                 CONNECTION=0
>>>>>>>>>                 IO_ERROR=4
>>>>>>>>>                 WRONG_LENGTH=0
>>>>>>>>>                 WRONG_MAP=0
>>>>>>>>>                 WRONG_REDUCE=0
>>>>>>>>>         File Input Format Counters
>>>>>>>>>                 Bytes Read=1000000000000
>>>>>>>>>         File Output Format Counters
>>>>>>>>>                 Bytes Written=1000000000000
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> John
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop
>>>>>>>>>> 1.0.3 and it turned out that the terasort for MR2 is 2 times slower than
>>>>>>>>>> that in MR1. I cannot really believe it.
>>>>>>>>>>
>>>>>>>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0
>>>>>>>>>> cluster configurations are as follows.
>>>>>>>>>>
>>>>>>>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>>>>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>>>>>>>         mapreduce.map.memory.mb = "768";
>>>>>>>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>>>>>>>
>>>>>>>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>>>>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>>>>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>>>>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>>>>>>>
>>>>>>>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>>>>>>>         mapreduce.task.io.sort.factor = "48";
>>>>>>>>>>         mapreduce.task.io.sort.mb = "200";
>>>>>>>>>>         mapreduce.map.speculative = "true";
>>>>>>>>>>         mapreduce.reduce.speculative = "true";
>>>>>>>>>>         mapreduce.framework.name = "yarn";
>>>>>>>>>>         yarn.app.mapreduce.am.job.task.listener.thread-count =
>>>>>>>>>> "60";
>>>>>>>>>>         mapreduce.map.cpu.vcores = "1";
>>>>>>>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>>>>>>>
>>>>>>>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>>>>>>>         mapreduce.map.output.compress = "true";
>>>>>>>>>>         mapreduce.map.output.compress.codec =
>>>>>>>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>>>>>>>
>>>>>>>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>>>>>>>         yarn.resourcemanager.scheduler.client.thread-count = "64";
>>>>>>>>>>         yarn.resourcemanager.resource-tracker.client.thread-count
>>>>>>>>>> = "64";
>>>>>>>>>>         yarn.resourcemanager.scheduler.class =
>>>>>>>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>>>>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>>>>>>>         yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>>>>>>>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>>>>>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>>>>>>>         yarn.nodemanager.container-executor.class =
>>>>>>>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>>>>>>>         yarn.nodemanager.container-manager.thread-count = "64";
>>>>>>>>>>         yarn.nodemanager.localizer.client.thread-count = "20";
>>>>>>>>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>>>>>>>>>
>>>>>>>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost the
>>>>>>>>>> same other configurations. In MR1, the 1TB terasort took about 45 minutes,
>>>>>>>>>> but it took around 90 minutes in MR2.
>>>>>>>>>>
>>>>>>>>>> Does anyone know what is wrong here? Or do I need some special
>>>>>>>>>> configurations for terasort to work better in MR2?
>>>>>>>>>>
>>>>>>>>>> Thanks in advance,
>>>>>>>>>>
>>>>>>>>>> John
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>>>>>>>> michael_segel@hotmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Sam,
>>>>>>>>>>> I think your cluster is too small for any meaningful conclusions
>>>>>>>>>>> to be made.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>>>>>
>>>>>>>>>>> Mike Segel
>>>>>>>>>>>
>>>>>>>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Harsh,
>>>>>>>>>>>
>>>>>>>>>>> Thanks for your detailed response! Now, the efficiency of my
>>>>>>>>>>> Yarn cluster improved a lot after increasing the reducer
>>>>>>>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>>>>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>>>>>>>
>>>>>>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and reduce
>>>>>>>>>>> task together, with a typical process(map > shuffle > reduce). In Yarn, as
>>>>>>>>>>> I know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1
>>>>>>>>>>> job has special execution process(map > shuffle > reduce) in Hadoop 1.x,
>>>>>>>>>>> and how Yarn execute a MRv1 job? still include some special MR steps in
>>>>>>>>>>> Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>>>>>> - What's the general process for ApplicationMaster of Yarn to
>>>>>>>>>>> execute a job?
>>>>>>>>>>>
>>>>>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>>>>>> - For Yarn, the above two parameters do not work any more, as yarn
>>>>>>>>>>> uses container instead, right?
>>>>>>>>>>> - For Yarn, we can set the whole physical mem for a NodeManager
>>>>>>>>>>> using 'yarn.nodemanager.resource.memory-mb'. But how to set the
>>>>>>>>>>> default size of physical mem of a container?
>>>>>>>>>>> - How to set the maximum size of physical mem of a container? By
>>>>>>>>>>> the parameter of 'mapred.child.java.opts'?
>>>>>>>>>>>
>>>>>>>>>>> Thanks as always!
>>>>>>>>>>>
>>>>>>>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>
>>>>>>>>>>>> Hi Sam,
>>>>>>>>>>>>
>>>>>>>>>>>> > - How to know the container number? Why do you say it will be 22
>>>>>>>>>>>> containers due to a 22 GB memory?
>>>>>>>>>>>>
>>>>>>>>>>>> The MR2's default configuration requests 1 GB resource each for
>>>>>>>>>>>> Map
>>>>>>>>>>>> and Reduce containers. It requests 1.5 GB for the AM container
>>>>>>>>>>>> that
>>>>>>>>>>>> runs the job, additionally. This is tunable using the properties
>>>>>>>>>>>> Sandy's mentioned in his post.
>>>>>>>>>>>>
>>>>>>>>>>>> > - My machine has 32 GB memory, how much memory is proper to
>>>>>>>>>>>> be assigned to containers?
>>>>>>>>>>>>
>>>>>>>>>>>> This is a general question. You may use the same process you
>>>>>>>>>>>> took to
>>>>>>>>>>>> decide optimal number of slots in MR1 to decide this here. Every
>>>>>>>>>>>> container is a new JVM, and you're limited by the CPUs you have
>>>>>>>>>>>> there
>>>>>>>>>>>> (if not the memory). Either increase memory requests from jobs,
>>>>>>>>>>>> to
>>>>>>>>>>>> lower # of concurrent containers at a given time (runtime
>>>>>>>>>>>> change), or
>>>>>>>>>>>> lower NM's published memory resources to control the same
>>>>>>>>>>>> (config
>>>>>>>>>>>> change).
>>>>>>>>>>>>
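As a back-of-the-envelope illustration of those defaults, using the
yarn-site.xml values quoted further down in this thread (a sketch, not a
measurement): with yarn.nodemanager.resource.memory-mb = 22000 and roughly
1024 MB requested per map or reduce container, each NodeManager can run about
22000 / 1024, i.e. 21-22, task containers at once, versus the 8 map + 4 reduce
= 12 slots configured for MR1; lowering the NodeManager figure to 12000 brings
that down to about 11-12 concurrent containers.
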
>>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to
>>>>>>>>>>>> be 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, all of these properties will still work. Old properties
>>>>>>>>>>>> specific
>>>>>>>>>>>> to JobTracker or TaskTracker (usually found as a keyword in the
>>>>>>>>>>>> config
>>>>>>>>>>>> name) will not apply anymore.
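
To illustrate the distinction with properties that already appear in the
configuration pasted above (a sketch, not an exhaustive list): task-level
settings such as mapreduce.task.io.sort.mb, mapreduce.map.sort.spill.percent
and mapreduce.child.java.opts are still honored under YARN, while
TaskTracker-specific ones such as mapreduce.tasktracker.map.tasks.maximum,
mapreduce.tasktracker.reduce.tasks.maximum and
mapreduce.tasktracker.outofband.heartbeat are ignored.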
>>>>>>>>>>>>
>>>>>>>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <sa...@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>> > Hi Harsh,
>>>>>>>>>>>> >
>>>>>>>>>>>> > According to above suggestions, I removed the duplication of
>>>>>>>>>>>> setting, and
>>>>>>>>>>>> > reduced the value of 'yarn.nodemanager.resource.cpu-cores',
>>>>>>>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>>>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000. And
>>>>>>>>>>>> then, the
>>>>>>>>>>>> > efficiency improved about 18%.  I have questions:
>>>>>>>>>>>> >
>>>>>>>>>>>> > - How to know the container number? Why do you say it will be 22
>>>>>>>>>>>> containers due
>>>>>>>>>>>> > to a 22 GB memory?
>>>>>>>>>>>> > - My machine has 32 GB memory, how much memory is proper to
>>>>>>>>>>>> be assigned to
>>>>>>>>>>>> > containers?
>>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to
>>>>>>>>>>>> be 'yarn', will
>>>>>>>>>>>> > other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>>> framework? Like
>>>>>>>>>>>> > 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>> >
>>>>>>>>>>>> > Thanks!
>>>>>>>>>>>> >
>>>>>>>>>>>> >
>>>>>>>>>>>> >
>>>>>>>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> Hey Sam,
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> Did you get a chance to retry with Sandy's suggestions? The
>>>>>>>>>>>> config
>>>>>>>>>>>> >> appears to be asking NMs to use roughly 22 total containers
>>>>>>>>>>>> (as
>>>>>>>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB
>>>>>>>>>>>> memory
>>>>>>>>>>>> >> resource. This could impact much, given the CPU is still the
>>>>>>>>>>>> same for
>>>>>>>>>>>> >> both test runs.
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>>>>>>>> sandy.ryza@cloudera.com>
>>>>>>>>>>>> >> wrote:
>>>>>>>>>>>> >> > Hey Sam,
>>>>>>>>>>>> >> >
>>>>>>>>>>>> >> > Thanks for sharing your results.  I'm definitely curious
>>>>>>>>>>>> about what's
>>>>>>>>>>>> >> > causing the difference.
>>>>>>>>>>>> >> >
>>>>>>>>>>>> >> > A couple observations:
>>>>>>>>>>>> >> > It looks like you've got
>>>>>>>>>>>> yarn.nodemanager.resource.memory-mb in there
>>>>>>>>>>>> >> > twice
>>>>>>>>>>>> >> > with two different values.
>>>>>>>>>>>> >> >
>>>>>>>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close to
>>>>>>>>>>>> the default
>>>>>>>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of your
>>>>>>>>>>>> tasks getting
>>>>>>>>>>>> >> > killed for running over resource limits?
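
One commonly used arrangement, sketched here with illustrative values rather
than anything measured in this thread, is to keep the task heap well inside
the container request, for example:

  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1536</value>
  </property>
  <property>
    <name>mapreduce.child.java.opts</name>
    <value>-Xmx1000m</value>
  </property>

so the 1000 MB heap plus JVM overhead stays under the limit the NodeManager
enforces.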
>>>>>>>>>>>> >> >
>>>>>>>>>>>> >> > -Sandy
>>>>>>>>>>>> >> >
>>>>>>>>>>>> >> >
>>>>>>>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <
>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >> The terasort execution log shows that reduce spent about
>>>>>>>>>>>> 5.5 mins from
>>>>>>>>>>>> >> >> 33%
>>>>>>>>>>>> >> >> to 35% as below.
>>>>>>>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce 31%
>>>>>>>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce 32%
>>>>>>>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce 33%
>>>>>>>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce 35%
>>>>>>>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce 40%
>>>>>>>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce 43%
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >> Anyway, below are my configurations for your reference.
>>>>>>>>>>>> Thanks!
>>>>>>>>>>>> >> >> (A) core-site.xml
>>>>>>>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >> (B) hdfs-site.xml
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>dfs.replication</name>
>>>>>>>>>>>> >> >>     <value>1</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>>>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>>>>>>>> >> >>     <value>10</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >> (C) mapred-site.xml
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>>>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>>>>>>>> >> >> </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>>>>>>>> >> >>     <value>yarn</value>
>>>>>>>>>>>> >> >>    </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>> >> >>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>>>>>>>> >> >>     <value>8</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>>>>>>>> >> >>     <value>4</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>>>>>>>> >> >>     <value>true</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >> (D) yarn-site.xml
>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>>>>>>>> >> >>     <value>node1:18025</value>
>>>>>>>>>>>> >> >>     <description>host is the hostname of the resource
>>>>>>>>>>>> manager and
>>>>>>>>>>>> >> >>     port is the port on which the NodeManagers contact
>>>>>>>>>>>> the Resource
>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <description>The address of the RM web
>>>>>>>>>>>> application.</description>
>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>>>>>>>> >> >>     <value>node1:18088</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>>>>>>>>>>>> >> >>     <value>node1:18030</value>
>>>>>>>>>>>> >> >>     <description>host is the hostname of the
>>>>>>>>>>>> resourcemanager and port
>>>>>>>>>>>> >> >> is
>>>>>>>>>>>> >> >> the port
>>>>>>>>>>>> >> >>     on which the Applications in the cluster talk to the
>>>>>>>>>>>> Resource
>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>>>>>>>> >> >>     <value>node1:18040</value>
>>>>>>>>>>>> >> >>     <description>the host is the hostname of the
>>>>>>>>>>>> ResourceManager and
>>>>>>>>>>>> >> >> the
>>>>>>>>>>>> >> >> port is the port on
>>>>>>>>>>>> >> >>     which the clients can talk to the Resource Manager.
>>>>>>>>>>>> </description>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>>>>>>>> >> >>     <description>the local directories used by the
>>>>>>>>>>>> >> >> nodemanager</description>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>>>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>>>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>>>>>>>> port</description>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>> >> >>     <value>10240</value>
>>>>>>>>>>>> >> >>     <description>the amount of memory on the NodeManager
>>>>>>>>>>>> in
>>>>>>>>>>>> >> >> MB</description>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>>>>>>> >> >>     <description>directory on hdfs where the application
>>>>>>>>>>>> logs are moved
>>>>>>>>>>>> >> >> to
>>>>>>>>>>>> >> >> </description>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>    <property>
>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>>>>>>> >> >>     <description>the directories used by Nodemanagers as
>>>>>>>>>>>> log
>>>>>>>>>>>> >> >> directories</description>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>>>>>>> >> >>     <description>shuffle service that needs to be set for
>>>>>>>>>>>> Map Reduce to
>>>>>>>>>>>> >> >> run </description>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>>>>>>> >> >>     <value>24</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>>>>>>> >> >>     <value>3</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>> >> >>     <value>22000</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>>>>>>> >> >>     <value>2.1</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>>>>>>>> >> >>>
>>>>>>>>>>>> >> >>> Not tuning configurations at all is wrong. YARN uses
>>>>>>>>>>>> memory resource
>>>>>>>>>>>> >> >>> based scheduling and hence MR2 would be requesting 1 GB
>>>>>>>>>>>> minimum by
>>>>>>>>>>>> >> >>> default, causing, on base configs, to max out at 8 (due
>>>>>>>>>>>> to 8 GB NM
>>>>>>>>>>>> >> >>> memory resource config) total containers. Do share your
>>>>>>>>>>>> configs as at
>>>>>>>>>>>> >> >>> this point none of us can tell what it is.
>>>>>>>>>>>> >> >>>
>>>>>>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for
>>>>>>>>>>>> users and to not
>>>>>>>>>>>> >> >>> care about such things :)
>>>>>>>>>>>> >> >>>
>>>>>>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>>>>>>> samliuhadoop@gmail.com>
>>>>>>>>>>>> >> >>> wrote:
>>>>>>>>>>>> >> >>> > At the beginning, I just wanted to do a fast comparison
>>>>>>>>>>>> of MRv1 and
>>>>>>>>>>>> >> >>> > Yarn.
>>>>>>>>>>>> >> >>> > But
>>>>>>>>>>>> >> >>> > they have many differences, and to be fair for
>>>>>>>>>>>> comparison I did not
>>>>>>>>>>>> >> >>> > tune
>>>>>>>>>>>> >> >>> > their configurations at all.  So I got above test
>>>>>>>>>>>> results. After
>>>>>>>>>>>> >> >>> > analyzing
>>>>>>>>>>>> >> >>> > the test result, no doubt, I will configure them and
>>>>>>>>>>>> do comparison
>>>>>>>>>>>> >> >>> > again.
>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>> >> >>> > Do you have any idea on current test result? I think,
>>>>>>>>>>>> to compare
>>>>>>>>>>>> >> >>> > with
>>>>>>>>>>>> >> >>> > MRv1,
>>>>>>>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but worse
>>>>>>>>>>>> on Reduce
>>>>>>>>>>>> >> >>> > phase(terasort test).
>>>>>>>>>>>> >> >>> > And any detailed suggestions/comments/materials on
>>>>>>>>>>>> Yarn performance
>>>>>>>>>>>> >> >>> > tuning?
>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>> >> >>> > Thanks!
>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>>>>>>> marcosluis2186@gmail.com>
>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>> >> >>> >> Why not tune the configurations?
>>>>>>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>> >> >>> >>> Hi Experts,
>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>> >> >>> >>> We are thinking about whether to use Yarn or not in
>>>>>>>>>>>> the near
>>>>>>>>>>>> >> >>> >>> future,
>>>>>>>>>>>> >> >>> >>> and
>>>>>>>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for
>>>>>>>>>>>> comparison.
>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>> >> >>> >>> My env is a three-node cluster, and each node has
>>>>>>>>>>>> similar hardware:
>>>>>>>>>>>> >> >>> >>> 2
>>>>>>>>>>>> >> >>> >>> cpu(4 core), 32 GB mem. Both Yarn and MRv1 clusters are
>>>>>>>>>>>> set on the
>>>>>>>>>>>> >> >>> >>> same
>>>>>>>>>>>> >> >>> >>> env. To
>>>>>>>>>>>> >> >>> >>> be fair, I did not make any performance tuning on
>>>>>>>>>>>> their
>>>>>>>>>>>> >> >>> >>> configurations, but
>>>>>>>>>>>> >> >>> >>> use the default configuration values.
>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>> >> >>> >>> Before testing, I think Yarn will be much better
>>>>>>>>>>>> than MRv1, if
>>>>>>>>>>>> >> >>> >>> they
>>>>>>>>>>>> >> >>> >>> all
>>>>>>>>>>>> >> >>> >>> use default configuration, because Yarn is a better
>>>>>>>>>>>> framework than
>>>>>>>>>>>> >> >>> >>> MRv1.
>>>>>>>>>>>> >> >>> >>> However, the test result shows some differences:
>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>> >> >>> >>> After a fast analysis, I think the direct cause
>>>>>>>>>>>> might be that Yarn
>>>>>>>>>>>> >> >>> >>> is
>>>>>>>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much worse
>>>>>>>>>>>> on Reduce
>>>>>>>>>>>> >> >>> >>> phase.
>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1 for
>>>>>>>>>>>> terasort?
>>>>>>>>>>>> >> >>> >>> - What's the strategy for tuning Yarn performance?
>>>>>>>>>>>> Are there any
>>>>>>>>>>>> >> >>> >>> materials?
>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>> >> >>> >>> Thanks!
>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>> >> >>> >> --
>>>>>>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>> >> >>>
>>>>>>>>>>>> >> >>>
>>>>>>>>>>>> >> >>>
>>>>>>>>>>>> >> >>> --
>>>>>>>>>>>> >> >>> Harsh J
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >
>>>>>>>>>>>> >>
>>>>>>>>>>>> >>
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> --
>>>>>>>>>>>> >> Harsh J
>>>>>>>>>>>> >
>>>>>>>>>>>> >
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Harsh J
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Jian Fang <ji...@gmail.com>.
Changing mapreduce.job.reduce.slowstart.completedmaps to 0.99 does not look good.
The map phase alone took 48 minutes and the total time seems to be even longer.
Is there any way to make the map phase run faster?
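
For reference, and only as a back-of-the-envelope sketch based on the settings
quoted earlier in this thread: per-node map concurrency in MR2 is bounded by
yarn.nodemanager.resource.memory-mb / mapreduce.map.memory.mb = 12288 / 768 =
16 concurrent map containers (vcores can lower this further depending on the
scheduler's resource calculator), so the map phase is already running as many
containers as each NodeManager advertises.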

Thanks.


On Wed, Oct 23, 2013 at 10:05 AM, Jian Fang
<ji...@gmail.com>wrote:

> Thanks Sandy.
>
> io.sort.record.percent is the default value 0.05 for both MR1 and MR2.
> mapreduce.job.reduce.slowstart.completedmaps in MR2 and mapred.reduce.slowstart.completed.maps
> in MR1 both use the default value 0.05.
>
> I tried to allocate 1536MB and 1024MB to map container some time ago, but
> the changes did not give me a better result, thus, I changed it back to
> 768MB.
>
> Will try mapred.reduce.slowstart.completed.maps=.99 to see what happens.
> BTW, I should use
> mapreduce.job.reduce.slowstart.completedmaps in MR2, right?
>
> Also, in MR1 I can specify tasktracker.http.threads, but I could not find
> the counterpart for MR2. Which one should I tune for the http thread?
>
> Thanks again.
>
>
> On Wed, Oct 23, 2013 at 9:40 AM, Sandy Ryza <sa...@cloudera.com>wrote:
>
>> Based on SLOTS_MILLIS_MAPS, it looks like your map tasks are taking about
>> three times as long in MR2 as they are in MR1.  This is probably because
>> you allow twice as many map tasks to run at a time in MR2 (12288/768 =
>> 16).  Being able to use all the containers isn't necessarily a good thing
>> if you are oversubscribing your node's resources.  Because of the different
>> way that MR1 and MR2 view resources, I think it's better to test with
>> mapred.reduce.slowstart.completed.maps=.99 so that the map and reduce
>> phases will run separately.
>>
>> On the other side, it looks like your MR1 has more spilled records than
>> MR2.  For a fairer comparison, you should set io.sort.record.percent in MR1
>> to .13, which should improve MR1 performance, but will provide a fairer
>> comparison (MR2 automatically does this tuning for you).
>>
>> -Sandy
>>
>>
>> On Wed, Oct 23, 2013 at 9:22 AM, Jian Fang <jian.fang.subscribe@gmail.com
>> > wrote:
>>
>>> The number of map slots and reduce slots on each data node for MR1 are 8
>>> and 3, respectively. Since MR2 could use all containers for either map or
>>> reduce, I would expect that MR2 is faster.
>>>
>>>
>>> On Wed, Oct 23, 2013 at 8:17 AM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>
>>>> How many map and reduce slots are you using per tasktracker in MR1?
>>>>  How do the average map times compare? (MR2 reports this directly on the
>>>> web UI, but you can also get a sense in MR1 by scrolling through the map
>>>> tasks page).  Can you share the counters for MR1?
>>>>
>>>> -Sandy
>>>>
>>>>
>>>> On Wed, Oct 23, 2013 at 12:23 AM, Jian Fang <
>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>
>>>>> Unfortunately, turning off JVM reuse still got the same result, i.e.,
>>>>> about 90 minutes for MR2. I don't think the killed reduces could account
>>>>> for a 2x slowdown. There should be something very wrong either in
>>>>> configuration or code. Any hints?
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <
>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>
>>>>>> Thanks Sandy. I will try to turn JVM reuse off and see what happens.
>>>>>>
>>>>>> Yes, I saw quite some exceptions in the task attempts. For instance.
>>>>>>
>>>>>>
>>>>>> 2013-10-20 03:13:58,751 ERROR [main]
>>>>>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>>>>>> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
>>>>>> 2013-10-20 03:13:58,752 ERROR [Thread-6]
>>>>>> org.apache.hadoop.hdfs.DFSClient: Failed to close file
>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
>>>>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>>>>>> No lease on
>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
>>>>>> File does not exist. Holder
>>>>>> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
>>>>>> any open files.
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>>>>>>         at
>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>>>>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>>         at
>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>>>         at
>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>>>> --
>>>>>>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>>>>>>         at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>>>>>>         at
>>>>>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>>>>>>         at
>>>>>> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>>>>>>         at
>>>>>> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>>>>>> 2013-10-20 03:13:58,753 WARN [main]
>>>>>> org.apache.hadoop.mapred.YarnChild: Exception running child :
>>>>>> java.nio.channels.ClosedChannelException
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>>>>>>         at
>>>>>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>>>>>>         at
>>>>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>>>>>>         at java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>>>>         at
>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>>>>>>         at
>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>>>>>>         at
>>>>>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>>>>>>         at
>>>>>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>>>>>         at
>>>>>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>>>>>         at
>>>>>> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>>>>
>>>>>>> It looks like many of your reduce tasks were killed.  Do you know
>>>>>>> why?  Also, MR2 doesn't have JVM reuse, so it might make sense to compare
>>>>>>> it to MR1 with JVM reuse turned off.
>>>>>>>
>>>>>>> -Sandy
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <
>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>
>>>>>>>> The Terasort output for MR2 is as follows.
>>>>>>>>
>>>>>>>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job
>>>>>>>> (main): Counters: 46
>>>>>>>>         File System Counters
>>>>>>>>                 FILE: Number of bytes read=456102049355
>>>>>>>>                 FILE: Number of bytes written=897246250517
>>>>>>>>                 FILE: Number of read operations=0
>>>>>>>>                 FILE: Number of large read operations=0
>>>>>>>>                 FILE: Number of write operations=0
>>>>>>>>                 HDFS: Number of bytes read=1000000851200
>>>>>>>>                 HDFS: Number of bytes written=1000000000000
>>>>>>>>                 HDFS: Number of read operations=32131
>>>>>>>>                 HDFS: Number of large read operations=0
>>>>>>>>                 HDFS: Number of write operations=224
>>>>>>>>         Job Counters
>>>>>>>>                 Killed map tasks=1
>>>>>>>>                 Killed reduce tasks=20
>>>>>>>>                 Launched map tasks=7601
>>>>>>>>                 Launched reduce tasks=132
>>>>>>>>                 Data-local map tasks=7591
>>>>>>>>                 Rack-local map tasks=10
>>>>>>>>                 Total time spent by all maps in occupied slots
>>>>>>>> (ms)=1696141311
>>>>>>>>                 Total time spent by all reduces in occupied slots
>>>>>>>> (ms)=2664045096
>>>>>>>>         Map-Reduce Framework
>>>>>>>>                 Map input records=10000000000
>>>>>>>>                 Map output records=10000000000
>>>>>>>>                 Map output bytes=1020000000000
>>>>>>>>                 Map output materialized bytes=440486356802
>>>>>>>>                 Input split bytes=851200
>>>>>>>>                 Combine input records=0
>>>>>>>>                 Combine output records=0
>>>>>>>>                 Reduce input groups=10000000000
>>>>>>>>                 Reduce shuffle bytes=440486356802
>>>>>>>>                 Reduce input records=10000000000
>>>>>>>>                 Reduce output records=10000000000
>>>>>>>>                 Spilled Records=20000000000
>>>>>>>>                 Shuffled Maps =851200
>>>>>>>>                 Failed Shuffles=61
>>>>>>>>                 Merged Map outputs=851200
>>>>>>>>                 GC time elapsed (ms)=4215666
>>>>>>>>                 CPU time spent (ms)=192433000
>>>>>>>>                 Physical memory (bytes) snapshot=3349356380160
>>>>>>>>                 Virtual memory (bytes) snapshot=9665208745984
>>>>>>>>                 Total committed heap usage (bytes)=3636854259712
>>>>>>>>         Shuffle Errors
>>>>>>>>                 BAD_ID=0
>>>>>>>>                 CONNECTION=0
>>>>>>>>                 IO_ERROR=4
>>>>>>>>                 WRONG_LENGTH=0
>>>>>>>>                 WRONG_MAP=0
>>>>>>>>                 WRONG_REDUCE=0
>>>>>>>>         File Input Format Counters
>>>>>>>>                 Bytes Read=1000000000000
>>>>>>>>         File Output Format Counters
>>>>>>>>                 Bytes Written=1000000000000
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> John
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Jian Fang <ji...@gmail.com>.
Changing mapreduce.job.reduce.slowstart.completedmaps to 0.99 does not look
good. The map phase alone took 48 minutes, and the total time seems to be even
longer. Is there any way to make the map phase run faster?
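
For reference, the setting tried above corresponds to the following entry in
the MR2 cluster's mapred-site.xml; the property name and the 0.99 value are the
ones discussed in this thread, and 0.05 is the default mentioned below:

  <property>
    <name>mapreduce.job.reduce.slowstart.completedmaps</name>
    <!-- start reduces only after 99% of the maps have finished;
         the default of 0.05 starts them almost immediately -->
    <value>0.99</value>
  </property>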

Thanks.


On Wed, Oct 23, 2013 at 10:05 AM, Jian Fang
<ji...@gmail.com>wrote:

> Thanks Sandy.
>
> io.sort.record.percent is at its default value of 0.05 for both MR1 and MR2.
> mapreduce.job.reduce.slowstart.completedmaps in MR2 and mapred.reduce.slowstart.completed.maps
> in MR1 both use the default value 0.05.
>
> I tried allocating 1536MB and 1024MB to the map container some time ago, but
> the changes did not give a better result, so I changed it back to
> 768MB.
>
> I will try mapred.reduce.slowstart.completed.maps=.99 to see what happens.
> BTW, I should use
> mapreduce.job.reduce.slowstart.completedmaps in MR2, right?
>
> Also, in MR1 I can specify tasktracker.http.threads, but I could not find
> the counterpart in MR2. Which one should I tune for the HTTP threads?
>
> Thanks again.
>
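
A note on the tasktracker.http.threads question above: in MR2 the map outputs
are served by the ShuffleHandler auxiliary service on each NodeManager, as
configured further down in this thread, rather than by a TaskTracker. To the
best of my knowledge the closest counterpart is the setting sketched below,
which is worth confirming against mapred-default.xml for 2.2.0:

  <property>
    <name>mapreduce.shuffle.max.threads</name>
    <!-- threads used by the NodeManager's ShuffleHandler to serve map output;
         0 reportedly means 2 * number of available processors -->
    <value>0</value>
  </property>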
>
> On Wed, Oct 23, 2013 at 9:40 AM, Sandy Ryza <sa...@cloudera.com>wrote:
>
>> Based on SLOTS_MILLIS_MAPS, it looks like your map tasks are taking about
>> three times as long in MR2 as they are in MR1.  This is probably because
>> you allow twice as many map tasks to run at a time in MR2 (12288/768 =
>> 16).  Being able to use all the containers isn't necessarily a good thing
>> if you are oversubscribing your node's resources.  Because of the different
>> way that MR1 and MR2 view resources, I think it's better to test with
>> mapred.reduce.slowstart.completed.maps=.99 so that the map and reduce
>> phases will run separately.
>>
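
For reference, the counters posted earlier in this thread give roughly:

  1,696,141,311 ms total map slot time / 7,601 launched maps = about 223,000 ms,
  i.e. about 3.7 minutes per map on the MR2 run

which is the per-map time being compared against MR1 here.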
>> On the other hand, it looks like your MR1 run has more spilled records than
>> MR2.  You should set io.sort.record.percent in MR1 to .13, which should
>> improve MR1 performance and provide a fairer comparison (MR2 does this
>> tuning for you automatically).
>>
>> -Sandy
>>
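
Concretely, Sandy's two MR1-side suggestions above would look roughly like this
in the MR1 cluster's mapred-site.xml, with the values taken from his message:

  <property>
    <name>mapred.reduce.slowstart.completed.maps</name>
    <!-- hold reduces back until the maps finish, so the two phases are timed separately -->
    <value>0.99</value>
  </property>
  <property>
    <name>io.sort.record.percent</name>
    <!-- match the record/buffer split that MR2 tunes automatically -->
    <value>0.13</value>
  </property>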
>>
>> On Wed, Oct 23, 2013 at 9:22 AM, Jian Fang <jian.fang.subscribe@gmail.com
>> > wrote:
>>
>>> The numbers of map slots and reduce slots on each data node for MR1 are 8
>>> and 3, respectively. Since MR2 can use all containers for either map or
>>> reduce, I would expect MR2 to be faster.
>>>
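
Putting numbers on the per-node concurrency, using the settings quoted in this
thread:

  MR1: 8 map slots + 3 reduce slots             = 11 concurrent task JVMs
  MR2: 12288 MB / 768 MB per map container      = 16 concurrent maps, or
       12288 MB / 2048 MB per reduce container  = 6 concurrent reduces

so MR2 can indeed run twice as many maps at once as MR1, which is the
oversubscription Sandy points to above.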
>>>
>>> On Wed, Oct 23, 2013 at 8:17 AM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>
>>>> How many map and reduce slots are you using per tasktracker in MR1?
>>>>  How do the average map times compare? (MR2 reports this directly on the
>>>> web UI, but you can also get a sense in MR1 by scrolling through the map
>>>> tasks page).  Can you share the counters for MR1?
>>>>
>>>> -Sandy
>>>>
>>>>
>>>> On Wed, Oct 23, 2013 at 12:23 AM, Jian Fang <
>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>
>>>>> Unfortunately, turning off JVM reuse still gave the same result, i.e.,
>>>>> about 90 minutes for MR2. I don't think the killed reduces could account
>>>>> for a 2x slowdown. There must be something very wrong in either the
>>>>> configuration or the code. Any hints?
>>>>>
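
For the record, JVM reuse is an MR1-only knob; the mapreduce.job.jvm.numtasks =
"20" entry in the MR2 config quoted below has no effect in MR2. Turning reuse
off on the MR1 side, as Sandy suggested, would be something like the sketch
below; mapred.job.reuse.jvm.num.tasks is the MR1 property name as I recall it,
so worth double-checking for 1.0.3:

  <property>
    <name>mapred.job.reuse.jvm.num.tasks</name>
    <!-- 1 = no JVM reuse; -1 would mean unlimited reuse -->
    <value>1</value>
  </property>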
>>>>>
>>>>>
>>>>> On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <
>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>
>>>>>> Thanks Sandy. I will try to turn JVM reuse off and see what happens.
>>>>>>
>>>>>> Yes, I saw quite a few exceptions in the task attempts. For instance:
>>>>>>
>>>>>>
>>>>>> 2013-10-20 03:13:58,751 ERROR [main]
>>>>>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>>>>>> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
>>>>>> 2013-10-20 03:13:58,752 ERROR [Thread-6]
>>>>>> org.apache.hadoop.hdfs.DFSClient: Failed to close file
>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
>>>>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>>>>>> No lease on
>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
>>>>>> File does not exist. Holder
>>>>>> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
>>>>>> any open files.
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>>>>>>         at
>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>>>>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>>         at
>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>>>         at
>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>>>> --
>>>>>>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>>>>>>         at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>>>>>>         at
>>>>>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>>>>>>         at
>>>>>> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>>>>>>         at
>>>>>> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>>>>>> 2013-10-20 03:13:58,753 WARN [main]
>>>>>> org.apache.hadoop.mapred.YarnChild: Exception running child :
>>>>>> java.nio.channels.ClosedChannelException
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>>>>>>         at
>>>>>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>>>>>>         at
>>>>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>>>>>>         at java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>>>>         at
>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>>>>>>         at
>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>>>>>>         at
>>>>>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>>>>>>         at
>>>>>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>>>>>         at
>>>>>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>>>>>         at
>>>>>> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>>>>
>>>>>>> It looks like many of your reduce tasks were killed.  Do you know
>>>>>>> why?  Also, MR2 doesn't have JVM reuse, so it might make sense to compare
>>>>>>> it to MR1 with JVM reuse turned off.
>>>>>>>
>>>>>>> -Sandy
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <
>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>
>>>>>>>> The Terasort output for MR2 is as follows.
>>>>>>>>
>>>>>>>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job
>>>>>>>> (main): Counters: 46
>>>>>>>>         File System Counters
>>>>>>>>                 FILE: Number of bytes read=456102049355
>>>>>>>>                 FILE: Number of bytes written=897246250517
>>>>>>>>                 FILE: Number of read operations=0
>>>>>>>>                 FILE: Number of large read operations=0
>>>>>>>>                 FILE: Number of write operations=0
>>>>>>>>                 HDFS: Number of bytes read=1000000851200
>>>>>>>>                 HDFS: Number of bytes written=1000000000000
>>>>>>>>                 HDFS: Number of read operations=32131
>>>>>>>>                 HDFS: Number of large read operations=0
>>>>>>>>                 HDFS: Number of write operations=224
>>>>>>>>         Job Counters
>>>>>>>>                 Killed map tasks=1
>>>>>>>>                 Killed reduce tasks=20
>>>>>>>>                 Launched map tasks=7601
>>>>>>>>                 Launched reduce tasks=132
>>>>>>>>                 Data-local map tasks=7591
>>>>>>>>                 Rack-local map tasks=10
>>>>>>>>                 Total time spent by all maps in occupied slots
>>>>>>>> (ms)=1696141311
>>>>>>>>                 Total time spent by all reduces in occupied slots
>>>>>>>> (ms)=2664045096
>>>>>>>>         Map-Reduce Framework
>>>>>>>>                 Map input records=10000000000
>>>>>>>>                 Map output records=10000000000
>>>>>>>>                 Map output bytes=1020000000000
>>>>>>>>                 Map output materialized bytes=440486356802
>>>>>>>>                 Input split bytes=851200
>>>>>>>>                 Combine input records=0
>>>>>>>>                 Combine output records=0
>>>>>>>>                 Reduce input groups=10000000000
>>>>>>>>                 Reduce shuffle bytes=440486356802
>>>>>>>>                 Reduce input records=10000000000
>>>>>>>>                 Reduce output records=10000000000
>>>>>>>>                 Spilled Records=20000000000
>>>>>>>>                 Shuffled Maps =851200
>>>>>>>>                 Failed Shuffles=61
>>>>>>>>                 Merged Map outputs=851200
>>>>>>>>                 GC time elapsed (ms)=4215666
>>>>>>>>                 CPU time spent (ms)=192433000
>>>>>>>>                 Physical memory (bytes) snapshot=3349356380160
>>>>>>>>                 Virtual memory (bytes) snapshot=9665208745984
>>>>>>>>                 Total committed heap usage (bytes)=3636854259712
>>>>>>>>         Shuffle Errors
>>>>>>>>                 BAD_ID=0
>>>>>>>>                 CONNECTION=0
>>>>>>>>                 IO_ERROR=4
>>>>>>>>                 WRONG_LENGTH=0
>>>>>>>>                 WRONG_MAP=0
>>>>>>>>                 WRONG_REDUCE=0
>>>>>>>>         File Input Format Counters
>>>>>>>>                 Bytes Read=1000000000000
>>>>>>>>         File Output Format Counters
>>>>>>>>                 Bytes Written=1000000000000
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> John
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop 1.0.3
>>>>>>>>> and it turned out that the terasort for MR2 is 2 times slower than that in
>>>>>>>>> MR1. I cannot really believe it.
>>>>>>>>>
>>>>>>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0
>>>>>>>>> cluster configurations are as follows.
>>>>>>>>>
>>>>>>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>>>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>>>>>>         mapreduce.map.memory.mb = "768";
>>>>>>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>>>>>>
>>>>>>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>>>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>>>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>>>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>>>>>>
>>>>>>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>>>>>>         mapreduce.task.io.sort.factor = "48";
>>>>>>>>>         mapreduce.task.io.sort.mb = "200";
>>>>>>>>>         mapreduce.map.speculative = "true";
>>>>>>>>>         mapreduce.reduce.speculative = "true";
>>>>>>>>>         mapreduce.framework.name = "yarn";
>>>>>>>>>         yarn.app.mapreduce.am.job.task.listener.thread-count =
>>>>>>>>> "60";
>>>>>>>>>         mapreduce.map.cpu.vcores = "1";
>>>>>>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>>>>>>
>>>>>>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>>>>>>         mapreduce.map.output.compress = "true";
>>>>>>>>>         mapreduce.map.output.compress.codec =
>>>>>>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>>>>>>
>>>>>>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>>>>>>         yarn.resourcemanager.scheduler.client.thread-count = "64";
>>>>>>>>>         yarn.resourcemanager.resource-tracker.client.thread-count
>>>>>>>>> = "64";
>>>>>>>>>         yarn.resourcemanager.scheduler.class =
>>>>>>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>>>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>>>>>>         yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>>>>>>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>>>>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>>>>>>         yarn.nodemanager.container-executor.class =
>>>>>>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>>>>>>         yarn.nodemanager.container-manager.thread-count = "64";
>>>>>>>>>         yarn.nodemanager.localizer.client.thread-count = "20";
>>>>>>>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>>>>>>>>
>>>>>>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost the
>>>>>>>>> same other configurations. In MR1, the 1TB terasort took about 45 minutes,
>>>>>>>>> but it took around 90 minutes in MR2.
>>>>>>>>>
>>>>>>>>> Does anyone know what is wrong here? Or do I need some special
>>>>>>>>> configurations for terasort to work better in MR2?
>>>>>>>>>
>>>>>>>>> Thanks in advance,
>>>>>>>>>
>>>>>>>>> John
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>>>>>>> michael_segel@hotmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Sam,
>>>>>>>>>> I think your cluster is too small for any meaningful conclusions
>>>>>>>>>> to be made.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>>>>
>>>>>>>>>> Mike Segel
>>>>>>>>>>
>>>>>>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Harsh,
>>>>>>>>>>
>>>>>>>>>> Thanks for your detailed response! Now, the efficiency of my Yarn
>>>>>>>>>> cluster improved a lot after increasing the reducer
>>>>>>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>>>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>>>>>>
>>>>>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and reduce
>>>>>>>>>> task together, with a typical process(map > shuffle > reduce). In Yarn, as
>>>>>>>>>> I know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1
>>>>>>>>>> job has special execution process(map > shuffle > reduce) in Hadoop 1.x,
>>>>>>>>>> and how Yarn execute a MRv1 job? still include some special MR steps in
>>>>>>>>>> Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>>>>> - What's the general process for ApplicationMaster of Yarn to
>>>>>>>>>> execute a job?
>>>>>>>>>>
>>>>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>>>>> - For Yarn, the above two parameters do not work any more, as YARN
>>>>>>>>>> uses containers instead, right?
>>>>>>>>>> - For Yarn, we can set the whole physical mem for a NodeManager
>>>>>>>>>> using 'yarn.nodemanager.resource.memory-mb'. But how to set the
>>>>>>>>>> default size of physical mem of a container?
>>>>>>>>>> - How to set the maximum size of physical mem of a container? By
>>>>>>>>>> the parameter of 'mapred.child.java.opts'?
>>>>>>>>>>
>>>>>>>>>> Thanks as always!
>>>>>>>>>>
>>>>>>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>>>>>>
>>>>>>>>>>> Hi Sam,
>>>>>>>>>>>
>>>>>>>>>>> > - How to know the container number? Why you say it will be 22
>>>>>>>>>>> containers due to a 22 GB memory?
>>>>>>>>>>>
>>>>>>>>>>> MR2's default configuration requests 1 GB of resource each for Map
>>>>>>>>>>> and Reduce containers. Additionally, it requests 1.5 GB for the AM
>>>>>>>>>>> container that runs the job. This is tunable using the properties
>>>>>>>>>>> Sandy mentioned in his post.
>>>>>>>>>>>
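
Spelling out the arithmetic behind the 22-container figure asked about above,
using the yarn-site.xml values quoted further down:

  22000 MB NM memory resource / 1024 MB per task container = about 21-22
  concurrent containers per node

versus the 8 map + 4 reduce = 12 task slots configured for MR1.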
>>>>>>>>>>> > - My machine has 32 GB memory, how many memory is proper to be
>>>>>>>>>>> assigned to containers?
>>>>>>>>>>>
>>>>>>>>>>> This is a general question. You may use the same process you
>>>>>>>>>>> took to
>>>>>>>>>>> decide optimal number of slots in MR1 to decide this here. Every
>>>>>>>>>>> container is a new JVM, and you're limited by the CPUs you have
>>>>>>>>>>> there
>>>>>>>>>>> (if not the memory). Either increase memory requests from jobs,
>>>>>>>>>>> to
>>>>>>>>>>> lower # of concurrent containers at a given time (runtime
>>>>>>>>>>> change), or
>>>>>>>>>>> lower NM's published memory resources to control the same (config
>>>>>>>>>>> change).
>>>>>>>>>>>
>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to
>>>>>>>>>>> be 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>
>>>>>>>>>>> Yes, all of these properties will still work. Old properties
>>>>>>>>>>> specific
>>>>>>>>>>> to JobTracker or TaskTracker (usually found as a keyword in the
>>>>>>>>>>> config
>>>>>>>>>>> name) will not apply anymore.
>>>>>>>>>>>
>>>>>>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <sa...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>> > Hi Harsh,
>>>>>>>>>>> >
>>>>>>>>>>> > According to above suggestions, I removed the duplication of
>>>>>>>>>>> setting, and
>>>>>>>>>>> > reduce the value of 'yarn.nodemanager.resource.cpu-cores',
>>>>>>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000. And
>>>>>>>>>>> then, the
>>>>>>>>>>> > efficiency improved about 18%.  I have questions:
>>>>>>>>>>> >
>>>>>>>>>>> > - How to know the container number? Why you say it will be 22
>>>>>>>>>>> containers due
>>>>>>>>>>> > to a 22 GB memory?
>>>>>>>>>>> > - My machine has 32 GB memory, how many memory is proper to be
>>>>>>>>>>> assigned to
>>>>>>>>>>> > containers?
>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to
>>>>>>>>>>> be 'yarn', will
>>>>>>>>>>> > other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>> framework? Like
>>>>>>>>>>> > 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>> >
>>>>>>>>>>> > Thanks!
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>>>>>>> >>
>>>>>>>>>>> >> Hey Sam,
>>>>>>>>>>> >>
>>>>>>>>>>> >> Did you get a chance to retry with Sandy's suggestions? The
>>>>>>>>>>> config
>>>>>>>>>>> >> appears to be asking NMs to use roughly 22 total containers
>>>>>>>>>>> (as
>>>>>>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB memory
>>>>>>>>>>> >> resource. This could impact much, given the CPU is still the
>>>>>>>>>>> same for
>>>>>>>>>>> >> both test runs.
>>>>>>>>>>> >>
>>>>>>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>>>>>>> sandy.ryza@cloudera.com>
>>>>>>>>>>> >> wrote:
>>>>>>>>>>> >> > Hey Sam,
>>>>>>>>>>> >> >
>>>>>>>>>>> >> > Thanks for sharing your results.  I'm definitely curious
>>>>>>>>>>> about what's
>>>>>>>>>>> >> > causing the difference.
>>>>>>>>>>> >> >
>>>>>>>>>>> >> > A couple observations:
>>>>>>>>>>> >> > It looks like you've got
>>>>>>>>>>> yarn.nodemanager.resource.memory-mb in there
>>>>>>>>>>> >> > twice
>>>>>>>>>>> >> > with two different values.
>>>>>>>>>>> >> >
>>>>>>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close to
>>>>>>>>>>> the default
>>>>>>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of your
>>>>>>>>>>> tasks getting
>>>>>>>>>>> >> > killed for running over resource limits?
>>>>>>>>>>> >> >
>>>>>>>>>>> >> > -Sandy
>>>>>>>>>>> >> >
>>>>>>>>>>> >> >
>>>>>>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <
>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> The terasort execution log shows that reduce spent about
>>>>>>>>>>> 5.5 mins from
>>>>>>>>>>> >> >> 33%
>>>>>>>>>>> >> >> to 35% as below.
>>>>>>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce 31%
>>>>>>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce 32%
>>>>>>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce 33%
>>>>>>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce 35%
>>>>>>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce 40%
>>>>>>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce 43%
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> Any way, below are my configurations for your reference.
>>>>>>>>>>> Thanks!
>>>>>>>>>>> >> >> (A) core-site.xml
>>>>>>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> (B) hdfs-site.xml
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <name>dfs.replication</name>
>>>>>>>>>>> >> >>     <value>1</value>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>>>>>>> >> >>
>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>>>>>>> >> >>
>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>>>>>>> >> >>     <value>10</value>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> (C) mapred-site.xml
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>>>>>>> >> >>
>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>
>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> <property>
>>>>>>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>>>>>>> >> >> </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> <property>
>>>>>>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>>>>>>> >> >>     <value>yarn</value>
>>>>>>>>>>> >> >>    </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>> >> >>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>>>>>>> >> >>     <value>8</value>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>>>>>>> >> >>     <value>4</value>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>>>>>>> >> >>     <value>true</value>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> (D) yarn-site.xml
>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>>>>>>> >> >>     <value>node1:18025</value>
>>>>>>>>>>> >> >>     <description>host is the hostname of the resource
>>>>>>>>>>> manager and
>>>>>>>>>>> >> >>     port is the port on which the NodeManagers contact the
>>>>>>>>>>> Resource
>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <description>The address of the RM web
>>>>>>>>>>> application.</description>
>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>>>>>>> >> >>     <value>node1:18088</value>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>>>>>>>>>>> >> >>     <value>node1:18030</value>
>>>>>>>>>>> >> >>     <description>host is the hostname of the
>>>>>>>>>>> resourcemanager and port
>>>>>>>>>>> >> >> is
>>>>>>>>>>> >> >> the port
>>>>>>>>>>> >> >>     on which the Applications in the cluster talk to the
>>>>>>>>>>> Resource
>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>>>>>>> >> >>     <value>node1:18040</value>
>>>>>>>>>>> >> >>     <description>the host is the hostname of the
>>>>>>>>>>> ResourceManager and
>>>>>>>>>>> >> >> the
>>>>>>>>>>> >> >> port is the port on
>>>>>>>>>>> >> >>     which the clients can talk to the Resource Manager.
>>>>>>>>>>> </description>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>
>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>>>>>>> >> >>     <description>the local directories used by the
>>>>>>>>>>> >> >> nodemanager</description>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>>>>>>> port</description>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>> >> >>     <value>10240</value>
>>>>>>>>>>> >> >>     <description>the amount of memory on the NodeManager in
>>>>>>>>>>> >> >> GB</description>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>>>>>> >> >>
>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>>>>>> >> >>     <description>directory on hdfs where the application
>>>>>>>>>>> logs are moved
>>>>>>>>>>> >> >> to
>>>>>>>>>>> >> >> </description>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>    <property>
>>>>>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>>>>>> >> >>
>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>>>>>> >> >>     <description>the directories used by Nodemanagers as
>>>>>>>>>>> log
>>>>>>>>>>> >> >> directories</description>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>>>>>> >> >>     <description>shuffle service that needs to be set for
>>>>>>>>>>> Map Reduce to
>>>>>>>>>>> >> >> run </description>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>>>>>> >> >>     <value>24</value>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> <property>
>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>>>>>> >> >>     <value>3</value>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>> >> >>     <value>22000</value>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>>>>>> >> >>     <value>2.1</value>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>>>>>>> >> >>>
>>>>>>>>>>> >> >>> Not tuning configurations at all is wrong. YARN uses memory-resource-based
>>>>>>>>>>> >> >>> scheduling, and hence MR2 requests a 1 GB minimum by default, causing it, on
>>>>>>>>>>> >> >>> base configs, to max out at 8 total containers (due to the 8 GB NM memory
>>>>>>>>>>> >> >>> resource config). Do share your configs, as at this point none of us can tell
>>>>>>>>>>> >> >>> what they are.
>>>>>>>>>>> >> >>>
>>>>>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for users
>>>>>>>>>>> and to not
>>>>>>>>>>> >> >>> care about such things :)
>>>>>>>>>>> >> >>>
>>>>>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>>>>>> samliuhadoop@gmail.com>
>>>>>>>>>>> >> >>> wrote:
>>>>>>>>>>> >> >>> > At the beginning, I just wanted to do a fast comparison
>>>>>>>>>>> of MRv1 and
>>>>>>>>>>> >> >>> > Yarn.
>>>>>>>>>>> >> >>> > But
>>>>>>>>>>> >> >>> > they have many differences, and to be fair for
>>>>>>>>>>> comparison I did not
>>>>>>>>>>> >> >>> > tune
>>>>>>>>>>> >> >>> > their configurations at all.  So I got above test
>>>>>>>>>>> results. After
>>>>>>>>>>> >> >>> > analyzing
>>>>>>>>>>> >> >>> > the test result, no doubt, I will configure them and do
>>>>>>>>>>> comparison
>>>>>>>>>>> >> >>> > again.
>>>>>>>>>>> >> >>> >
>>>>>>>>>>> >> >>> > Do you have any idea on current test result? I think,
>>>>>>>>>>> to compare
>>>>>>>>>>> >> >>> > with
>>>>>>>>>>> >> >>> > MRv1,
>>>>>>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but worse on
>>>>>>>>>>> Reduce
>>>>>>>>>>> >> >>> > phase(terasort test).
>>>>>>>>>>> >> >>> > And any detailed suggestions/comments/materials on Yarn
>>>>>>>>>>> performance
>>>>>>>>>>> >> >>> > tuning?
>>>>>>>>>>> >> >>> >
>>>>>>>>>>> >> >>> > Thanks!
>>>>>>>>>>> >> >>> >
>>>>>>>>>>> >> >>> >
>>>>>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>>>>>> marcosluis2186@gmail.com>
>>>>>>>>>>> >> >>> >>
>>>>>>>>>>> >> >>> >> Why not to tune the configurations?
>>>>>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>>>>>>>>>> >> >>> >>
>>>>>>>>>>> >> >>> >>
>>>>>>>>>>> >> >>> >>
>>>>>>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>> >> >>> >>> Hi Experts,
>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>> >> >>> >>> We are thinking about whether to use Yarn or not in
>>>>>>>>>>> the near
>>>>>>>>>>> >> >>> >>> future,
>>>>>>>>>>> >> >>> >>> and
>>>>>>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for comparison.
>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>> >> >>> >>> My env is a three-node cluster, and each node has
>>>>>>>>>>> similar hardware:
>>>>>>>>>>> >> >>> >>> 2
>>>>>>>>>>> >> >>> >>> cpu(4 core), 32 GB mem. Both Yarn and MRv1 clusters are
>>>>>>>>>>> set on the
>>>>>>>>>>> >> >>> >>> same
>>>>>>>>>>> >> >>> >>> env. To
>>>>>>>>>>> >> >>> >>> be fair, I did not make any performance tuning on
>>>>>>>>>>> their
>>>>>>>>>>> >> >>> >>> configurations, but
>>>>>>>>>>> >> >>> >>> use the default configuration values.
>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>> >> >>> >>> Before testing, I think Yarn will be much better than
>>>>>>>>>>> MRv1, if
>>>>>>>>>>> >> >>> >>> they
>>>>>>>>>>> >> >>> >>> all
>>>>>>>>>>> >> >>> >>> use default configuration, because Yarn is a better
>>>>>>>>>>> framework than
>>>>>>>>>>> >> >>> >>> MRv1.
>>>>>>>>>>> >> >>> >>> However, the test result shows some differences:
>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>> >> >>> >>> After a fast analysis, I think the direct cause might
>>>>>>>>>>> be that Yarn
>>>>>>>>>>> >> >>> >>> is
>>>>>>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much worse on
>>>>>>>>>>> Reduce
>>>>>>>>>>> >> >>> >>> phase.
>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1 for
>>>>>>>>>>> terasort?
>>>>>>>>>>> >> >>> >>> - What's the strategy for tuning Yarn performance? Are
>>>>>>>>>>> there
>>>>>>>>>>> >> >>> >>> any materials?
>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>> >> >>> >>> Thanks!
>>>>>>>>>>> >> >>> >>
>>>>>>>>>>> >> >>> >>
>>>>>>>>>>> >> >>> >>
>>>>>>>>>>> >> >>> >>
>>>>>>>>>>> >> >>> >> --
>>>>>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>>>>>> >> >>> >>
>>>>>>>>>>> >> >>> >
>>>>>>>>>>> >> >>>
>>>>>>>>>>> >> >>>
>>>>>>>>>>> >> >>>
>>>>>>>>>>> >> >>> --
>>>>>>>>>>> >> >>> Harsh J
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >
>>>>>>>>>>> >>
>>>>>>>>>>> >>
>>>>>>>>>>> >>
>>>>>>>>>>> >> --
>>>>>>>>>>> >> Harsh J
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Harsh J
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Jian Fang <ji...@gmail.com>.
Changing mapreduce.job.reduce.slowstart.completedmaps to 0.99 does not look
good. The map phase alone took 48 minutes and the total time seems to be even
longer. Is there any way to make the map phase run faster?

Thanks.
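
For reference, mapreduce.job.reduce.slowstart.completedmaps is a per-job setting,
so it can also be tried at less extreme values. A minimal mapred-site.xml sketch;
the 0.80 shown is only an illustrative middle ground between the 0.05 default and
the 0.99 tried above, not a tested figure:

  <property>
    <name>mapreduce.job.reduce.slowstart.completedmaps</name>
    <value>0.80</value>
  </property>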


On Wed, Oct 23, 2013 at 10:05 AM, Jian Fang
<ji...@gmail.com>wrote:

> Thanks Sandy.
>
> io.sort.record.percent is the default value 0.05 for both MR1 and MR2.
> mapreduce.job.reduce.slowstart.completedmaps in MR2 and mapred.reduce.slowstart.completed.maps
> in MR1 both use the default value 0.05.
>
> I tried to allocate 1536MB and 1024MB to map container some time ago, but
> the changes did not give me a better result, thus, I changed it back to
> 768MB.
>
> Will try mapred.reduce.slowstart.completed.maps=.99 to see what happens.
> BTW, I should use
> mapreduce.job.reduce.slowstart.completedmaps in MR2, right?
>
> Also, in MR1 I can specify tasktracker.http.threads, but I could not find
> the counterpart for MR2. Which one should I tune for the http threads?
>
> Thanks again.
>
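(For reference: in MR2 the shuffle is served by the ShuffleHandler auxiliary
service rather than by the TaskTracker, so the closest counterpart to
tasktracker.http.threads is assumed to be mapreduce.shuffle.max.threads. A
minimal mapred-site.xml sketch; the value 0 keeps the default of twice the
number of available processors:)

  <property>
    <name>mapreduce.shuffle.max.threads</name>
    <value>0</value>
  </property>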
>
> On Wed, Oct 23, 2013 at 9:40 AM, Sandy Ryza <sa...@cloudera.com>wrote:
>
>> Based on SLOTS_MILLIS_MAPS, it looks like your map tasks are taking about
>> three times as long in MR2 as they are in MR1.  This is probably because
>> you allow twice as many map tasks to run at a time in MR2 (12288/768 =
>> 16).  Being able to use all the containers isn't necessarily a good thing
>> if you are oversubscribing your node's resources.  Because of the different
>> way that MR1 and MR2 view resources, I think it's better to test with
>> mapred.reduce.slowstart.completed.maps=.99 so that the map and reduce
>> phases will run separately.
>>
>> On the other side, it looks like your MR1 has more spilled records than
>> MR2.  For a fairer comparison, you should set io.sort.record.percent in MR1
>> to .13, which should improve MR1 performance, but will provide a fairer
>> comparison (MR2 automatically does this tuning for you).
>>
>> -Sandy
>>
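(To put numbers on that: the MR1 cluster runs at most 8 maps plus 3 reduces per
node, while the MR2 settings above allow 12288 / 768 = 16 map containers, or
12288 / 2048 = 6 reduce containers, per node before counting the AM, i.e. roughly
twice as many concurrent map tasks on the same hardware.)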
>>
>> On Wed, Oct 23, 2013 at 9:22 AM, Jian Fang <jian.fang.subscribe@gmail.com
>> > wrote:
>>
>>> The number of map slots and reduce slots on each data node for MR1 are 8
>>> and 3, respectively. Since MR2 could use all containers for either map or
>>> reduce, I would expect that MR2 is faster.
>>>
>>>
>>> On Wed, Oct 23, 2013 at 8:17 AM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>
>>>> How many map and reduce slots are you using per tasktracker in MR1?
>>>>  How do the average map times compare? (MR2 reports this directly on the
>>>> web UI, but you can also get a sense in MR1 by scrolling through the map
>>>> tasks page).  Can you share the counters for MR1?
>>>>
>>>> -Sandy
>>>>
>>>>
>>>> On Wed, Oct 23, 2013 at 12:23 AM, Jian Fang <
>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>
>>>>> Unfortunately, turning off JVM reuse still got the same result, i.e.,
>>>>> about 90 minutes for MR2. I don't think the killed reduces could contribute
>>>>> to 2 times slowness. There should be something very wrong either in
>>>>> configuration or code. Any hints?
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <
>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>
>>>>>> Thanks Sandy. I will try to turn JVM reuse off and see what happens.
>>>>>>
>>>>>> Yes, I saw quite some exceptions in the task attempts. For instance.
>>>>>>
>>>>>>
>>>>>> 2013-10-20 03:13:58,751 ERROR [main]
>>>>>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>>>>>> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
>>>>>> 2013-10-20 03:13:58,752 ERROR [Thread-6]
>>>>>> org.apache.hadoop.hdfs.DFSClient: Failed to close file
>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
>>>>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>>>>>> No lease on
>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
>>>>>> File does not exist. Holder
>>>>>> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
>>>>>> any open files.
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>>>>>>         at
>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>>>>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>>         at
>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>>>         at
>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>>>> --
>>>>>>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>>>>>>         at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>>>>>>         at
>>>>>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>>>>>>         at
>>>>>> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>>>>>>         at
>>>>>> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>>>>>> 2013-10-20 03:13:58,753 WARN [main]
>>>>>> org.apache.hadoop.mapred.YarnChild: Exception running child :
>>>>>> java.nio.channels.ClosedChannelException
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>>>>>>         at
>>>>>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>>>>>>         at
>>>>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>>>>>>         at java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>>>>         at
>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>>>>>>         at
>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>>>>>>         at
>>>>>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>>>>>>         at
>>>>>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>>>>>         at
>>>>>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>>>>>         at
>>>>>> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>>>>
>>>>>>> It looks like many of your reduce tasks were killed.  Do you know
>>>>>>> why?  Also, MR2 doesn't have JVM reuse, so it might make sense to compare
>>>>>>> it to MR1 with JVM reuse turned off.
>>>>>>>
>>>>>>> -Sandy
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <
>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>
>>>>>>>> The Terasort output for MR2 is as follows.
>>>>>>>>
>>>>>>>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job
>>>>>>>> (main): Counters: 46
>>>>>>>>         File System Counters
>>>>>>>>                 FILE: Number of bytes read=456102049355
>>>>>>>>                 FILE: Number of bytes written=897246250517
>>>>>>>>                 FILE: Number of read operations=0
>>>>>>>>                 FILE: Number of large read operations=0
>>>>>>>>                 FILE: Number of write operations=0
>>>>>>>>                 HDFS: Number of bytes read=1000000851200
>>>>>>>>                 HDFS: Number of bytes written=1000000000000
>>>>>>>>                 HDFS: Number of read operations=32131
>>>>>>>>                 HDFS: Number of large read operations=0
>>>>>>>>                 HDFS: Number of write operations=224
>>>>>>>>         Job Counters
>>>>>>>>                 Killed map tasks=1
>>>>>>>>                 Killed reduce tasks=20
>>>>>>>>                 Launched map tasks=7601
>>>>>>>>                 Launched reduce tasks=132
>>>>>>>>                 Data-local map tasks=7591
>>>>>>>>                 Rack-local map tasks=10
>>>>>>>>                 Total time spent by all maps in occupied slots
>>>>>>>> (ms)=1696141311
>>>>>>>>                 Total time spent by all reduces in occupied slots
>>>>>>>> (ms)=2664045096
>>>>>>>>         Map-Reduce Framework
>>>>>>>>                 Map input records=10000000000
>>>>>>>>                 Map output records=10000000000
>>>>>>>>                 Map output bytes=1020000000000
>>>>>>>>                 Map output materialized bytes=440486356802
>>>>>>>>                 Input split bytes=851200
>>>>>>>>                 Combine input records=0
>>>>>>>>                 Combine output records=0
>>>>>>>>                 Reduce input groups=10000000000
>>>>>>>>                 Reduce shuffle bytes=440486356802
>>>>>>>>                 Reduce input records=10000000000
>>>>>>>>                 Reduce output records=10000000000
>>>>>>>>                 Spilled Records=20000000000
>>>>>>>>                 Shuffled Maps =851200
>>>>>>>>                 Failed Shuffles=61
>>>>>>>>                 Merged Map outputs=851200
>>>>>>>>                 GC time elapsed (ms)=4215666
>>>>>>>>                 CPU time spent (ms)=192433000
>>>>>>>>                 Physical memory (bytes) snapshot=3349356380160
>>>>>>>>                 Virtual memory (bytes) snapshot=9665208745984
>>>>>>>>                 Total committed heap usage (bytes)=3636854259712
>>>>>>>>         Shuffle Errors
>>>>>>>>                 BAD_ID=0
>>>>>>>>                 CONNECTION=0
>>>>>>>>                 IO_ERROR=4
>>>>>>>>                 WRONG_LENGTH=0
>>>>>>>>                 WRONG_MAP=0
>>>>>>>>                 WRONG_REDUCE=0
>>>>>>>>         File Input Format Counters
>>>>>>>>                 Bytes Read=1000000000000
>>>>>>>>         File Output Format Counters
>>>>>>>>                 Bytes Written=1000000000000
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> John
>>>>>>>>
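(One thing the counters above do show: Map output bytes is about 1.02 TB while
Map output materialized bytes and Reduce shuffle bytes are about 440 GB, so the
Snappy map-output compression listed in the configuration below is cutting
shuffle traffic by more than half.)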
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop 1.0.3
>>>>>>>>> and it turned out that the terasort for MR2 is 2 times slower than that in
>>>>>>>>> MR1. I cannot really believe it.
>>>>>>>>>
>>>>>>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0
>>>>>>>>> cluster configurations are as follows.
>>>>>>>>>
>>>>>>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>>>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>>>>>>         mapreduce.map.memory.mb = "768";
>>>>>>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>>>>>>
>>>>>>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>>>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>>>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>>>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>>>>>>
>>>>>>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>>>>>>         mapreduce.task.io.sort.factor = "48";
>>>>>>>>>         mapreduce.task.io.sort.mb = "200";
>>>>>>>>>         mapreduce.map.speculative = "true";
>>>>>>>>>         mapreduce.reduce.speculative = "true";
>>>>>>>>>         mapreduce.framework.name = "yarn";
>>>>>>>>>         yarn.app.mapreduce.am.job.task.listener.thread-count =
>>>>>>>>> "60";
>>>>>>>>>         mapreduce.map.cpu.vcores = "1";
>>>>>>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>>>>>>
>>>>>>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>>>>>>         mapreduce.map.output.compress = "true";
>>>>>>>>>         mapreduce.map.output.compress.codec =
>>>>>>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>>>>>>
>>>>>>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>>>>>>         yarn.resourcemanager.scheduler.client.thread-count = "64";
>>>>>>>>>         yarn.resourcemanager.resource-tracker.client.thread-count
>>>>>>>>> = "64";
>>>>>>>>>         yarn.resourcemanager.scheduler.class =
>>>>>>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>>>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>>>>>>         yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>>>>>>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>>>>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>>>>>>         yarn.nodemanager.container-executor.class =
>>>>>>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>>>>>>         yarn.nodemanager.container-manager.thread-count = "64";
>>>>>>>>>         yarn.nodemanager.localizer.client.thread-count = "20";
>>>>>>>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>>>>>>>>
>>>>>>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost the
>>>>>>>>> same other configurations. In MR1, the 1TB terasort took about 45 minutes,
>>>>>>>>> but it took around 90 minutes in MR2.
>>>>>>>>>
>>>>>>>>> Does anyone know what is wrong here? Or do I need some special
>>>>>>>>> configurations for terasort to work better in MR2?
>>>>>>>>>
>>>>>>>>> Thanks in advance,
>>>>>>>>>
>>>>>>>>> John
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>>>>>>> michael_segel@hotmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Sam,
>>>>>>>>>> I think your cluster is too small for any meaningful conclusions
>>>>>>>>>> to be made.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>>>>
>>>>>>>>>> Mike Segel
>>>>>>>>>>
>>>>>>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Harsh,
>>>>>>>>>>
>>>>>>>>>> Thanks for your detailed response! Now, the efficiency of my Yarn
>>>>>>>>>> cluster improved a lot after increasing the reducer
>>>>>>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>>>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>>>>>>
>>>>>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and reduce
>>>>>>>>>> task together, with a typical process(map > shuffle > reduce). In Yarn, as
>>>>>>>>>> I know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1
>>>>>>>>>> job has special execution process(map > shuffle > reduce) in Hadoop 1.x,
>>>>>>>>>> and how Yarn execute a MRv1 job? still include some special MR steps in
>>>>>>>>>> Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>>>>> - What's the general process for ApplicationMaster of Yarn to
>>>>>>>>>> execute a job?
>>>>>>>>>>
>>>>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>>>>> - For Yarn, the above two parameters do not work any more, as yarn
>>>>>>>>>> uses containers instead, right?
>>>>>>>>>> - For Yarn, we can set the whole physical mem for a NodeManager
>>>>>>>>>> using 'yarn.nodemanager.resource.memory-mb'. But how to set the
>>>>>>>>>> default size of physical mem of a container?
>>>>>>>>>> - How to set the maximum size of physical mem of a container? By
>>>>>>>>>> the parameter of 'mapred.child.java.opts'?
>>>>>>>>>>
>>>>>>>>>> Thanks as always!
>>>>>>>>>>
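(On the last question: the container size and the JVM heap inside it are separate
settings. The Oct 22 configuration elsewhere in this thread shows the usual
pattern, with the heap kept comfortably below the container request:)

        mapreduce.map.memory.mb = "768";        (container size requested from YARN)
        mapreduce.map.java.opts = "-Xmx512m";   (JVM heap inside that container)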
>>>>>>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>>>>>>
>>>>>>>>>>> Hi Sam,
>>>>>>>>>>>
>>>>>>>>>>> > - How do I know the container number? Why do you say it will be 22
>>>>>>>>>>> containers due to a 22 GB memory?
>>>>>>>>>>>
>>>>>>>>>>> The MR2's default configuration requests 1 GB resource each for
>>>>>>>>>>> Map
>>>>>>>>>>> and Reduce containers. It requests 1.5 GB for the AM container
>>>>>>>>>>> that
>>>>>>>>>>> runs the job, additionally. This is tunable using the properties
>>>>>>>>>>> Sandy's mentioned in his post.
>>>>>>>>>>>
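(The AM request mentioned here is assumed to correspond to
yarn.app.mapreduce.am.resource.mb; a minimal mapred-site.xml sketch, with the
value shown being the default:)

  <property>
    <name>yarn.app.mapreduce.am.resource.mb</name>
    <value>1536</value>
  </property>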
>>>>>>>>>>> > - My machine has 32 GB memory, how much memory is proper to be
>>>>>>>>>>> assigned to containers?
>>>>>>>>>>>
>>>>>>>>>>> This is a general question. You may use the same process you
>>>>>>>>>>> took to
>>>>>>>>>>> decide optimal number of slots in MR1 to decide this here. Every
>>>>>>>>>>> container is a new JVM, and you're limited by the CPUs you have
>>>>>>>>>>> there
>>>>>>>>>>> (if not the memory). Either increase memory requests from jobs,
>>>>>>>>>>> to
>>>>>>>>>>> lower # of concurrent containers at a given time (runtime
>>>>>>>>>>> change), or
>>>>>>>>>>> lower NM's published memory resources to control the same (config
>>>>>>>>>>> change).
>>>>>>>>>>>
>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to
>>>>>>>>>>> be 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>
>>>>>>>>>>> Yes, all of these properties will still work. Old properties
>>>>>>>>>>> specific
>>>>>>>>>>> to JobTracker or TaskTracker (usually found as a keyword in the
>>>>>>>>>>> config
>>>>>>>>>>> name) will not apply anymore.
>>>>>>>>>>>
>>>>>>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <sa...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>> > Hi Harsh,
>>>>>>>>>>> >
>>>>>>>>>>> > According to the above suggestions, I removed the duplicated
>>>>>>>>>>> setting, and
>>>>>>>>>>> > reduced the values of 'yarn.nodemanager.resource.cpu-cores',
>>>>>>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000. And
>>>>>>>>>>> then, the
>>>>>>>>>>> > efficiency improved about 18%.  I have questions:
>>>>>>>>>>> >
>>>>>>>>>>> > - How do I know the container number? Why do you say it will be 22
>>>>>>>>>>> containers due
>>>>>>>>>>> > to a 22 GB memory?
>>>>>>>>>>> > - My machine has 32 GB memory, how much memory is proper to be
>>>>>>>>>>> assigned to
>>>>>>>>>>> > containers?
>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to
>>>>>>>>>>> be 'yarn', will
>>>>>>>>>>> > other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>> framework? Like
>>>>>>>>>>> > 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>> >
>>>>>>>>>>> > Thanks!
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>>>>>>> >>
>>>>>>>>>>> >> Hey Sam,
>>>>>>>>>>> >>
>>>>>>>>>>> >> Did you get a chance to retry with Sandy's suggestions? The
>>>>>>>>>>> config
>>>>>>>>>>> >> appears to be asking NMs to use roughly 22 total containers
>>>>>>>>>>> (as
>>>>>>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB memory
>>>>>>>>>>> >> resource. This could impact much, given the CPU is still the
>>>>>>>>>>> same for
>>>>>>>>>>> >> both test runs.
>>>>>>>>>>> >>
>>>>>>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>>>>>>> sandy.ryza@cloudera.com>
>>>>>>>>>>> >> wrote:
>>>>>>>>>>> >> > Hey Sam,
>>>>>>>>>>> >> >
>>>>>>>>>>> >> > Thanks for sharing your results.  I'm definitely curious
>>>>>>>>>>> about what's
>>>>>>>>>>> >> > causing the difference.
>>>>>>>>>>> >> >
>>>>>>>>>>> >> > A couple observations:
>>>>>>>>>>> >> > It looks like you've got
>>>>>>>>>>> yarn.nodemanager.resource.memory-mb in there
>>>>>>>>>>> >> > twice
>>>>>>>>>>> >> > with two different values.
>>>>>>>>>>> >> >
>>>>>>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close to
>>>>>>>>>>> the default
>>>>>>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of your
>>>>>>>>>>> tasks getting
>>>>>>>>>>> >> > killed for running over resource limits?
>>>>>>>>>>> >> >
>>>>>>>>>>> >> > -Sandy
>>>>>>>>>>> >> >
>>>>>>>>>>> >> >
>>>>>>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <
>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> The terasort execution log shows that reduce spent about
>>>>>>>>>>> 5.5 mins from
>>>>>>>>>>> >> >> 33%
>>>>>>>>>>> >> >> to 35% as below.
>>>>>>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce 31%
>>>>>>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce 32%
>>>>>>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce 33%
>>>>>>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce 35%
>>>>>>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce 40%
>>>>>>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce 43%
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> Any way, below are my configurations for your reference.
>>>>>>>>>>> Thanks!
>>>>>>>>>>> >> >> (A) core-site.xml
>>>>>>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> (B) hdfs-site.xml
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <name>dfs.replication</name>
>>>>>>>>>>> >> >>     <value>1</value>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>>>>>>> >> >>
>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>>>>>>> >> >>
>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>>>>>>> >> >>     <value>10</value>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> (C) mapred-site.xml
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>>>>>>> >> >>
>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>
>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> <property>
>>>>>>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>>>>>>> >> >> </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> <property>
>>>>>>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>>>>>>> >> >>     <value>yarn</value>
>>>>>>>>>>> >> >>    </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>> >> >>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>>>>>>> >> >>     <value>8</value>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>>>>>>> >> >>     <value>4</value>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>>>>>>> >> >>     <value>true</value>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> (D) yarn-site.xml
>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>>>>>>> >> >>     <value>node1:18025</value>
>>>>>>>>>>> >> >>     <description>host is the hostname of the resource
>>>>>>>>>>> manager and
>>>>>>>>>>> >> >>     port is the port on which the NodeManagers contact the
>>>>>>>>>>> Resource
>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <description>The address of the RM web
>>>>>>>>>>> application.</description>
>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>>>>>>> >> >>     <value>node1:18088</value>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>>>>>>>>>>> >> >>     <value>node1:18030</value>
>>>>>>>>>>> >> >>     <description>host is the hostname of the
>>>>>>>>>>> resourcemanager and port
>>>>>>>>>>> >> >> is
>>>>>>>>>>> >> >> the port
>>>>>>>>>>> >> >>     on which the Applications in the cluster talk to the
>>>>>>>>>>> Resource
>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>>>>>>> >> >>     <value>node1:18040</value>
>>>>>>>>>>> >> >>     <description>the host is the hostname of the
>>>>>>>>>>> ResourceManager and
>>>>>>>>>>> >> >> the
>>>>>>>>>>> >> >> port is the port on
>>>>>>>>>>> >> >>     which the clients can talk to the Resource Manager.
>>>>>>>>>>> </description>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>
>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>>>>>>> >> >>     <description>the local directories used by the
>>>>>>>>>>> >> >> nodemanager</description>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>>>>>>> port</description>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>> >> >>     <value>10240</value>
>>>>>>>>>>> >> >>     <description>the amount of memory on the NodeManager in
>>>>>>>>>>> >> >> MB</description>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>>>>>> >> >>
>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>>>>>> >> >>     <description>directory on hdfs where the application
>>>>>>>>>>> logs are moved
>>>>>>>>>>> >> >> to
>>>>>>>>>>> >> >> </description>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>    <property>
>>>>>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>>>>>> >> >>
>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>>>>>> >> >>     <description>the directories used by Nodemanagers as
>>>>>>>>>>> log
>>>>>>>>>>> >> >> directories</description>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>>>>>> >> >>     <description>shuffle service that needs to be set for
>>>>>>>>>>> Map Reduce to
>>>>>>>>>>> >> >> run </description>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>>>>>> >> >>     <value>24</value>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> <property>
>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>>>>>> >> >>     <value>3</value>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>> >> >>     <value>22000</value>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>>>>>> >> >>     <value>2.1</value>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>>>>>>> >> >>>
>>>>>>>>>>> >> >>> Not tuning configurations at all is wrong. YARN uses
>>>>>>>>>>> memory resource
>>>>>>>>>>> >> >>> based scheduling and hence MR2 would be requesting 1 GB
>>>>>>>>>>> minimum by
>>>>>>>>>>> >> >>> default, causing, on base configs, to max out at 8 (due
>>>>>>>>>>> to 8 GB NM
>>>>>>>>>>> >> >>> memory resource config) total containers. Do share your
>>>>>>>>>>> configs as at
>>>>>>>>>>> >> >>> this point none of us can tell what it is.
>>>>>>>>>>> >> >>>
>>>>>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for users
>>>>>>>>>>> and to not
>>>>>>>>>>> >> >>> care about such things :)
>>>>>>>>>>> >> >>>
>>>>>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>>>>>> samliuhadoop@gmail.com>
>>>>>>>>>>> >> >>> wrote:
>>>>>>>>>>> >> >>> > At the beginning, I just want to do a fast comparison
>>>>>>>>>>> of MRv1 and
>>>>>>>>>>> >> >>> > Yarn.
>>>>>>>>>>> >> >>> > But
>>>>>>>>>>> >> >>> > they have many differences, and to be fair for
>>>>>>>>>>> comparison I did not
>>>>>>>>>>> >> >>> > tune
>>>>>>>>>>> >> >>> > their configurations at all.  So I got above test
>>>>>>>>>>> results. After
>>>>>>>>>>> >> >>> > analyzing
>>>>>>>>>>> >> >>> > the test result, no doubt, I will configure them and do
>>>>>>>>>>> comparison
>>>>>>>>>>> >> >>> > again.
>>>>>>>>>>> >> >>> >
>>>>>>>>>>> >> >>> > Do you have any idea on current test result? I think,
>>>>>>>>>>> to compare
>>>>>>>>>>> >> >>> > with
>>>>>>>>>>> >> >>> > MRv1,
>>>>>>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but worse on
>>>>>>>>>>> Reduce
>>>>>>>>>>> >> >>> > phase(terasort test).
>>>>>>>>>>> >> >>> > And any detailed suggestions/comments/materials on Yarn
>>>>>>>>>>> performance
>>>>>>>>>>> >> >>> > tuning?
>>>>>>>>>>> >> >>> >
>>>>>>>>>>> >> >>> > Thanks!
>>>>>>>>>>> >> >>> >
>>>>>>>>>>> >> >>> >
>>>>>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>>>>>> marcosluis2186@gmail.com>
>>>>>>>>>>> >> >>> >>
>>>>>>>>>>> >> >>> >> Why not to tune the configurations?
>>>>>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>>>>>>>>>> >> >>> >>
>>>>>>>>>>> >> >>> >>
>>>>>>>>>>> >> >>> >>
>>>>>>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>> >> >>> >>> Hi Experts,
>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>> >> >>> >>> We are thinking about whether to use Yarn or not in
>>>>>>>>>>> the near
>>>>>>>>>>> >> >>> >>> future,
>>>>>>>>>>> >> >>> >>> and
>>>>>>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for comparison.
>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>> >> >>> >>> My env is a three-node cluster, and each node has
>>>>>>>>>>> similar hardware:
>>>>>>>>>>> >> >>> >>> 2
>>>>>>>>>>> >> >>> >>> cpu(4 core), 32 GB mem. Both Yarn and MRv1 clusters are
>>>>>>>>>>> set on the
>>>>>>>>>>> >> >>> >>> same
>>>>>>>>>>> >> >>> >>> env. To
>>>>>>>>>>> >> >>> >>> be fair, I did not make any performance tuning on
>>>>>>>>>>> their
>>>>>>>>>>> >> >>> >>> configurations, but
>>>>>>>>>>> >> >>> >>> use the default configuration values.
>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>> >> >>> >>> Before testing, I think Yarn will be much better than
>>>>>>>>>>> MRv1, if
>>>>>>>>>>> >> >>> >>> they
>>>>>>>>>>> >> >>> >>> all
>>>>>>>>>>> >> >>> >>> use default configuration, because Yarn is a better
>>>>>>>>>>> framework than
>>>>>>>>>>> >> >>> >>> MRv1.
>>>>>>>>>>> >> >>> >>> However, the test result shows some differences:
>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>> >> >>> >>> After a fast analysis, I think the direct cause might
>>>>>>>>>>> be that Yarn
>>>>>>>>>>> >> >>> >>> is
>>>>>>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much worse on
>>>>>>>>>>> Reduce
>>>>>>>>>>> >> >>> >>> phase.
>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1 for
>>>>>>>>>>> terasort?
>>>>>>>>>>> >> >>> >>> - What's the strategy for tuning Yarn performance? Are
>>>>>>>>>>> there
>>>>>>>>>>> >> >>> >>> any materials?
>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>> >> >>> >>> Thanks!
>>>>>>>>>>> >> >>> >>
>>>>>>>>>>> >> >>> >>
>>>>>>>>>>> >> >>> >>
>>>>>>>>>>> >> >>> >>
>>>>>>>>>>> >> >>> >> --
>>>>>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>>>>>> >> >>> >>
>>>>>>>>>>> >> >>> >
>>>>>>>>>>> >> >>>
>>>>>>>>>>> >> >>>
>>>>>>>>>>> >> >>>
>>>>>>>>>>> >> >>> --
>>>>>>>>>>> >> >>> Harsh J
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >
>>>>>>>>>>> >>
>>>>>>>>>>> >>
>>>>>>>>>>> >>
>>>>>>>>>>> >> --
>>>>>>>>>>> >> Harsh J
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Harsh J
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Jian Fang <ji...@gmail.com>.
Changing mapreduce.job.reduce.
slowstart.completedmaps to 0.99 does not look good. The map phase alone
took 48 minutes and total time seems to be even longer. Any way to let map
phase run faster?

Thanks.


On Wed, Oct 23, 2013 at 10:05 AM, Jian Fang
<ji...@gmail.com>wrote:

> Thanks Sandy.
>
> io.sort.record.percent is the default value 0.05 for both MR1 and MR2.
> mapreduce.job.reduce.slowstart.completedmaps in MR2 and mapred.reduce.slowstart.completed.maps
> in MR1 both use the default value 0.05.
>
> I tried to allocate 1536MB and 1024MB to map container some time ago, but
> the changes did not give me a better result, thus, I changed it back to
> 768MB.
>
> Will try mapred.reduce.slowstart.completed.maps=.99 to see what happens.
> BTW, I should use
> mapreduce.job.reduce.slowstart.completedmaps in MR2, right?
>
> Also, in MR1 I can specify tasktracker.http.threads, but I could not find
> the counterpart for MR2. Which one I should tune for the http thread?
>
> Thanks again.
>
>
> On Wed, Oct 23, 2013 at 9:40 AM, Sandy Ryza <sa...@cloudera.com>wrote:
>
>> Based on SLOTS_MILLIS_MAPS, it looks like your map tasks are taking about
>> three times as long in MR2 as they are in MR1.  This is probably because
>> you allow twice as many map tasks to run at a time in MR2 (12288/768 =
>> 16).  Being able to use all the containers isn't necessarily a good thing
>> if you are oversubscribing your node's resources.  Because of the different
>> way that MR1 and MR2 view resources, I think it's better to test with
>> mapred.reduce.slowstart.completed.maps=.99 so that the map and reduce
>> phases will run separately.
>>
>> On the other side, it looks like your MR1 has more spilled records than
>> MR2.  For a fairer comparison, you should set io.sort.record.percent in MR1
>> to .13, which should improve MR1 performance, but will provide a fairer
>> comparison (MR2 automatically does this tuning for you).
>>
>> -Sandy
>>
>>
>> On Wed, Oct 23, 2013 at 9:22 AM, Jian Fang <jian.fang.subscribe@gmail.com
>> > wrote:
>>
>>> The number of map slots and reduce slots on each data node for MR1 are 8
>>> and 3, respectively. Since MR2 could use all containers for either map or
>>> reduce, I would expect that MR2 is faster.
>>>
>>>
>>> On Wed, Oct 23, 2013 at 8:17 AM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>
>>>> How many map and reduce slots are you using per tasktracker in MR1?
>>>>  How do the average map times compare? (MR2 reports this directly on the
>>>> web UI, but you can also get a sense in MR1 by scrolling through the map
>>>> tasks page).  Can you share the counters for MR1?
>>>>
>>>> -Sandy
>>>>
>>>>
>>>> On Wed, Oct 23, 2013 at 12:23 AM, Jian Fang <
>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>
>>>>> Unfortunately, turning off JVM reuse still got the same result, i.e.,
>>>>> about 90 minutes for MR2. I don't think the killed reduces could contribute
>>>>> to 2 times slowness. There should be something very wrong either in
>>>>> configuration or code. Any hints?
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <
>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>
>>>>>> Thanks Sandy. I will try to turn JVM resue off and see what happens.
>>>>>>
>>>>>> Yes, I saw quite some exceptions in the task attempts. For instance.
>>>>>>
>>>>>>
>>>>>> 2013-10-20 03:13:58,751 ERROR [main]
>>>>>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>>>>>> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
>>>>>> 2013-10-20 03:13:58,752 ERROR [Thread-6]
>>>>>> org.apache.hadoop.hdfs.DFSClient: Failed to close file
>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
>>>>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>>>>>> No lease on
>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
>>>>>> File does not exist. Holder
>>>>>> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
>>>>>> any open files.
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>>>>>>         at
>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>>>>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>>         at
>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>>>         at
>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>>>> --
>>>>>>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>>>>>>         at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>>>>>>         at
>>>>>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>>>>>>         at
>>>>>> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>>>>>>         at
>>>>>> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>>>>>> 2013-10-20 03:13:58,753 WARN [main]
>>>>>> org.apache.hadoop.mapred.YarnChild: Exception running child :
>>>>>> java.nio.channels.ClosedChannelException
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>>>>>>         at
>>>>>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>>>>>>         at
>>>>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>>>>>>         at java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>>>>         at
>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>>>>>>         at
>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>>>>>>         at
>>>>>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>>>>>>         at
>>>>>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>>>>>         at
>>>>>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>>>>>         at
>>>>>> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>>>>
>>>>>>> It looks like many of your reduce tasks were killed.  Do you know
>>>>>>> why?  Also, MR2 doesn't have JVM reuse, so it might make sense to compare
>>>>>>> it to MR1 with JVM reuse turned off.
>>>>>>>
>>>>>>> -Sandy
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <
>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>
>>>>>>>> The Terasort output for MR2 is as follows.
>>>>>>>>
>>>>>>>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job
>>>>>>>> (main): Counters: 46
>>>>>>>>         File System Counters
>>>>>>>>                 FILE: Number of bytes read=456102049355
>>>>>>>>                 FILE: Number of bytes written=897246250517
>>>>>>>>                 FILE: Number of read operations=0
>>>>>>>>                 FILE: Number of large read operations=0
>>>>>>>>                 FILE: Number of write operations=0
>>>>>>>>                 HDFS: Number of bytes read=1000000851200
>>>>>>>>                 HDFS: Number of bytes written=1000000000000
>>>>>>>>                 HDFS: Number of read operations=32131
>>>>>>>>                 HDFS: Number of large read operations=0
>>>>>>>>                 HDFS: Number of write operations=224
>>>>>>>>         Job Counters
>>>>>>>>                 Killed map tasks=1
>>>>>>>>                 Killed reduce tasks=20
>>>>>>>>                 Launched map tasks=7601
>>>>>>>>                 Launched reduce tasks=132
>>>>>>>>                 Data-local map tasks=7591
>>>>>>>>                 Rack-local map tasks=10
>>>>>>>>                 Total time spent by all maps in occupied slots
>>>>>>>> (ms)=1696141311
>>>>>>>>                 Total time spent by all reduces in occupied slots
>>>>>>>> (ms)=2664045096
>>>>>>>>         Map-Reduce Framework
>>>>>>>>                 Map input records=10000000000
>>>>>>>>                 Map output records=10000000000
>>>>>>>>                 Map output bytes=1020000000000
>>>>>>>>                 Map output materialized bytes=440486356802
>>>>>>>>                 Input split bytes=851200
>>>>>>>>                 Combine input records=0
>>>>>>>>                 Combine output records=0
>>>>>>>>                 Reduce input groups=10000000000
>>>>>>>>                 Reduce shuffle bytes=440486356802
>>>>>>>>                 Reduce input records=10000000000
>>>>>>>>                 Reduce output records=10000000000
>>>>>>>>                 Spilled Records=20000000000
>>>>>>>>                 Shuffled Maps =851200
>>>>>>>>                 Failed Shuffles=61
>>>>>>>>                 Merged Map outputs=851200
>>>>>>>>                 GC time elapsed (ms)=4215666
>>>>>>>>                 CPU time spent (ms)=192433000
>>>>>>>>                 Physical memory (bytes) snapshot=3349356380160
>>>>>>>>                 Virtual memory (bytes) snapshot=9665208745984
>>>>>>>>                 Total committed heap usage (bytes)=3636854259712
>>>>>>>>         Shuffle Errors
>>>>>>>>                 BAD_ID=0
>>>>>>>>                 CONNECTION=0
>>>>>>>>                 IO_ERROR=4
>>>>>>>>                 WRONG_LENGTH=0
>>>>>>>>                 WRONG_MAP=0
>>>>>>>>                 WRONG_REDUCE=0
>>>>>>>>         File Input Format Counters
>>>>>>>>                 Bytes Read=1000000000000
>>>>>>>>         File Output Format Counters
>>>>>>>>                 Bytes Written=1000000000000
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> John
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop 1.0.3
>>>>>>>>> and it turned out that the terasort for MR2 is 2 times slower than that in
>>>>>>>>> MR1. I cannot really believe it.
>>>>>>>>>
>>>>>>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0
>>>>>>>>> cluster configurations are as follows.
>>>>>>>>>
>>>>>>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>>>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>>>>>>         mapreduce.map.memory.mb = "768";
>>>>>>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>>>>>>
>>>>>>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>>>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>>>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>>>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>>>>>>
>>>>>>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>>>>>>         mapreduce.task.io.sort.factor = "48";
>>>>>>>>>         mapreduce.task.io.sort.mb = "200";
>>>>>>>>>         mapreduce.map.speculative = "true";
>>>>>>>>>         mapreduce.reduce.speculative = "true";
>>>>>>>>>         mapreduce.framework.name = "yarn";
>>>>>>>>>         yarn.app.mapreduce.am.job.task.listener.thread-count =
>>>>>>>>> "60";
>>>>>>>>>         mapreduce.map.cpu.vcores = "1";
>>>>>>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>>>>>>
>>>>>>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>>>>>>         mapreduce.map.output.compress = "true";
>>>>>>>>>         mapreduce.map.output.compress.codec =
>>>>>>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>>>>>>
>>>>>>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>>>>>>         yarn.resourcemanager.scheduler.client.thread-count = "64";
>>>>>>>>>         yarn.resourcemanager.resource-tracker.client.thread-count
>>>>>>>>> = "64";
>>>>>>>>>         yarn.resourcemanager.scheduler.class =
>>>>>>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>>>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>>>>>>         yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>>>>>>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>>>>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>>>>>>         yarn.nodemanager.container-executor.class =
>>>>>>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>>>>>>         yarn.nodemanager.container-manager.thread-count = "64";
>>>>>>>>>         yarn.nodemanager.localizer.client.thread-count = "20";
>>>>>>>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>>>>>>>>
>>>>>>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost the
>>>>>>>>> same other configurations. In MR1, the 1TB terasort took about 45 minutes,
>>>>>>>>> but it took around 90 minutes in MR2.
>>>>>>>>>
>>>>>>>>> Does anyone know what is wrong here? Or do I need some special
>>>>>>>>> configurations for terasort to work better in MR2?
>>>>>>>>>
>>>>>>>>> Thanks in advance,
>>>>>>>>>
>>>>>>>>> John
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>>>>>>> michael_segel@hotmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Sam,
>>>>>>>>>> I think your cluster is too small for any meaningful conclusions
>>>>>>>>>> to be made.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>>>>
>>>>>>>>>> Mike Segel
>>>>>>>>>>
>>>>>>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Harsh,
>>>>>>>>>>
>>>>>>>>>> Thanks for your detailed response! Now, the efficiency of my Yarn
>>>>>>>>>> cluster improved a lot after increasing the reducer
>>>>>>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>>>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>>>>>>
>>>>>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and reduce
>>>>>>>>>> task together, with a typical process(map > shuffle > reduce). In Yarn, as
>>>>>>>>>> I know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1
>>>>>>>>>> job has special execution process(map > shuffle > reduce) in Hadoop 1.x,
>>>>>>>>>> and how Yarn execute a MRv1 job? still include some special MR steps in
>>>>>>>>>> Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>>>>> - What's the general process for ApplicationMaster of Yarn to
>>>>>>>>>> execute a job?
>>>>>>>>>>
>>>>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>>>>> - For Yarn, above tow parameter do not work any more, as yarn
>>>>>>>>>> uses container instead, right?
>>>>>>>>>> - For Yarn, we can set the whole physical mem for a NodeManager
>>>>>>>>>> using 'yarn.nodemanager.resource.memory-mb'. But how to set the
>>>>>>>>>> default size of physical mem of a container?
>>>>>>>>>> - How to set the maximum size of physical mem of a container? By
>>>>>>>>>> the parameter of 'mapred.child.java.opts'?
>>>>>>>>>>
>>>>>>>>>> Thanks as always!
>>>>>>>>>>
>>>>>>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>>>>>>
>>>>>>>>>>> Hi Sam,
>>>>>>>>>>>
>>>>>>>>>>> > - How to know the container number? Why you say it will be 22
>>>>>>>>>>> containers due to a 22 GB memory?
>>>>>>>>>>>
>>>>>>>>>>> The MR2's default configuration requests 1 GB resource each for
>>>>>>>>>>> Map
>>>>>>>>>>> and Reduce containers. It requests 1.5 GB for the AM container
>>>>>>>>>>> that
>>>>>>>>>>> runs the job, additionally. This is tunable using the properties
>>>>>>>>>>> Sandy's mentioned in his post.
>>>>>>>>>>>
>>>>>>>>>>> > - My machine has 32 GB memory, how many memory is proper to be
>>>>>>>>>>> assigned to containers?
>>>>>>>>>>>
>>>>>>>>>>> This is a general question. You may use the same process you
>>>>>>>>>>> took to
>>>>>>>>>>> decide optimal number of slots in MR1 to decide this here. Every
>>>>>>>>>>> container is a new JVM, and you're limited by the CPUs you have
>>>>>>>>>>> there
>>>>>>>>>>> (if not the memory). Either increase memory requests from jobs,
>>>>>>>>>>> to
>>>>>>>>>>> lower # of concurrent containers at a given time (runtime
>>>>>>>>>>> change), or
>>>>>>>>>>> lower NM's published memory resources to control the same (config
>>>>>>>>>>> change).
>>>>>>>>>>>
>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to
>>>>>>>>>>> be 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>
>>>>>>>>>>> Yes, all of these properties will still work. Old properties
>>>>>>>>>>> specific
>>>>>>>>>>> to JobTracker or TaskTracker (usually found as a keyword in the
>>>>>>>>>>> config
>>>>>>>>>>> name) will not apply anymore.
>>>>>>>>>>>
>>>>>>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <sa...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>> > Hi Harsh,
>>>>>>>>>>> >
>>>>>>>>>>> > According to above suggestions, I removed the duplication of
>>>>>>>>>>> setting, and
>>>>>>>>>>> > reduce the value of 'yarn.nodemanager.resource.cpu-cores',
>>>>>>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000. And
>>>>>>>>>>> then, the
>>>>>>>>>>> > efficiency improved by about 18%.  I have questions:
>>>>>>>>>>> >
>>>>>>>>>>> > - How to know the container number? Why you say it will be 22
>>>>>>>>>>> containers due
>>>>>>>>>>> > to a 22 GB memory?
>>>>>>>>>>> > - My machine has 32 GB memory, how much memory is proper to be
>>>>>>>>>>> assigned to
>>>>>>>>>>> > containers?
>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to
>>>>>>>>>>> be 'yarn', will
>>>>>>>>>>> > other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>> framework? Like
>>>>>>>>>>> > 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>> >
>>>>>>>>>>> > Thanks!
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>>>>>>> >>
>>>>>>>>>>> >> Hey Sam,
>>>>>>>>>>> >>
>>>>>>>>>>> >> Did you get a chance to retry with Sandy's suggestions? The
>>>>>>>>>>> config
>>>>>>>>>>> >> appears to be asking NMs to use roughly 22 total containers
>>>>>>>>>>> (as
>>>>>>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB memory
>>>>>>>>>>> >> resource. This could impact much, given the CPU is still the
>>>>>>>>>>> same for
>>>>>>>>>>> >> both test runs.
>>>>>>>>>>> >>
>>>>>>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>>>>>>> sandy.ryza@cloudera.com>
>>>>>>>>>>> >> wrote:
>>>>>>>>>>> >> > Hey Sam,
>>>>>>>>>>> >> >
>>>>>>>>>>> >> > Thanks for sharing your results.  I'm definitely curious
>>>>>>>>>>> about what's
>>>>>>>>>>> >> > causing the difference.
>>>>>>>>>>> >> >
>>>>>>>>>>> >> > A couple observations:
>>>>>>>>>>> >> > It looks like you've got
>>>>>>>>>>> yarn.nodemanager.resource.memory-mb in there
>>>>>>>>>>> >> > twice
>>>>>>>>>>> >> > with two different values.
>>>>>>>>>>> >> >
>>>>>>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close to
>>>>>>>>>>> the default
>>>>>>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of your
>>>>>>>>>>> tasks getting
>>>>>>>>>>> >> > killed for running over resource limits?
>>>>>>>>>>> >> >
>>>>>>>>>>> >> > -Sandy
>>>>>>>>>>> >> >
>>>>>>>>>>> >> >
>>>>>>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <
>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> The terasort execution log shows that reduce spent about
>>>>>>>>>>> 5.5 mins from
>>>>>>>>>>> >> >> 33%
>>>>>>>>>>> >> >> to 35% as below.
>>>>>>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce 31%
>>>>>>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce 32%
>>>>>>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce 33%
>>>>>>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce 35%
>>>>>>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce 40%
>>>>>>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce 43%
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> Any way, below are my configurations for your reference.
>>>>>>>>>>> Thanks!
>>>>>>>>>>> >> >> (A) core-site.xml
>>>>>>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> (B) hdfs-site.xml
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <name>dfs.replication</name>
>>>>>>>>>>> >> >>     <value>1</value>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>>>>>>> >> >>
>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>>>>>>> >> >>
>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>>>>>>> >> >>     <value>10</value>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> (C) mapred-site.xml
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>>>>>>> >> >>
>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>
>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> <property>
>>>>>>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>>>>>>> >> >> </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> <property>
>>>>>>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>>>>>>> >> >>     <value>yarn</value>
>>>>>>>>>>> >> >>    </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>> >> >>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>>>>>>> >> >>     <value>8</value>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>>>>>>> >> >>     <value>4</value>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>>>>>>> >> >>     <value>true</value>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> (D) yarn-site.xml
>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>>>>>>> >> >>     <value>node1:18025</value>
>>>>>>>>>>> >> >>     <description>host is the hostname of the resource
>>>>>>>>>>> manager and
>>>>>>>>>>> >> >>     port is the port on which the NodeManagers contact the
>>>>>>>>>>> Resource
>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <description>The address of the RM web
>>>>>>>>>>> application.</description>
>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>>>>>>> >> >>     <value>node1:18088</value>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>>>>>>>>>>> >> >>     <value>node1:18030</value>
>>>>>>>>>>> >> >>     <description>host is the hostname of the
>>>>>>>>>>> resourcemanager and port
>>>>>>>>>>> >> >> is
>>>>>>>>>>> >> >> the port
>>>>>>>>>>> >> >>     on which the Applications in the cluster talk to the
>>>>>>>>>>> Resource
>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>>>>>>> >> >>     <value>node1:18040</value>
>>>>>>>>>>> >> >>     <description>the host is the hostname of the
>>>>>>>>>>> ResourceManager and
>>>>>>>>>>> >> >> the
>>>>>>>>>>> >> >> port is the port on
>>>>>>>>>>> >> >>     which the clients can talk to the Resource Manager.
>>>>>>>>>>> </description>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>
>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>>>>>>> >> >>     <description>the local directories used by the
>>>>>>>>>>> >> >> nodemanager</description>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>>>>>>> port</description>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>> >> >>     <value>10240</value>
>>>>>>>>>>> >> >>     <description>the amount of memory on the NodeManager in
>>>>>>>>>>> >> >> GB</description>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>>>>>> >> >>
>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>>>>>> >> >>     <description>directory on hdfs where the application
>>>>>>>>>>> logs are moved
>>>>>>>>>>> >> >> to
>>>>>>>>>>> >> >> </description>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>    <property>
>>>>>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>>>>>> >> >>
>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>>>>>> >> >>     <description>the directories used by Nodemanagers as
>>>>>>>>>>> log
>>>>>>>>>>> >> >> directories</description>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>>>>>> >> >>     <description>shuffle service that needs to be set for
>>>>>>>>>>> Map Reduce to
>>>>>>>>>>> >> >> run </description>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>>>>>> >> >>     <value>24</value>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> <property>
>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>>>>>> >> >>     <value>3</value>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>> >> >>     <value>22000</value>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>>>>>> >> >>     <value>2.1</value>
>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>>>>>>> >> >>>
>>>>>>>>>>> >> >>> Not tuning configurations at all is wrong. YARN uses
>>>>>>>>>>> memory resource
>>>>>>>>>>> >> >>> based scheduling and hence MR2 would be requesting 1 GB
>>>>>>>>>>> minimum by
>>>>>>>>>>> >> >>> default, causing, on base configs, to max out at 8 (due
>>>>>>>>>>> to 8 GB NM
>>>>>>>>>>> >> >>> memory resource config) total containers. Do share your
>>>>>>>>>>> configs as at
>>>>>>>>>>> >> >>> this point none of us can tell what it is.
>>>>>>>>>>> >> >>>
>>>>>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for users
>>>>>>>>>>> and to not
>>>>>>>>>>> >> >>> care about such things :)
>>>>>>>>>>> >> >>>
>>>>>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>>>>>> samliuhadoop@gmail.com>
>>>>>>>>>>> >> >>> wrote:
>>>>>>>>>>> >> >>> > At the beginning, I just want to do a fast comparison
>>>>>>>>>>> of MRv1 and
>>>>>>>>>>> >> >>> > Yarn.
>>>>>>>>>>> >> >>> > But
>>>>>>>>>>> >> >>> > they have many differences, and to be fair for
>>>>>>>>>>> comparison I did not
>>>>>>>>>>> >> >>> > tune
>>>>>>>>>>> >> >>> > their configurations at all.  So I got above test
>>>>>>>>>>> results. After
>>>>>>>>>>> >> >>> > analyzing
>>>>>>>>>>> >> >>> > the test result, no doubt, I will configure them and do
>>>>>>>>>>> comparison
>>>>>>>>>>> >> >>> > again.
>>>>>>>>>>> >> >>> >
>>>>>>>>>>> >> >>> > Do you have any idea on current test result? I think,
>>>>>>>>>>> to compare
>>>>>>>>>>> >> >>> > with
>>>>>>>>>>> >> >>> > MRv1,
>>>>>>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but worse on
>>>>>>>>>>> Reduce
>>>>>>>>>>> >> >>> > phase(terasort test).
>>>>>>>>>>> >> >>> > And any detailed suggestions/comments/materials on Yarn
>>>>>>>>>>> performance
>>>>>>>>>>> >> >>> > tuning?
>>>>>>>>>>> >> >>> >
>>>>>>>>>>> >> >>> > Thanks!
>>>>>>>>>>> >> >>> >
>>>>>>>>>>> >> >>> >
>>>>>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>>>>>> marcosluis2186@gmail.com>
>>>>>>>>>>> >> >>> >>
>>>>>>>>>>> >> >>> >> Why not to tune the configurations?
>>>>>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>>>>>>>>>> >> >>> >>
>>>>>>>>>>> >> >>> >>
>>>>>>>>>>> >> >>> >>
>>>>>>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>> >> >>> >>> Hi Experts,
>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>> >> >>> >>> We are thinking about whether to use Yarn or not in
>>>>>>>>>>> the near
>>>>>>>>>>> >> >>> >>> future,
>>>>>>>>>>> >> >>> >>> and
>>>>>>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for comparison.
>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>> >> >>> >>> My env is three nodes cluster, and each node has
>>>>>>>>>>> similar hardware:
>>>>>>>>>>> >> >>> >>> 2
>>>>>>>>>>> >> >>> >>> cpu(4 core), 32 mem. Both Yarn and MRv1 cluster are
>>>>>>>>>>> set on the
>>>>>>>>>>> >> >>> >>> same
>>>>>>>>>>> >> >>> >>> env. To
>>>>>>>>>>> >> >>> >>> be fair, I did not make any performance tuning on
>>>>>>>>>>> their
>>>>>>>>>>> >> >>> >>> configurations, but
>>>>>>>>>>> >> >>> >>> use the default configuration values.
>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>> >> >>> >>> Before testing, I think Yarn will be much better than
>>>>>>>>>>> MRv1, if
>>>>>>>>>>> >> >>> >>> they
>>>>>>>>>>> >> >>> >>> all
>>>>>>>>>>> >> >>> >>> use default configuration, because Yarn is a better
>>>>>>>>>>> framework than
>>>>>>>>>>> >> >>> >>> MRv1.
>>>>>>>>>>> >> >>> >>> However, the test result shows some differences:
>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>> >> >>> >>> After a fast analysis, I think the direct cause might
>>>>>>>>>>> be that Yarn
>>>>>>>>>>> >> >>> >>> is
>>>>>>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much worse on
>>>>>>>>>>> Reduce
>>>>>>>>>>> >> >>> >>> phase.
>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1 for
>>>>>>>>>>> terasort?
>>>>>>>>>>> >> >>> >>> - What's the strategy for tuning Yarn performance? Are
>>>>>>>>>>> there any
>>>>>>>>>>> >> >>> >>> materials?
>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>> >> >>> >>> Thanks!
>>>>>>>>>>> >> >>> >>
>>>>>>>>>>> >> >>> >>
>>>>>>>>>>> >> >>> >>
>>>>>>>>>>> >> >>> >>
>>>>>>>>>>> >> >>> >> --
>>>>>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>>>>>> >> >>> >>
>>>>>>>>>>> >> >>> >
>>>>>>>>>>> >> >>>
>>>>>>>>>>> >> >>>
>>>>>>>>>>> >> >>>
>>>>>>>>>>> >> >>> --
>>>>>>>>>>> >> >>> Harsh J
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >
>>>>>>>>>>> >>
>>>>>>>>>>> >>
>>>>>>>>>>> >>
>>>>>>>>>>> >> --
>>>>>>>>>>> >> Harsh J
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Harsh J
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Jian Fang <ji...@gmail.com>.
Thanks Sandy.

io.sort.record.percent is at its default value of 0.05 for both MR1 and MR2.
mapreduce.job.reduce.slowstart.completedmaps in MR2 and
mapred.reduce.slowstart.completed.maps in MR1 both use the default value of 0.05.
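
For the next MR1 run I plan to override that in mapred-site.xml along the
lines Sandy suggests below. A minimal sketch, with the 0.13 value taken
straight from his note (so a to-try setting, not a verified tuning):

  <property>
    <name>io.sort.record.percent</name>
    <value>0.13</value>
  </property>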

I tried allocating 1536MB and 1024MB to the map container some time ago, but
the changes did not give me a better result, so I changed it back to 768MB.

Will try mapred.reduce.slowstart.completed.maps=.99 to see what happens.
BTW, I should use
mapreduce.job.reduce.slowstart.completedmaps in MR2, right?
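
Concretely, I intend to pass it as a generic option on the terasort run,
roughly like the sketch below. The examples jar name and the input/output
paths are placeholders for my actual ones, and I am assuming the terasort
driver accepts -D options here:

  hadoop jar hadoop-mapreduce-examples-2.2.0.jar terasort \
    -Dmapreduce.job.reduce.slowstart.completedmaps=0.99 \
    /1-tb-data /1-tb-data-sorted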

Also, in MR1 I can specify tasktracker.http.threads, but I could not find
its counterpart for MR2. Which one should I tune for the HTTP threads?
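
My own unverified guess is that the NodeManager-side ShuffleHandler setting
below plays that role in MR2. I am listing it only as an assumption to be
confirmed, so please correct me if the property name is wrong:

  <property>
    <name>mapreduce.shuffle.max.threads</name>
    <!-- assumed: 0 is supposed to let the shuffle handler size its own pool -->
    <value>0</value>
  </property>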

Thanks again.


On Wed, Oct 23, 2013 at 9:40 AM, Sandy Ryza <sa...@cloudera.com> wrote:

> Based on SLOTS_MILLIS_MAPS, it looks like your map tasks are taking about
> three times as long in MR2 as they are in MR1.  This is probably because
> you allow twice as many map tasks to run at a time in MR2 (12288/768 =
> 16).  Being able to use all the containers isn't necessarily a good thing
> if you are oversubscribing your node's resources.  Because of the different
> way that MR1 and MR2 view resources, I think it's better to test with
> mapred.reduce.slowstart.completed.maps=.99 so that the map and reduce
> phases will run separately.
>
> On the other side, it looks like your MR1 has more spilled records than
> MR2.  For a fairer comparison, you should set io.sort.record.percent in MR1
> to .13, which should improve MR1 performance, but will provide a fairer
> comparison (MR2 automatically does this tuning for you).
>
> -Sandy
>
>
> On Wed, Oct 23, 2013 at 9:22 AM, Jian Fang <ji...@gmail.com>wrote:
>
>> The number of map slots and reduce slots on each data node for MR1 are 8
>> and 3, respectively. Since MR2 could use all containers for either map or
>> reduce, I would expect that MR2 is faster.
>>
>>
>> On Wed, Oct 23, 2013 at 8:17 AM, Sandy Ryza <sa...@cloudera.com>wrote:
>>
>>> How many map and reduce slots are you using per tasktracker in MR1?  How
>>> do the average map times compare? (MR2 reports this directly on the web UI,
>>> but you can also get a sense in MR1 by scrolling through the map tasks
>>> page).  Can you share the counters for MR1?
>>>
>>> -Sandy
>>>
>>>
>>> On Wed, Oct 23, 2013 at 12:23 AM, Jian Fang <
>>> jian.fang.subscribe@gmail.com> wrote:
>>>
>>>> Unfortunately, turning off JVM reuse still got the same result, i.e.,
>>>> about 90 minutes for MR2. I don't think the killed reduces could contribute
>>>> to 2 times slowness. There should be something very wrong either in
>>>> configuration or code. Any hints?
>>>>
>>>>
>>>>
>>>> On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <
>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>
>>>>> Thanks Sandy. I will try to turn JVM resue off and see what happens.
>>>>>
>>>>> Yes, I saw quite some exceptions in the task attempts. For instance.
>>>>>
>>>>>
>>>>> 2013-10-20 03:13:58,751 ERROR [main]
>>>>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>>>>> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
>>>>> 2013-10-20 03:13:58,752 ERROR [Thread-6]
>>>>> org.apache.hadoop.hdfs.DFSClient: Failed to close file
>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
>>>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>>>>> No lease on
>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
>>>>> File does not exist. Holder
>>>>> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
>>>>> any open files.
>>>>>         at
>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>>>>>         at
>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>>>>>         at
>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>>>>>         at
>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>>>>>         at
>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>>>>>         at
>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>>>>>         at
>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>>>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>>> --
>>>>>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>>>>>         at
>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>>>>>         at
>>>>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>>>>>         at
>>>>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>>>>>         at
>>>>> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>>>>>         at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>>>>>         at
>>>>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>>>>>         at
>>>>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>>>>>         at
>>>>> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>>>>>         at
>>>>> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>>>>> 2013-10-20 03:13:58,753 WARN [main]
>>>>> org.apache.hadoop.mapred.YarnChild: Exception running child :
>>>>> java.nio.channels.ClosedChannelException
>>>>>         at
>>>>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>>>>>         at
>>>>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>>>>>         at
>>>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>>>>>         at java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>>>         at
>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>>>>>         at
>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>>>>>         at
>>>>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>>>>>         at
>>>>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>>>>         at
>>>>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>>>>         at
>>>>> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>>>
>>>>>> It looks like many of your reduce tasks were killed.  Do you know
>>>>>> why?  Also, MR2 doesn't have JVM reuse, so it might make sense to compare
>>>>>> it to MR1 with JVM reuse turned off.
>>>>>>
>>>>>> -Sandy
>>>>>>
>>>>>>
>>>>>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <
>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>
>>>>>>> The Terasort output for MR2 is as follows.
>>>>>>>
>>>>>>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job (main):
>>>>>>> Counters: 46
>>>>>>>         File System Counters
>>>>>>>                 FILE: Number of bytes read=456102049355
>>>>>>>                 FILE: Number of bytes written=897246250517
>>>>>>>                 FILE: Number of read operations=0
>>>>>>>                 FILE: Number of large read operations=0
>>>>>>>                 FILE: Number of write operations=0
>>>>>>>                 HDFS: Number of bytes read=1000000851200
>>>>>>>                 HDFS: Number of bytes written=1000000000000
>>>>>>>                 HDFS: Number of read operations=32131
>>>>>>>                 HDFS: Number of large read operations=0
>>>>>>>                 HDFS: Number of write operations=224
>>>>>>>         Job Counters
>>>>>>>                 Killed map tasks=1
>>>>>>>                 Killed reduce tasks=20
>>>>>>>                 Launched map tasks=7601
>>>>>>>                 Launched reduce tasks=132
>>>>>>>                 Data-local map tasks=7591
>>>>>>>                 Rack-local map tasks=10
>>>>>>>                 Total time spent by all maps in occupied slots
>>>>>>> (ms)=1696141311
>>>>>>>                 Total time spent by all reduces in occupied slots
>>>>>>> (ms)=2664045096
>>>>>>>         Map-Reduce Framework
>>>>>>>                 Map input records=10000000000
>>>>>>>                 Map output records=10000000000
>>>>>>>                 Map output bytes=1020000000000
>>>>>>>                 Map output materialized bytes=440486356802
>>>>>>>                 Input split bytes=851200
>>>>>>>                 Combine input records=0
>>>>>>>                 Combine output records=0
>>>>>>>                 Reduce input groups=10000000000
>>>>>>>                 Reduce shuffle bytes=440486356802
>>>>>>>                 Reduce input records=10000000000
>>>>>>>                 Reduce output records=10000000000
>>>>>>>                 Spilled Records=20000000000
>>>>>>>                 Shuffled Maps =851200
>>>>>>>                 Failed Shuffles=61
>>>>>>>                 Merged Map outputs=851200
>>>>>>>                 GC time elapsed (ms)=4215666
>>>>>>>                 CPU time spent (ms)=192433000
>>>>>>>                 Physical memory (bytes) snapshot=3349356380160
>>>>>>>                 Virtual memory (bytes) snapshot=9665208745984
>>>>>>>                 Total committed heap usage (bytes)=3636854259712
>>>>>>>         Shuffle Errors
>>>>>>>                 BAD_ID=0
>>>>>>>                 CONNECTION=0
>>>>>>>                 IO_ERROR=4
>>>>>>>                 WRONG_LENGTH=0
>>>>>>>                 WRONG_MAP=0
>>>>>>>                 WRONG_REDUCE=0
>>>>>>>         File Input Format Counters
>>>>>>>                 Bytes Read=1000000000000
>>>>>>>         File Output Format Counters
>>>>>>>                 Bytes Written=1000000000000
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> John
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop 1.0.3
>>>>>>>> and it turned out that the terasort for MR2 is 2 times slower than that in
>>>>>>>> MR1. I cannot really believe it.
>>>>>>>>
>>>>>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0
>>>>>>>> cluster configurations are as follows.
>>>>>>>>
>>>>>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>>>>>         mapreduce.map.memory.mb = "768";
>>>>>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>>>>>
>>>>>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>>>>>
>>>>>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>>>>>         mapreduce.task.io.sort.factor = "48";
>>>>>>>>         mapreduce.task.io.sort.mb = "200";
>>>>>>>>         mapreduce.map.speculative = "true";
>>>>>>>>         mapreduce.reduce.speculative = "true";
>>>>>>>>         mapreduce.framework.name = "yarn";
>>>>>>>>         yarn.app.mapreduce.am.job.task.listener.thread-count = "60";
>>>>>>>>         mapreduce.map.cpu.vcores = "1";
>>>>>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>>>>>
>>>>>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>>>>>         mapreduce.map.output.compress = "true";
>>>>>>>>         mapreduce.map.output.compress.codec =
>>>>>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>>>>>
>>>>>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>>>>>         yarn.resourcemanager.scheduler.client.thread-count = "64";
>>>>>>>>         yarn.resourcemanager.resource-tracker.client.thread-count =
>>>>>>>> "64";
>>>>>>>>         yarn.resourcemanager.scheduler.class =
>>>>>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>>>>>         yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>>>>>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>>>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>>>>>         yarn.nodemanager.container-executor.class =
>>>>>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>>>>>         yarn.nodemanager.container-manager.thread-count = "64";
>>>>>>>>         yarn.nodemanager.localizer.client.thread-count = "20";
>>>>>>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>>>>>>>
>>>>>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost the same
>>>>>>>> other configurations. In MR1, the 1TB terasort took about 45 minutes, but
>>>>>>>> it took around 90 minutes in MR2.
>>>>>>>>
>>>>>>>> Does anyone know what is wrong here? Or do I need some special
>>>>>>>> configurations for terasort to work better in MR2?
>>>>>>>>
>>>>>>>> Thanks in advance,
>>>>>>>>
>>>>>>>> John
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>>>>>> michael_segel@hotmail.com> wrote:
>>>>>>>>
>>>>>>>>> Sam,
>>>>>>>>> I think your cluster is too small for any meaningful conclusions
>>>>>>>>> to be made.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>>>
>>>>>>>>> Mike Segel
>>>>>>>>>
>>>>>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Hi Harsh,
>>>>>>>>>
>>>>>>>>> Thanks for your detailed response! Now, the efficiency of my Yarn
>>>>>>>>> cluster improved a lot after increasing the reducer
>>>>>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>>>>>
>>>>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and reduce
>>>>>>>>> task together, with a typical process(map > shuffle > reduce). In Yarn, as
>>>>>>>>> I know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1
>>>>>>>>> job has special execution process(map > shuffle > reduce) in Hadoop 1.x,
>>>>>>>>> and how Yarn execute a MRv1 job? still include some special MR steps in
>>>>>>>>> Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>>>> - What's the general process for ApplicationMaster of Yarn to
>>>>>>>>> execute a job?
>>>>>>>>>
>>>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>>>> - For Yarn, the above two parameters do not work any more, as yarn uses
>>>>>>>>> containers instead, right?
>>>>>>>>> - For Yarn, we can set the whole physical mem for a NodeManager
>>>>>>>>> using 'yarn.nodemanager.resource.memory-mb'. But how to set the
>>>>>>>>> default size of physical mem of a container?
>>>>>>>>> - How to set the maximum size of physical mem of a container? By
>>>>>>>>> the parameter of 'mapred.child.java.opts'?
>>>>>>>>>
>>>>>>>>> Thanks as always!
>>>>>>>>>
>>>>>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>>>>>
>>>>>>>>>> Hi Sam,
>>>>>>>>>>
>>>>>>>>>> > - How to know the container number? Why you say it will be 22
>>>>>>>>>> containers due to a 22 GB memory?
>>>>>>>>>>
>>>>>>>>>> The MR2's default configuration requests 1 GB resource each for
>>>>>>>>>> Map
>>>>>>>>>> and Reduce containers. It requests 1.5 GB for the AM container
>>>>>>>>>> that
>>>>>>>>>> runs the job, additionally. This is tunable using the properties
>>>>>>>>>> Sandy's mentioned in his post.
>>>>>>>>>>
>>>>>>>>>> > - My machine has 32 GB memory, how much memory is proper to be
>>>>>>>>>> assigned to containers?
>>>>>>>>>>
>>>>>>>>>> This is a general question. You may use the same process you took
>>>>>>>>>> to
>>>>>>>>>> decide optimal number of slots in MR1 to decide this here. Every
>>>>>>>>>> container is a new JVM, and you're limited by the CPUs you have
>>>>>>>>>> there
>>>>>>>>>> (if not the memory). Either increase memory requests from jobs, to
>>>>>>>>>> lower # of concurrent containers at a given time (runtime
>>>>>>>>>> change), or
>>>>>>>>>> lower NM's published memory resources to control the same (config
>>>>>>>>>> change).
>>>>>>>>>>
>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to
>>>>>>>>>> be 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>>>>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>
>>>>>>>>>> Yes, all of these properties will still work. Old properties
>>>>>>>>>> specific
>>>>>>>>>> to JobTracker or TaskTracker (usually found as a keyword in the
>>>>>>>>>> config
>>>>>>>>>> name) will not apply anymore.
>>>>>>>>>>
>>>>>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <sa...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>> > Hi Harsh,
>>>>>>>>>> >
>>>>>>>>>> > According to above suggestions, I removed the duplication of
>>>>>>>>>> setting, and
>>>>>>>>>> > reduce the value of 'yarn.nodemanager.resource.cpu-cores',
>>>>>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000. And
>>>>>>>>>> then, the
>>>>>>>>>> > efficiency improved by about 18%.  I have questions:
>>>>>>>>>> >
>>>>>>>>>> > - How to know the container number? Why you say it will be 22
>>>>>>>>>> containers due
>>>>>>>>>> > to a 22 GB memory?
>>>>>>>>>> > - My machine has 32 GB memory, how much memory is proper to be
>>>>>>>>>> assigned to
>>>>>>>>>> > containers?
>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to
>>>>>>>>>> be 'yarn', will
>>>>>>>>>> > other parameters for mapred-site.xml still work in yarn
>>>>>>>>>> framework? Like
>>>>>>>>>> > 'mapreduce.task.io.sort.mb' and
>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>> >
>>>>>>>>>> > Thanks!
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>>>>>> >>
>>>>>>>>>> >> Hey Sam,
>>>>>>>>>> >>
>>>>>>>>>> >> Did you get a chance to retry with Sandy's suggestions? The
>>>>>>>>>> config
>>>>>>>>>> >> appears to be asking NMs to use roughly 22 total containers (as
>>>>>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB memory
>>>>>>>>>> >> resource. This could impact much, given the CPU is still the
>>>>>>>>>> same for
>>>>>>>>>> >> both test runs.
>>>>>>>>>> >>
>>>>>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>>>>>> sandy.ryza@cloudera.com>
>>>>>>>>>> >> wrote:
>>>>>>>>>> >> > Hey Sam,
>>>>>>>>>> >> >
>>>>>>>>>> >> > Thanks for sharing your results.  I'm definitely curious
>>>>>>>>>> about what's
>>>>>>>>>> >> > causing the difference.
>>>>>>>>>> >> >
>>>>>>>>>> >> > A couple observations:
>>>>>>>>>> >> > It looks like you've got yarn.nodemanager.resource.memory-mb
>>>>>>>>>> in there
>>>>>>>>>> >> > twice
>>>>>>>>>> >> > with two different values.
>>>>>>>>>> >> >
>>>>>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close to
>>>>>>>>>> the default
>>>>>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of your
>>>>>>>>>> tasks getting
>>>>>>>>>> >> > killed for running over resource limits?
>>>>>>>>>> >> >
>>>>>>>>>> >> > -Sandy
>>>>>>>>>> >> >
>>>>>>>>>> >> >
>>>>>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <
>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> The terasort execution log shows that reduce spent about
>>>>>>>>>> 5.5 mins from
>>>>>>>>>> >> >> 33%
>>>>>>>>>> >> >> to 35% as below.
>>>>>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce 31%
>>>>>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce 32%
>>>>>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce 33%
>>>>>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce 35%
>>>>>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce 40%
>>>>>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce 43%
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> Any way, below are my configurations for your reference.
>>>>>>>>>> Thanks!
>>>>>>>>>> >> >> (A) core-site.xml
>>>>>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> (B) hdfs-site.xml
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>dfs.replication</name>
>>>>>>>>>> >> >>     <value>1</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>>>>>> >> >>
>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>>>>>> >> >>
>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>>>>>> >> >>     <value>10</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> (C) mapred-site.xml
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>>>>>> >> >>
>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>
>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> <property>
>>>>>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>>>>>> >> >> </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> <property>
>>>>>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>>>>>> >> >>     <value>yarn</value>
>>>>>>>>>> >> >>    </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>  <property>
>>>>>>>>>> >> >>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>>>>>> >> >>     <value>8</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>>>>>> >> >>     <value>4</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>>>>>> >> >>     <value>true</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> (D) yarn-site.xml
>>>>>>>>>> >> >>  <property>
>>>>>>>>>> >> >>
>>>>>>>>>> <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>>>>>> >> >>     <value>node1:18025</value>
>>>>>>>>>> >> >>     <description>host is the hostname of the resource
>>>>>>>>>> manager and
>>>>>>>>>> >> >>     port is the port on which the NodeManagers contact the
>>>>>>>>>> Resource
>>>>>>>>>> >> >> Manager.
>>>>>>>>>> >> >>     </description>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <description>The address of the RM web
>>>>>>>>>> application.</description>
>>>>>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>>>>>> >> >>     <value>node1:18088</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>>>>>>>>>> >> >>     <value>node1:18030</value>
>>>>>>>>>> >> >>     <description>host is the hostname of the
>>>>>>>>>> resourcemanager and port
>>>>>>>>>> >> >> is
>>>>>>>>>> >> >> the port
>>>>>>>>>> >> >>     on which the Applications in the cluster talk to the
>>>>>>>>>> Resource
>>>>>>>>>> >> >> Manager.
>>>>>>>>>> >> >>     </description>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>>>>>> >> >>     <value>node1:18040</value>
>>>>>>>>>> >> >>     <description>the host is the hostname of the
>>>>>>>>>> ResourceManager and
>>>>>>>>>> >> >> the
>>>>>>>>>> >> >> port is the port on
>>>>>>>>>> >> >>     which the clients can talk to the Resource Manager.
>>>>>>>>>> </description>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>
>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>>>>>> >> >>     <description>the local directories used by the
>>>>>>>>>> >> >> nodemanager</description>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>>>>>> port</description>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>> >> >>     <value>10240</value>
>>>>>>>>>> >> >>     <description>the amount of memory on the NodeManager in
>>>>>>>>>> >> >> GB</description>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>>>>> >> >>
>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>>>>> >> >>     <description>directory on hdfs where the application
>>>>>>>>>> logs are moved
>>>>>>>>>> >> >> to
>>>>>>>>>> >> >> </description>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>    <property>
>>>>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>>>>> >> >>
>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>>>>> >> >>     <description>the directories used by Nodemanagers as log
>>>>>>>>>> >> >> directories</description>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>>>>> >> >>     <description>shuffle service that needs to be set for
>>>>>>>>>> Map Reduce to
>>>>>>>>>> >> >> run </description>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>  <property>
>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>>>>> >> >>     <value>24</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> <property>
>>>>>>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>>>>> >> >>     <value>3</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>  <property>
>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>> >> >>     <value>22000</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>  <property>
>>>>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>>>>> >> >>     <value>2.1</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>>>>>> >> >>>
>>>>>>>>>> >> >>> Not tuning configurations at all is wrong. YARN uses
>>>>>>>>>> memory resource
>>>>>>>>>> >> >>> based scheduling and hence MR2 would be requesting 1 GB
>>>>>>>>>> minimum by
>>>>>>>>>> >> >>> default, causing, on base configs, to max out at 8 (due to
>>>>>>>>>> 8 GB NM
>>>>>>>>>> >> >>> memory resource config) total containers. Do share your
>>>>>>>>>> configs as at
>>>>>>>>>> >> >>> this point none of us can tell what it is.
>>>>>>>>>> >> >>>
>>>>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for users
>>>>>>>>>> and to not
>>>>>>>>>> >> >>> care about such things :)
>>>>>>>>>> >> >>>
>>>>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>>>>> samliuhadoop@gmail.com>
>>>>>>>>>> >> >>> wrote:
>>>>>>>>>> >> >>> > At the beginning, I just want to do a fast comparison of
>>>>>>>>>> MRv1 and
>>>>>>>>>> >> >>> > Yarn.
>>>>>>>>>> >> >>> > But
>>>>>>>>>> >> >>> > they have many differences, and to be fair for
>>>>>>>>>> comparison I did not
>>>>>>>>>> >> >>> > tune
>>>>>>>>>> >> >>> > their configurations at all.  So I got above test
>>>>>>>>>> results. After
>>>>>>>>>> >> >>> > analyzing
>>>>>>>>>> >> >>> > the test result, no doubt, I will configure them and do
>>>>>>>>>> comparison
>>>>>>>>>> >> >>> > again.
>>>>>>>>>> >> >>> >
>>>>>>>>>> >> >>> > Do you have any idea on current test result? I think, to
>>>>>>>>>> compare
>>>>>>>>>> >> >>> > with
>>>>>>>>>> >> >>> > MRv1,
>>>>>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but worse on
>>>>>>>>>> Reduce
>>>>>>>>>> >> >>> > phase(terasort test).
>>>>>>>>>> >> >>> > And any detailed suggestions/comments/materials on Yarn
>>>>>>>>>> performance
>>>>>>>>>> >> >>> > tuning?
>>>>>>>>>> >> >>> >
>>>>>>>>>> >> >>> > Thanks!
>>>>>>>>>> >> >>> >
>>>>>>>>>> >> >>> >
>>>>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>>>>> marcosluis2186@gmail.com>
>>>>>>>>>> >> >>> >>
>>>>>>>>>> >> >>> >> Why not to tune the configurations?
>>>>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>>>>>>>>> >> >>> >>
>>>>>>>>>> >> >>> >>
>>>>>>>>>> >> >>> >>
>>>>>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>>>>>> >> >>> >>>
>>>>>>>>>> >> >>> >>> Hi Experts,
>>>>>>>>>> >> >>> >>>
>>>>>>>>>> >> >>> >>> We are thinking about whether to use Yarn or not in
>>>>>>>>>> the near
>>>>>>>>>> >> >>> >>> future,
>>>>>>>>>> >> >>> >>> and
>>>>>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for comparison.
>>>>>>>>>> >> >>> >>>
>>>>>>>>>> >> >>> >>> My env is three nodes cluster, and each node has
>>>>>>>>>> similar hardware:
>>>>>>>>>> >> >>> >>> 2
>>>>>>>>>> >> >>> >>> cpu(4 core), 32 mem. Both Yarn and MRv1 cluster are
>>>>>>>>>> set on the
>>>>>>>>>> >> >>> >>> same
>>>>>>>>>> >> >>> >>> env. To
>>>>>>>>>> >> >>> >>> be fair, I did not make any performance tuning on their
>>>>>>>>>> >> >>> >>> configurations, but
>>>>>>>>>> >> >>> >>> use the default configuration values.
>>>>>>>>>> >> >>> >>>
>>>>>>>>>> >> >>> >>> Before testing, I think Yarn will be much better than
>>>>>>>>>> MRv1, if
>>>>>>>>>> >> >>> >>> they
>>>>>>>>>> >> >>> >>> all
>>>>>>>>>> >> >>> >>> use default configuration, because Yarn is a better
>>>>>>>>>> framework than
>>>>>>>>>> >> >>> >>> MRv1.
>>>>>>>>>> >> >>> >>> However, the test result shows some differences:
>>>>>>>>>> >> >>> >>>
>>>>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>>>>> >> >>> >>>
>>>>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>>>>> >> >>> >>>
>>>>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>>>>> >> >>> >>>
>>>>>>>>>> >> >>> >>> After a fast analysis, I think the direct cause might
>>>>>>>>>> be that Yarn
>>>>>>>>>> >> >>> >>> is
>>>>>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much worse on
>>>>>>>>>> Reduce
>>>>>>>>>> >> >>> >>> phase.
>>>>>>>>>> >> >>> >>>
>>>>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1 for
>>>>>>>>>> terasort?
>>>>>>>>>> >> >>> >>> - What's the strategy for tuning Yarn performance? Are
>>>>>>>>>> there any
>>>>>>>>>> >> >>> >>> materials?
>>>>>>>>>> >> >>> >>>
>>>>>>>>>> >> >>> >>> Thanks!
>>>>>>>>>> >> >>> >>
>>>>>>>>>> >> >>> >>
>>>>>>>>>> >> >>> >>
>>>>>>>>>> >> >>> >>
>>>>>>>>>> >> >>> >> --
>>>>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>>>>> >> >>> >>
>>>>>>>>>> >> >>> >
>>>>>>>>>> >> >>>
>>>>>>>>>> >> >>>
>>>>>>>>>> >> >>>
>>>>>>>>>> >> >>> --
>>>>>>>>>> >> >>> Harsh J
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >
>>>>>>>>>> >>
>>>>>>>>>> >>
>>>>>>>>>> >>
>>>>>>>>>> >> --
>>>>>>>>>> >> Harsh J
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Harsh J
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

>>>>>>> (ms)=2664045096
>>>>>>>         Map-Reduce Framework
>>>>>>>                 Map input records=10000000000
>>>>>>>                 Map output records=10000000000
>>>>>>>                 Map output bytes=1020000000000
>>>>>>>                 Map output materialized bytes=440486356802
>>>>>>>                 Input split bytes=851200
>>>>>>>                 Combine input records=0
>>>>>>>                 Combine output records=0
>>>>>>>                 Reduce input groups=10000000000
>>>>>>>                 Reduce shuffle bytes=440486356802
>>>>>>>                 Reduce input records=10000000000
>>>>>>>                 Reduce output records=10000000000
>>>>>>>                 Spilled Records=20000000000
>>>>>>>                 Shuffled Maps =851200
>>>>>>>                 Failed Shuffles=61
>>>>>>>                 Merged Map outputs=851200
>>>>>>>                 GC time elapsed (ms)=4215666
>>>>>>>                 CPU time spent (ms)=192433000
>>>>>>>                 Physical memory (bytes) snapshot=3349356380160
>>>>>>>                 Virtual memory (bytes) snapshot=9665208745984
>>>>>>>                 Total committed heap usage (bytes)=3636854259712
>>>>>>>         Shuffle Errors
>>>>>>>                 BAD_ID=0
>>>>>>>                 CONNECTION=0
>>>>>>>                 IO_ERROR=4
>>>>>>>                 WRONG_LENGTH=0
>>>>>>>                 WRONG_MAP=0
>>>>>>>                 WRONG_REDUCE=0
>>>>>>>         File Input Format Counters
>>>>>>>                 Bytes Read=1000000000000
>>>>>>>         File Output Format Counters
>>>>>>>                 Bytes Written=1000000000000
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> John
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop 1.0.3
>>>>>>>> and it turned out that the terasort for MR2 is 2 times slower than that in
>>>>>>>> MR1. I cannot really believe it.
>>>>>>>>
>>>>>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0
>>>>>>>> cluster configurations are as follows.
>>>>>>>>
>>>>>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>>>>>         mapreduce.map.memory.mb = "768";
>>>>>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>>>>>
>>>>>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>>>>>
>>>>>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>>>>>         mapreduce.task.io.sort.factor = "48";
>>>>>>>>         mapreduce.task.io.sort.mb = "200";
>>>>>>>>         mapreduce.map.speculative = "true";
>>>>>>>>         mapreduce.reduce.speculative = "true";
>>>>>>>>         mapreduce.framework.name = "yarn";
>>>>>>>>         yarn.app.mapreduce.am.job.task.listener.thread-count = "60";
>>>>>>>>         mapreduce.map.cpu.vcores = "1";
>>>>>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>>>>>
>>>>>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>>>>>         mapreduce.map.output.compress = "true";
>>>>>>>>         mapreduce.map.output.compress.codec =
>>>>>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>>>>>
>>>>>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>>>>>         yarn.resourcemanager.scheduler.client.thread-count = "64";
>>>>>>>>         yarn.resourcemanager.resource-tracker.client.thread-count =
>>>>>>>> "64";
>>>>>>>>         yarn.resourcemanager.scheduler.class =
>>>>>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>>>>>         yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>>>>>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>>>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>>>>>         yarn.nodemanager.container-executor.class =
>>>>>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>>>>>         yarn.nodemanager.container-manager.thread-count = "64";
>>>>>>>>         yarn.nodemanager.localizer.client.thread-count = "20";
>>>>>>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>>>>>>>
>>>>>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost the same
>>>>>>>> other configurations. In MR1, the 1TB terasort took about 45 minutes, but
>>>>>>>> it took around 90 minutes in MR2.
>>>>>>>>
>>>>>>>> Does anyone know what is wrong here? Or do I need some special
>>>>>>>> configurations for terasort to work better in MR2?
>>>>>>>>
>>>>>>>> Thanks in advance,
>>>>>>>>
>>>>>>>> John
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>>>>>> michael_segel@hotmail.com> wrote:
>>>>>>>>
>>>>>>>>> Sam,
>>>>>>>>> I think your cluster is too small for any meaningful conclusions
>>>>>>>>> to be made.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>>>
>>>>>>>>> Mike Segel
>>>>>>>>>
>>>>>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Hi Harsh,
>>>>>>>>>
>>>>>>>>> Thanks for your detailed response! Now, the efficiency of my Yarn
>>>>>>>>> cluster improved a lot after increasing the reducer
>>>>>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>>>>>
>>>>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and reduce
>>>>>>>>> task together, with a typical process(map > shuffle > reduce). In Yarn, as
>>>>>>>>> I know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1
>>>>>>>>> job has special execution process(map > shuffle > reduce) in Hadoop 1.x,
>>>>>>>>> and how Yarn execute a MRv1 job? still include some special MR steps in
>>>>>>>>> Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>>>> - What's the general process for ApplicationMaster of Yarn to
>>>>>>>>> execute a job?
>>>>>>>>>
>>>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>>>> - For Yarn, the above two parameters do not work anymore, as yarn uses
>>>>>>>>> containers instead, right?
>>>>>>>>> - For Yarn, we can set the whole physical mem for a NodeManager
>>>>>>>>> using 'yarn.nodemanager.resource.memory-mb'. But how to set the
>>>>>>>>> default size of physical mem of a container?
>>>>>>>>> - How to set the maximum size of physical mem of a container? By
>>>>>>>>> the parameter of 'mapred.child.java.opts'?
>>>>>>>>>
>>>>>>>>> Thanks as always!
>>>>>>>>>
>>>>>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>>>>>
>>>>>>>>>> Hi Sam,
>>>>>>>>>>
>>>>>>>>>> > - How to know the container number? Why you say it will be 22
>>>>>>>>>> containers due to a 22 GB memory?
>>>>>>>>>>
>>>>>>>>>> The MR2's default configuration requests 1 GB resource each for
>>>>>>>>>> Map
>>>>>>>>>> and Reduce containers. It requests 1.5 GB for the AM container
>>>>>>>>>> that
>>>>>>>>>> runs the job, additionally. This is tunable using the properties
>>>>>>>>>> Sandy's mentioned in his post.
>>>>>>>>>>
>>>>>>>>>> > - My machine has 32 GB memory, how much memory is appropriate to be
>>>>>>>>>> assigned to containers?
>>>>>>>>>>
>>>>>>>>>> This is a general question. You may use the same process you took
>>>>>>>>>> to
>>>>>>>>>> decide optimal number of slots in MR1 to decide this here. Every
>>>>>>>>>> container is a new JVM, and you're limited by the CPUs you have
>>>>>>>>>> there
>>>>>>>>>> (if not the memory). Either increase memory requests from jobs, to
>>>>>>>>>> lower # of concurrent containers at a given time (runtime
>>>>>>>>>> change), or
>>>>>>>>>> lower NM's published memory resources to control the same (config
>>>>>>>>>> change).
>>>>>>>>>>
>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to
>>>>>>>>>> be 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>>>>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>
>>>>>>>>>> Yes, all of these properties will still work. Old properties
>>>>>>>>>> specific
>>>>>>>>>> to JobTracker or TaskTracker (usually found as a keyword in the
>>>>>>>>>> config
>>>>>>>>>> name) will not apply anymore.
>>>>>>>>>>
>>>>>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <sa...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>> > Hi Harsh,
>>>>>>>>>> >
>>>>>>>>>> > According to above suggestions, I removed the duplication of
>>>>>>>>>> setting, and
>>>>>>>>>> > reduce the value of 'yarn.nodemanager.resource.cpu-cores',
>>>>>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000. And
>>>>>>>>>> then, the
>>>>>>>>>> > efficiency improved about 18%.  I have questions:
>>>>>>>>>> >
>>>>>>>>>> > - How to know the container number? Why you say it will be 22
>>>>>>>>>> containers due
>>>>>>>>>> > to a 22 GB memory?
>>>>>>>>>> > - My machine has 32 GB memory, how much memory is appropriate to be
>>>>>>>>>> assigned to
>>>>>>>>>> > containers?
>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to
>>>>>>>>>> be 'yarn', will
>>>>>>>>>> > other parameters for mapred-site.xml still work in yarn
>>>>>>>>>> framework? Like
>>>>>>>>>> > 'mapreduce.task.io.sort.mb' and
>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>> >
>>>>>>>>>> > Thanks!
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>>>>>> >>
>>>>>>>>>> >> Hey Sam,
>>>>>>>>>> >>
>>>>>>>>>> >> Did you get a chance to retry with Sandy's suggestions? The
>>>>>>>>>> config
>>>>>>>>>> >> appears to be asking NMs to use roughly 22 total containers (as
>>>>>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB memory
>>>>>>>>>> >> resource. This could impact much, given the CPU is still the
>>>>>>>>>> same for
>>>>>>>>>> >> both test runs.
>>>>>>>>>> >>
>>>>>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>>>>>> sandy.ryza@cloudera.com>
>>>>>>>>>> >> wrote:
>>>>>>>>>> >> > Hey Sam,
>>>>>>>>>> >> >
>>>>>>>>>> >> > Thanks for sharing your results.  I'm definitely curious
>>>>>>>>>> about what's
>>>>>>>>>> >> > causing the difference.
>>>>>>>>>> >> >
>>>>>>>>>> >> > A couple observations:
>>>>>>>>>> >> > It looks like you've got yarn.nodemanager.resource.memory-mb
>>>>>>>>>> in there
>>>>>>>>>> >> > twice
>>>>>>>>>> >> > with two different values.
>>>>>>>>>> >> >
>>>>>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close to
>>>>>>>>>> the default
>>>>>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of your
>>>>>>>>>> tasks getting
>>>>>>>>>> >> > killed for running over resource limits?
>>>>>>>>>> >> >
>>>>>>>>>> >> > -Sandy
>>>>>>>>>> >> >
>>>>>>>>>> >> >
>>>>>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <
>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> The terasort execution log shows that reduce spent about
>>>>>>>>>> 5.5 mins from
>>>>>>>>>> >> >> 33%
>>>>>>>>>> >> >> to 35% as below.
>>>>>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce 31%
>>>>>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce 32%
>>>>>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce 33%
>>>>>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce 35%
>>>>>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce 40%
>>>>>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce 43%
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> Any way, below are my configurations for your reference.
>>>>>>>>>> Thanks!
>>>>>>>>>> >> >> (A) core-site.xml
>>>>>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> (B) hdfs-site.xml
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>dfs.replication</name>
>>>>>>>>>> >> >>     <value>1</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>>>>>> >> >>
>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>>>>>> >> >>
>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>>>>>> >> >>     <value>10</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> (C) mapred-site.xml
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>>>>>> >> >>
>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>
>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> <property>
>>>>>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>>>>>> >> >> </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> <property>
>>>>>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>>>>>> >> >>     <value>yarn</value>
>>>>>>>>>> >> >>    </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>  <property>
>>>>>>>>>> >> >>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>>>>>> >> >>     <value>8</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>>>>>> >> >>     <value>4</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>>>>>> >> >>     <value>true</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> (D) yarn-site.xml
>>>>>>>>>> >> >>  <property>
>>>>>>>>>> >> >>
>>>>>>>>>> <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>>>>>> >> >>     <value>node1:18025</value>
>>>>>>>>>> >> >>     <description>host is the hostname of the resource
>>>>>>>>>> manager and
>>>>>>>>>> >> >>     port is the port on which the NodeManagers contact the
>>>>>>>>>> Resource
>>>>>>>>>> >> >> Manager.
>>>>>>>>>> >> >>     </description>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <description>The address of the RM web
>>>>>>>>>> application.</description>
>>>>>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>>>>>> >> >>     <value>node1:18088</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>>>>>>>>>> >> >>     <value>node1:18030</value>
>>>>>>>>>> >> >>     <description>host is the hostname of the
>>>>>>>>>> resourcemanager and port
>>>>>>>>>> >> >> is
>>>>>>>>>> >> >> the port
>>>>>>>>>> >> >>     on which the Applications in the cluster talk to the
>>>>>>>>>> Resource
>>>>>>>>>> >> >> Manager.
>>>>>>>>>> >> >>     </description>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>>>>>> >> >>     <value>node1:18040</value>
>>>>>>>>>> >> >>     <description>the host is the hostname of the
>>>>>>>>>> ResourceManager and
>>>>>>>>>> >> >> the
>>>>>>>>>> >> >> port is the port on
>>>>>>>>>> >> >>     which the clients can talk to the Resource Manager.
>>>>>>>>>> </description>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>
>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>>>>>> >> >>     <description>the local directories used by the
>>>>>>>>>> >> >> nodemanager</description>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>>>>>> port</description>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>> >> >>     <value>10240</value>
>>>>>>>>>> >> >>     <description>the amount of memory on the NodeManager in
>>>>>>>>>> >> >> GB</description>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>>>>> >> >>
>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>>>>> >> >>     <description>directory on hdfs where the application
>>>>>>>>>> logs are moved
>>>>>>>>>> >> >> to
>>>>>>>>>> >> >> </description>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>    <property>
>>>>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>>>>> >> >>
>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>>>>> >> >>     <description>the directories used by Nodemanagers as log
>>>>>>>>>> >> >> directories</description>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>>>>> >> >>     <description>shuffle service that needs to be set for
>>>>>>>>>> Map Reduce to
>>>>>>>>>> >> >> run </description>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>  <property>
>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>>>>> >> >>     <value>24</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> <property>
>>>>>>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>>>>> >> >>     <value>3</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>  <property>
>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>> >> >>     <value>22000</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>  <property>
>>>>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>>>>> >> >>     <value>2.1</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>>>>>> >> >>>
>>>>>>>>>> >> >>> Not tuning configurations at all is wrong. YARN uses
>>>>>>>>>> memory resource
>>>>>>>>>> >> >>> based scheduling and hence MR2 would be requesting 1 GB
>>>>>>>>>> minimum by
>>>>>>>>>> >> >>> default, causing, on base configs, to max out at 8 (due to
>>>>>>>>>> 8 GB NM
>>>>>>>>>> >> >>> memory resource config) total containers. Do share your
>>>>>>>>>> configs as at
>>>>>>>>>> >> >>> this point none of us can tell what it is.
>>>>>>>>>> >> >>>
>>>>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for users
>>>>>>>>>> and to not
>>>>>>>>>> >> >>> care about such things :)
>>>>>>>>>> >> >>>
>>>>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>>>>> samliuhadoop@gmail.com>
>>>>>>>>>> >> >>> wrote:
>>>>>>>>>> >> >>> > At the beginning, I just want to do a fast comparison of
>>>>>>>>>> MRv1 and
>>>>>>>>>> >> >>> > Yarn.
>>>>>>>>>> >> >>> > But
>>>>>>>>>> >> >>> > they have many differences, and to be fair for
>>>>>>>>>> comparison I did not
>>>>>>>>>> >> >>> > tune
>>>>>>>>>> >> >>> > their configurations at all.  So I got above test
>>>>>>>>>> results. After
>>>>>>>>>> >> >>> > analyzing
>>>>>>>>>> >> >>> > the test result, no doubt, I will configure them and do
>>>>>>>>>> comparison
>>>>>>>>>> >> >>> > again.
>>>>>>>>>> >> >>> >
>>>>>>>>>> >> >>> > Do you have any idea on current test result? I think, to
>>>>>>>>>> compare
>>>>>>>>>> >> >>> > with
>>>>>>>>>> >> >>> > MRv1,
>>>>>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but worse on
>>>>>>>>>> Reduce
>>>>>>>>>> >> >>> > phase(terasort test).
>>>>>>>>>> >> >>> > And any detailed suggestions/comments/materials on Yarn
>>>>>>>>>> performance
>>>>>>>>>> >> >>> > tuning?
>>>>>>>>>> >> >>> >
>>>>>>>>>> >> >>> > Thanks!
>>>>>>>>>> >> >>> >
>>>>>>>>>> >> >>> >
>>>>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>>>>> marcosluis2186@gmail.com>
>>>>>>>>>> >> >>> >>
>>>>>>>>>> >> >>> >> Why not to tune the configurations?
>>>>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>>>>>>>>> >> >>> >>
>>>>>>>>>> >> >>> >>
>>>>>>>>>> >> >>> >>
>>>>>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>>>>>> >> >>> >>>
>>>>>>>>>> >> >>> >>> Hi Experts,
>>>>>>>>>> >> >>> >>>
>>>>>>>>>> >> >>> >>> We are thinking about whether to use Yarn or not in
>>>>>>>>>> the near
>>>>>>>>>> >> >>> >>> future,
>>>>>>>>>> >> >>> >>> and
>>>>>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for comparison.
>>>>>>>>>> >> >>> >>>
>>>>>>>>>> >> >>> >>> My env is a three-node cluster, and each node has
>>>>>>>>>> similar hardware:
>>>>>>>>>> >> >>> >>> 2
>>>>>>>>>> >> >>> >>> CPUs (4 cores), 32 GB mem. Both Yarn and MRv1 clusters are
>>>>>>>>>> set on the
>>>>>>>>>> >> >>> >>> same
>>>>>>>>>> >> >>> >>> env. To
>>>>>>>>>> >> >>> >>> be fair, I did not make any performance tuning on their
>>>>>>>>>> >> >>> >>> configurations, but
>>>>>>>>>> >> >>> >>> use the default configuration values.
>>>>>>>>>> >> >>> >>>
>>>>>>>>>> >> >>> >>> Before testing, I think Yarn will be much better than
>>>>>>>>>> MRv1, if
>>>>>>>>>> >> >>> >>> they
>>>>>>>>>> >> >>> >>> all
>>>>>>>>>> >> >>> >>> use default configuration, because Yarn is a better
>>>>>>>>>> framework than
>>>>>>>>>> >> >>> >>> MRv1.
>>>>>>>>>> >> >>> >>> However, the test result shows some differences:
>>>>>>>>>> >> >>> >>>
>>>>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>>>>> >> >>> >>>
>>>>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>>>>> >> >>> >>>
>>>>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>>>>> >> >>> >>>
>>>>>>>>>> >> >>> >>> After a fast analysis, I think the direct cause might
>>>>>>>>>> be that Yarn
>>>>>>>>>> >> >>> >>> is
>>>>>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much worse on
>>>>>>>>>> Reduce
>>>>>>>>>> >> >>> >>> phase.
>>>>>>>>>> >> >>> >>>
>>>>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1 for
>>>>>>>>>> terasort?
>>>>>>>>>> >> >>> >>> - What's the strategy for tuning Yarn performance? Are
>>>>>>>>>> there any
>>>>>>>>>> >> >>> >>> materials?
>>>>>>>>>> >> >>> >>>
>>>>>>>>>> >> >>> >>> Thanks!
>>>>>>>>>> >> >>> >>
>>>>>>>>>> >> >>> >>
>>>>>>>>>> >> >>> >>
>>>>>>>>>> >> >>> >>
>>>>>>>>>> >> >>> >> --
>>>>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>>>>> >> >>> >>
>>>>>>>>>> >> >>> >
>>>>>>>>>> >> >>>
>>>>>>>>>> >> >>>
>>>>>>>>>> >> >>>
>>>>>>>>>> >> >>> --
>>>>>>>>>> >> >>> Harsh J
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >
>>>>>>>>>> >>
>>>>>>>>>> >>
>>>>>>>>>> >>
>>>>>>>>>> >> --
>>>>>>>>>> >> Harsh J
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Harsh J
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Jian Fang <ji...@gmail.com>.
Thanks Sandy.

io.sort.record.percent is at its default value of 0.05 for both MR1 and MR2.
mapreduce.job.reduce.slowstart.completedmaps in MR2 and
mapred.reduce.slowstart.completed.maps in MR1 both use the default value of
0.05.

I tried allocating 1536MB and 1024MB to the map container some time ago, but
the changes did not give me a better result, so I changed it back to 768MB.
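
(For reference, a rough sketch in mapred-site.xml of the kind of map-container
sizing I experimented with -- the property names are the same ones from my
cluster config above, but the -Xmx value here is only an illustrative pairing
at roughly 75% of the container size, not necessarily the exact opts I used:)

  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1536</value>
  </property>
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx1152m</value><!-- illustrative: ~75% of the 1536MB container -->
  </property>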

Will try mapred.reduce.slowstart.completed.maps=.99 to see what happens.
BTW, I should use
mapreduce.job.reduce.slowstart.completedmaps in MR2, right?
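
(A minimal sketch of what I plan to put in mapred-site.xml for the next MR2
run -- I am assuming the MR2-style name below is the right one, and that the
old MR1 name would be translated through the deprecated-property mapping
anyway:)

  <property>
    <name>mapreduce.job.reduce.slowstart.completedmaps</name>
    <value>0.99</value><!-- hold reduces back until nearly all maps finish -->
  </property>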

Also, in MR1 I can specify tasktracker.http.threads, but I could not find
the counterpart for MR2. Which one should I tune for the HTTP threads?
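
(If, as I suspect, the MR2 shuffle is served by the ShuffleHandler running
inside the NodeManager rather than by a TaskTracker, then maybe something
along these lines is the closest analogue -- I am not sure this is the right
property, so please treat the name and value below as a guess on my part:)

  <property>
    <name>mapreduce.shuffle.max.threads</name>
    <value>0</value><!-- reportedly 0 means 2x the number of available processors -->
  </property>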

Thanks again.


On Wed, Oct 23, 2013 at 9:40 AM, Sandy Ryza <sa...@cloudera.com> wrote:

> Based on SLOTS_MILLIS_MAPS, it looks like your map tasks are taking about
> three times as long in MR2 as they are in MR1.  This is probably because
> you allow twice as many map tasks to run at a time in MR2 (12288/768 =
> 16).  Being able to use all the containers isn't necessarily a good thing
> if you are oversubscribing your node's resources.  Because of the different
> way that MR1 and MR2 view resources, I think it's better to test with
> mapred.reduce.slowstart.completed.maps=.99 so that the map and reduce
> phases will run separately.
>
> On the other hand, it looks like your MR1 run has more spilled records than
> MR2.  You should set io.sort.record.percent in MR1 to .13; this should
> improve MR1 performance and provide a fairer comparison (MR2 automatically
> does this tuning for you).
>
> -Sandy
>
>
> On Wed, Oct 23, 2013 at 9:22 AM, Jian Fang <ji...@gmail.com>wrote:
>
>> The number of map slots and reduce slots on each data node for MR1 are 8
>> and 3, respectively. Since MR2 could use all containers for either map or
>> reduce, I would expect that MR2 is faster.
>>
>>
>> On Wed, Oct 23, 2013 at 8:17 AM, Sandy Ryza <sa...@cloudera.com>wrote:
>>
>>> How many map and reduce slots are you using per tasktracker in MR1?  How
>>> do the average map times compare? (MR2 reports this directly on the web UI,
>>> but you can also get a sense in MR1 by scrolling through the map tasks
>>> page).  Can you share the counters for MR1?
>>>
>>> -Sandy
>>>
>>>
>>> On Wed, Oct 23, 2013 at 12:23 AM, Jian Fang <
>>> jian.fang.subscribe@gmail.com> wrote:
>>>
>>>> Unfortunately, turning off JVM reuse still got the same result, i.e.,
>>>> about 90 minutes for MR2. I don't think the killed reduces could contribute
>>>> to 2 times slowness. There should be something very wrong either in
>>>> configuration or code. Any hints?
>>>>
>>>>
>>>>
>>>> On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <
>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>
>>>>> Thanks Sandy. I will try to turn JVM reuse off and see what happens.
>>>>>
>>>>> Yes, I saw quite some exceptions in the task attempts. For instance.
>>>>>
>>>>>
>>>>> 2013-10-20 03:13:58,751 ERROR [main]
>>>>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>>>>> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
>>>>> 2013-10-20 03:13:58,752 ERROR [Thread-6]
>>>>> org.apache.hadoop.hdfs.DFSClient: Failed to close file
>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
>>>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>>>>> No lease on
>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
>>>>> File does not exist. Holder
>>>>> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
>>>>> any open files.
>>>>>         at
>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>>>>>         at
>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>>>>>         at
>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>>>>>         at
>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>>>>>         at
>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>>>>>         at
>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>>>>>         at
>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>>>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>>> --
>>>>>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>>>>>         at
>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>>>>>         at
>>>>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>>>>>         at
>>>>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>>>>>         at
>>>>> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>>>>>         at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>>>>>         at
>>>>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>>>>>         at
>>>>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>>>>>         at
>>>>> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>>>>>         at
>>>>> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>>>>> 2013-10-20 03:13:58,753 WARN [main]
>>>>> org.apache.hadoop.mapred.YarnChild: Exception running child :
>>>>> java.nio.channels.ClosedChannelException
>>>>>         at
>>>>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>>>>>         at
>>>>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>>>>>         at
>>>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>>>>>         at java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>>>         at
>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>>>>>         at
>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>>>>>         at
>>>>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>>>>>         at
>>>>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>>>>         at
>>>>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>>>>         at
>>>>> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>>>
>>>>>> It looks like many of your reduce tasks were killed.  Do you know
>>>>>> why?  Also, MR2 doesn't have JVM reuse, so it might make sense to compare
>>>>>> it to MR1 with JVM reuse turned off.
>>>>>>
>>>>>> -Sandy
>>>>>>
>>>>>>
>>>>>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <
>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>
>>>>>>> The Terasort output for MR2 is as follows.
>>>>>>>
>>>>>>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job (main):
>>>>>>> Counters: 46
>>>>>>>         File System Counters
>>>>>>>                 FILE: Number of bytes read=456102049355
>>>>>>>                 FILE: Number of bytes written=897246250517
>>>>>>>                 FILE: Number of read operations=0
>>>>>>>                 FILE: Number of large read operations=0
>>>>>>>                 FILE: Number of write operations=0
>>>>>>>                 HDFS: Number of bytes read=1000000851200
>>>>>>>                 HDFS: Number of bytes written=1000000000000
>>>>>>>                 HDFS: Number of read operations=32131
>>>>>>>                 HDFS: Number of large read operations=0
>>>>>>>                 HDFS: Number of write operations=224
>>>>>>>         Job Counters
>>>>>>>                 Killed map tasks=1
>>>>>>>                 Killed reduce tasks=20
>>>>>>>                 Launched map tasks=7601
>>>>>>>                 Launched reduce tasks=132
>>>>>>>                 Data-local map tasks=7591
>>>>>>>                 Rack-local map tasks=10
>>>>>>>                 Total time spent by all maps in occupied slots
>>>>>>> (ms)=1696141311
>>>>>>>                 Total time spent by all reduces in occupied slots
>>>>>>> (ms)=2664045096
>>>>>>>         Map-Reduce Framework
>>>>>>>                 Map input records=10000000000
>>>>>>>                 Map output records=10000000000
>>>>>>>                 Map output bytes=1020000000000
>>>>>>>                 Map output materialized bytes=440486356802
>>>>>>>                 Input split bytes=851200
>>>>>>>                 Combine input records=0
>>>>>>>                 Combine output records=0
>>>>>>>                 Reduce input groups=10000000000
>>>>>>>                 Reduce shuffle bytes=440486356802
>>>>>>>                 Reduce input records=10000000000
>>>>>>>                 Reduce output records=10000000000
>>>>>>>                 Spilled Records=20000000000
>>>>>>>                 Shuffled Maps =851200
>>>>>>>                 Failed Shuffles=61
>>>>>>>                 Merged Map outputs=851200
>>>>>>>                 GC time elapsed (ms)=4215666
>>>>>>>                 CPU time spent (ms)=192433000
>>>>>>>                 Physical memory (bytes) snapshot=3349356380160
>>>>>>>                 Virtual memory (bytes) snapshot=9665208745984
>>>>>>>                 Total committed heap usage (bytes)=3636854259712
>>>>>>>         Shuffle Errors
>>>>>>>                 BAD_ID=0
>>>>>>>                 CONNECTION=0
>>>>>>>                 IO_ERROR=4
>>>>>>>                 WRONG_LENGTH=0
>>>>>>>                 WRONG_MAP=0
>>>>>>>                 WRONG_REDUCE=0
>>>>>>>         File Input Format Counters
>>>>>>>                 Bytes Read=1000000000000
>>>>>>>         File Output Format Counters
>>>>>>>                 Bytes Written=1000000000000
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> John
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop 1.0.3
>>>>>>>> and it turned out that the terasort for MR2 is 2 times slower than that in
>>>>>>>> MR1. I cannot really believe it.
>>>>>>>>
>>>>>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0
>>>>>>>> cluster configurations are as follows.
>>>>>>>>
>>>>>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>>>>>         mapreduce.map.memory.mb = "768";
>>>>>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>>>>>
>>>>>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>>>>>
>>>>>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>>>>>         mapreduce.task.io.sort.factor = "48";
>>>>>>>>         mapreduce.task.io.sort.mb = "200";
>>>>>>>>         mapreduce.map.speculative = "true";
>>>>>>>>         mapreduce.reduce.speculative = "true";
>>>>>>>>         mapreduce.framework.name = "yarn";
>>>>>>>>         yarn.app.mapreduce.am.job.task.listener.thread-count = "60";
>>>>>>>>         mapreduce.map.cpu.vcores = "1";
>>>>>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>>>>>
>>>>>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>>>>>         mapreduce.map.output.compress = "true";
>>>>>>>>         mapreduce.map.output.compress.codec =
>>>>>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>>>>>
>>>>>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>>>>>         yarn.resourcemanager.scheduler.client.thread-count = "64";
>>>>>>>>         yarn.resourcemanager.resource-tracker.client.thread-count =
>>>>>>>> "64";
>>>>>>>>         yarn.resourcemanager.scheduler.class =
>>>>>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>>>>>         yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>>>>>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>>>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>>>>>         yarn.nodemanager.container-executor.class =
>>>>>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>>>>>         yarn.nodemanager.container-manager.thread-count = "64";
>>>>>>>>         yarn.nodemanager.localizer.client.thread-count = "20";
>>>>>>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>>>>>>>
>>>>>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost the same
>>>>>>>> other configurations. In MR1, the 1TB terasort took about 45 minutes, but
>>>>>>>> it took around 90 minutes in MR2.
>>>>>>>>
>>>>>>>> Does anyone know what is wrong here? Or do I need some special
>>>>>>>> configurations for terasort to work better in MR2?
>>>>>>>>
>>>>>>>> Thanks in advance,
>>>>>>>>
>>>>>>>> John
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>>>>>> michael_segel@hotmail.com> wrote:
>>>>>>>>
>>>>>>>>> Sam,
>>>>>>>>> I think your cluster is too small for any meaningful conclusions
>>>>>>>>> to be made.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>>>
>>>>>>>>> Mike Segel
>>>>>>>>>
>>>>>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Hi Harsh,
>>>>>>>>>
>>>>>>>>> Thanks for your detailed response! Now, the efficiency of my Yarn
>>>>>>>>> cluster improved a lot after increasing the reducer
>>>>>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>>>>>
>>>>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and reduce
>>>>>>>>> task together, with a typical process(map > shuffle > reduce). In Yarn, as
>>>>>>>>> I know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1
>>>>>>>>> job has special execution process(map > shuffle > reduce) in Hadoop 1.x,
>>>>>>>>> and how Yarn execute a MRv1 job? still include some special MR steps in
>>>>>>>>> Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>>>> - What's the general process for ApplicationMaster of Yarn to
>>>>>>>>> execute a job?
>>>>>>>>>
>>>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>>>> - For Yarn, the above two parameters do not work anymore, as yarn uses
>>>>>>>>> containers instead, right?
>>>>>>>>> - For Yarn, we can set the whole physical mem for a NodeManager
>>>>>>>>> using 'yarn.nodemanager.resource.memory-mb'. But how to set the
>>>>>>>>> default size of physical mem of a container?
>>>>>>>>> - How to set the maximum size of physical mem of a container? By
>>>>>>>>> the parameter of 'mapred.child.java.opts'?
>>>>>>>>>
>>>>>>>>> Thanks as always!
>>>>>>>>>
>>>>>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>>>>>
>>>>>>>>>> Hi Sam,
>>>>>>>>>>
>>>>>>>>>> > - How to know the container number? Why you say it will be 22
>>>>>>>>>> containers due to a 22 GB memory?
>>>>>>>>>>
>>>>>>>>>> The MR2's default configuration requests 1 GB resource each for
>>>>>>>>>> Map
>>>>>>>>>> and Reduce containers. It requests 1.5 GB for the AM container
>>>>>>>>>> that
>>>>>>>>>> runs the job, additionally. This is tunable using the properties
>>>>>>>>>> Sandy's mentioned in his post.
>>>>>>>>>>
>>>>>>>>>> > - My machine has 32 GB memory, how much memory is appropriate to be
>>>>>>>>>> assigned to containers?
>>>>>>>>>>
>>>>>>>>>> This is a general question. You may use the same process you took
>>>>>>>>>> to
>>>>>>>>>> decide optimal number of slots in MR1 to decide this here. Every
>>>>>>>>>> container is a new JVM, and you're limited by the CPUs you have
>>>>>>>>>> there
>>>>>>>>>> (if not the memory). Either increase memory requests from jobs, to
>>>>>>>>>> lower # of concurrent containers at a given time (runtime
>>>>>>>>>> change), or
>>>>>>>>>> lower NM's published memory resources to control the same (config
>>>>>>>>>> change).
>>>>>>>>>>
>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to
>>>>>>>>>> be 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>>>>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>
>>>>>>>>>> Yes, all of these properties will still work. Old properties
>>>>>>>>>> specific
>>>>>>>>>> to JobTracker or TaskTracker (usually found as a keyword in the
>>>>>>>>>> config
>>>>>>>>>> name) will not apply anymore.
>>>>>>>>>>
>>>>>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <sa...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>> > Hi Harsh,
>>>>>>>>>> >
>>>>>>>>>> > According to above suggestions, I removed the duplication of
>>>>>>>>>> setting, and
>>>>>>>>>> > reduce the value of 'yarn.nodemanager.resource.cpu-cores',
>>>>>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000. And
>>>>>>>>>> then, the
>>>>>>>>>> > efficiency improved about 18%.  I have questions:
>>>>>>>>>> >
>>>>>>>>>> > - How to know the container number? Why you say it will be 22
>>>>>>>>>> containers due
>>>>>>>>>> > to a 22 GB memory?
>>>>>>>>>> > - My machine has 32 GB memory, how much memory is appropriate to be
>>>>>>>>>> assigned to
>>>>>>>>>> > containers?
>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to
>>>>>>>>>> be 'yarn', will
>>>>>>>>>> > other parameters for mapred-site.xml still work in yarn
>>>>>>>>>> framework? Like
>>>>>>>>>> > 'mapreduce.task.io.sort.mb' and
>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>> >
>>>>>>>>>> > Thanks!
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>>>>>> >>
>>>>>>>>>> >> Hey Sam,
>>>>>>>>>> >>
>>>>>>>>>> >> Did you get a chance to retry with Sandy's suggestions? The
>>>>>>>>>> config
>>>>>>>>>> >> appears to be asking NMs to use roughly 22 total containers (as
>>>>>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB memory
>>>>>>>>>> >> resource. This could impact much, given the CPU is still the
>>>>>>>>>> same for
>>>>>>>>>> >> both test runs.
>>>>>>>>>> >>
>>>>>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>>>>>> sandy.ryza@cloudera.com>
>>>>>>>>>> >> wrote:
>>>>>>>>>> >> > Hey Sam,
>>>>>>>>>> >> >
>>>>>>>>>> >> > Thanks for sharing your results.  I'm definitely curious
>>>>>>>>>> about what's
>>>>>>>>>> >> > causing the difference.
>>>>>>>>>> >> >
>>>>>>>>>> >> > A couple observations:
>>>>>>>>>> >> > It looks like you've got yarn.nodemanager.resource.memory-mb
>>>>>>>>>> in there
>>>>>>>>>> >> > twice
>>>>>>>>>> >> > with two different values.
>>>>>>>>>> >> >
>>>>>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close to
>>>>>>>>>> the default
>>>>>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of your
>>>>>>>>>> tasks getting
>>>>>>>>>> >> > killed for running over resource limits?
>>>>>>>>>> >> >
>>>>>>>>>> >> > -Sandy
>>>>>>>>>> >> >
>>>>>>>>>> >> >
>>>>>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <
>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> The terasort execution log shows that reduce spent about
>>>>>>>>>> 5.5 mins from
>>>>>>>>>> >> >> 33%
>>>>>>>>>> >> >> to 35% as below.
>>>>>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce 31%
>>>>>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce 32%
>>>>>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce 33%
>>>>>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce 35%
>>>>>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce 40%
>>>>>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce 43%
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> Any way, below are my configurations for your reference.
>>>>>>>>>> Thanks!
>>>>>>>>>> >> >> (A) core-site.xml
>>>>>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> (B) hdfs-site.xml
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>dfs.replication</name>
>>>>>>>>>> >> >>     <value>1</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>>>>>> >> >>
>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>>>>>> >> >>
>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>>>>>> >> >>     <value>10</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> (C) mapred-site.xml
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>>>>>> >> >>
>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>
>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> <property>
>>>>>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>>>>>> >> >> </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> <property>
>>>>>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>>>>>> >> >>     <value>yarn</value>
>>>>>>>>>> >> >>    </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>  <property>
>>>>>>>>>> >> >>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>>>>>> >> >>     <value>8</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>>>>>> >> >>     <value>4</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>>>>>> >> >>     <value>true</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> (D) yarn-site.xml
>>>>>>>>>> >> >>  <property>
>>>>>>>>>> >> >>
>>>>>>>>>> <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>>>>>> >> >>     <value>node1:18025</value>
>>>>>>>>>> >> >>     <description>host is the hostname of the resource
>>>>>>>>>> manager and
>>>>>>>>>> >> >>     port is the port on which the NodeManagers contact the
>>>>>>>>>> Resource
>>>>>>>>>> >> >> Manager.
>>>>>>>>>> >> >>     </description>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <description>The address of the RM web
>>>>>>>>>> application.</description>
>>>>>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>>>>>> >> >>     <value>node1:18088</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>>>>>>>>>> >> >>     <value>node1:18030</value>
>>>>>>>>>> >> >>     <description>host is the hostname of the
>>>>>>>>>> resourcemanager and port
>>>>>>>>>> >> >> is
>>>>>>>>>> >> >> the port
>>>>>>>>>> >> >>     on which the Applications in the cluster talk to the
>>>>>>>>>> Resource
>>>>>>>>>> >> >> Manager.
>>>>>>>>>> >> >>     </description>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>>>>>> >> >>     <value>node1:18040</value>
>>>>>>>>>> >> >>     <description>the host is the hostname of the
>>>>>>>>>> ResourceManager and
>>>>>>>>>> >> >> the
>>>>>>>>>> >> >> port is the port on
>>>>>>>>>> >> >>     which the clients can talk to the Resource Manager.
>>>>>>>>>> </description>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>
>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>>>>>> >> >>     <description>the local directories used by the
>>>>>>>>>> >> >> nodemanager</description>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>>>>>> port</description>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>> >> >>     <value>10240</value>
>>>>>>>>>> >> >>     <description>the amount of memory on the NodeManager in
>>>>>>>>>> >> >> GB</description>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>>>>> >> >>
>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>>>>> >> >>     <description>directory on hdfs where the application
>>>>>>>>>> logs are moved
>>>>>>>>>> >> >> to
>>>>>>>>>> >> >> </description>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>    <property>
>>>>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>>>>> >> >>
>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>>>>> >> >>     <description>the directories used by Nodemanagers as log
>>>>>>>>>> >> >> directories</description>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>>>>> >> >>     <description>shuffle service that needs to be set for
>>>>>>>>>> Map Reduce to
>>>>>>>>>> >> >> run </description>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>  <property>
>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>>>>> >> >>     <value>24</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> <property>
>>>>>>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>>>>> >> >>     <value>3</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>  <property>
>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>> >> >>     <value>22000</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>  <property>
>>>>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>>>>> >> >>     <value>2.1</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>>>>>> >> >>>
>>>>>>>>>> >> >>> Not tuning configurations at all is wrong. YARN uses
>>>>>>>>>> memory resource
>>>>>>>>>> >> >>> based scheduling and hence MR2 would be requesting 1 GB
>>>>>>>>>> minimum by
>>>>>>>>>> >> >>> default, causing, on base configs, to max out at 8 (due to
>>>>>>>>>> 8 GB NM
>>>>>>>>>> >> >>> memory resource config) total containers. Do share your
>>>>>>>>>> configs as at
>>>>>>>>>> >> >>> this point none of us can tell what it is.
>>>>>>>>>> >> >>>
>>>>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for users
>>>>>>>>>> and to not
>>>>>>>>>> >> >>> care about such things :)
>>>>>>>>>> >> >>>
>>>>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>>>>> samliuhadoop@gmail.com>
>>>>>>>>>> >> >>> wrote:
>>>>>>>>>> >> >>> > At the begining, I just want to do a fast comparision of
>>>>>>>>>> MRv1 and
>>>>>>>>>> >> >>> > Yarn.
>>>>>>>>>> >> >>> > But
>>>>>>>>>> >> >>> > they have many differences, and to be fair for
>>>>>>>>>> comparison I did not
>>>>>>>>>> >> >>> > tune
>>>>>>>>>> >> >>> > their configurations at all.  So I got above test
>>>>>>>>>> results. After
>>>>>>>>>> >> >>> > analyzing
>>>>>>>>>> >> >>> > the test result, no doubt, I will configure them and do
>>>>>>>>>> comparison
>>>>>>>>>> >> >>> > again.
>>>>>>>>>> >> >>> >
>>>>>>>>>> >> >>> > Do you have any idea on current test result? I think, to
>>>>>>>>>> compare
>>>>>>>>>> >> >>> > with
>>>>>>>>>> >> >>> > MRv1,
>>>>>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but worse on
>>>>>>>>>> Reduce
>>>>>>>>>> >> >>> > phase(terasort test).
>>>>>>>>>> >> >>> > And any detailed suggestions/comments/materials on Yarn
>>>>>>>>>> performance
>>>>>>>>>> >> >>> > tunning?
>>>>>>>>>> >> >>> >
>>>>>>>>>> >> >>> > Thanks!
>>>>>>>>>> >> >>> >
>>>>>>>>>> >> >>> >
>>>>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>>>>> marcosluis2186@gmail.com>
>>>>>>>>>> >> >>> >>
>>>>>>>>>> >> >>> >> Why not to tune the configurations?
>>>>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>>>>>>>>> >> >>> >>
>>>>>>>>>> >> >>> >>
>>>>>>>>>> >> >>> >>
>>>>>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>>>>>> >> >>> >>>
>>>>>>>>>> >> >>> >>> Hi Experts,
>>>>>>>>>> >> >>> >>>
>>>>>>>>>> >> >>> >>> We are thinking about whether to use Yarn or not in
>>>>>>>>>> the near
>>>>>>>>>> >> >>> >>> future,
>>>>>>>>>> >> >>> >>> and
>>>>>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for comprison.
>>>>>>>>>> >> >>> >>>
>>>>>>>>>> >> >>> >>> My env is three nodes cluster, and each node has
>>>>>>>>>> similar hardware:
>>>>>>>>>> >> >>> >>> 2
>>>>>>>>>> >> >>> >>> cpu(4 core), 32 mem. Both Yarn and MRv1 cluster are
>>>>>>>>>> set on the
>>>>>>>>>> >> >>> >>> same
>>>>>>>>>> >> >>> >>> env. To
>>>>>>>>>> >> >>> >>> be fair, I did not make any performance tuning on their
>>>>>>>>>> >> >>> >>> configurations, but
>>>>>>>>>> >> >>> >>> use the default configuration values.
>>>>>>>>>> >> >>> >>>
>>>>>>>>>> >> >>> >>> Before testing, I think Yarn will be much better than
>>>>>>>>>> MRv1, if
>>>>>>>>>> >> >>> >>> they
>>>>>>>>>> >> >>> >>> all
>>>>>>>>>> >> >>> >>> use default configuration, because Yarn is a better
>>>>>>>>>> framework than
>>>>>>>>>> >> >>> >>> MRv1.
>>>>>>>>>> >> >>> >>> However, the test result shows some differences:
>>>>>>>>>> >> >>> >>>
>>>>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>>>>> >> >>> >>>
>>>>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>>>>> >> >>> >>>
>>>>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>>>>> >> >>> >>>
>>>>>>>>>> >> >>> >>> After a fast analysis, I think the direct cause might
>>>>>>>>>> be that Yarn
>>>>>>>>>> >> >>> >>> is
>>>>>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much worse on
>>>>>>>>>> Reduce
>>>>>>>>>> >> >>> >>> phase.
>>>>>>>>>> >> >>> >>>
>>>>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1 for
>>>>>>>>>> terasort?
>>>>>>>>>> >> >>> >>> - What's the stratage for tuning Yarn performance? Is
>>>>>>>>>> any
>>>>>>>>>> >> >>> >>> materials?
>>>>>>>>>> >> >>> >>>
>>>>>>>>>> >> >>> >>> Thanks!
>>>>>>>>>> >> >>> >>
>>>>>>>>>> >> >>> >>
>>>>>>>>>> >> >>> >>
>>>>>>>>>> >> >>> >>
>>>>>>>>>> >> >>> >> --
>>>>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>>>>> >> >>> >>
>>>>>>>>>> >> >>> >
>>>>>>>>>> >> >>>
>>>>>>>>>> >> >>>
>>>>>>>>>> >> >>>
>>>>>>>>>> >> >>> --
>>>>>>>>>> >> >>> Harsh J
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >
>>>>>>>>>> >>
>>>>>>>>>> >>
>>>>>>>>>> >>
>>>>>>>>>> >> --
>>>>>>>>>> >> Harsh J
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Harsh J
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Jian Fang <ji...@gmail.com>.
Thanks Sandy.

io.sort.record.percent is at its default value of 0.05 for both MR1 and MR2.
mapreduce.job.reduce.slowstart.completedmaps in MR2 and
mapred.reduce.slowstart.completed.maps in MR1 both use the default value of
0.05 as well.

I tried allocating 1536MB and 1024MB to the map containers some time ago, but
neither change gave me a better result, so I changed it back to 768MB.

Will try mapred.reduce.slowstart.completed.maps=.99 to see what happens.
BTW, I should use
mapreduce.job.reduce.slowstart.completedmaps in MR2, right?

Also, in MR1 I can specify tasktracker.http.threads, but I could not find its
counterpart in MR2. Which property should I tune for the HTTP shuffle threads?
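
For reference, here is a minimal sketch of the mapred-site.xml entries in
question. The slowstart name is the MR2 property mentioned above;
mapreduce.shuffle.max.threads is only my assumption for the MR1
tasktracker.http.threads counterpart, since the MR2 shuffle is served by the
Netty-based ShuffleHandler aux service on each NodeManager:

  <property>
    <name>mapreduce.job.reduce.slowstart.completedmaps</name>
    <value>0.99</value>
    <description>Do not start reducers until 99% of the maps have finished,
    so the map and reduce phases run separately.</description>
  </property>

  <property>
    <name>mapreduce.shuffle.max.threads</name>
    <value>0</value>
    <description>Threads the ShuffleHandler uses to serve map output on each
    NodeManager; 0 is supposed to let Netty pick 2 * available cores.
    </description>
  </property>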

Thanks again.


On Wed, Oct 23, 2013 at 9:40 AM, Sandy Ryza <sa...@cloudera.com> wrote:

> Based on SLOTS_MILLIS_MAPS, it looks like your map tasks are taking about
> three times as long in MR2 as they are in MR1.  This is probably because
> you allow twice as many map tasks to run at a time in MR2 (12288/768 =
> 16).  Being able to use all the containers isn't necessarily a good thing
> if you are oversubscribing your node's resources.  Because of the different
> way that MR1 and MR2 view resources, I think it's better to test with
> mapred.reduce.slowstart.completed.maps=.99 so that the map and reduce
> phases will run separately.
>
> On the other side, it looks like your MR1 run has more spilled records than
> MR2. You should set io.sort.record.percent in MR1 to .13; this should improve
> MR1 performance and make the comparison fairer (MR2 does this tuning for you
> automatically).
>
> -Sandy
>
>
> On Wed, Oct 23, 2013 at 9:22 AM, Jian Fang <ji...@gmail.com>wrote:
>
>> The number of map slots and reduce slots on each data node for MR1 are 8
>> and 3, respectively. Since MR2 could use all containers for either map or
>> reduce, I would expect that MR2 is faster.
>>
>>
>> On Wed, Oct 23, 2013 at 8:17 AM, Sandy Ryza <sa...@cloudera.com>wrote:
>>
>>> How many map and reduce slots are you using per tasktracker in MR1?  How
>>> do the average map times compare? (MR2 reports this directly on the web UI,
>>> but you can also get a sense in MR1 by scrolling through the map tasks
>>> page).  Can you share the counters for MR1?
>>>
>>> -Sandy
>>>
>>>
>>> On Wed, Oct 23, 2013 at 12:23 AM, Jian Fang <
>>> jian.fang.subscribe@gmail.com> wrote:
>>>
>>>> Unfortunately, turning off JVM reuse still got the same result, i.e.,
>>>> about 90 minutes for MR2. I don't think the killed reduces could contribute
>>>> to 2 times slowness. There should be something very wrong either in
>>>> configuration or code. Any hints?
>>>>
>>>>
>>>>
>>>> On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <
>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>
>>>>> Thanks Sandy. I will try to turn JVM resue off and see what happens.
>>>>>
>>>>> Yes, I saw quite some exceptions in the task attempts. For instance.
>>>>>
>>>>>
>>>>> 2013-10-20 03:13:58,751 ERROR [main]
>>>>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>>>>> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
>>>>> 2013-10-20 03:13:58,752 ERROR [Thread-6]
>>>>> org.apache.hadoop.hdfs.DFSClient: Failed to close file
>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
>>>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>>>>> No lease on
>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
>>>>> File does not exist. Holder
>>>>> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
>>>>> any open files.
>>>>>         at
>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>>>>>         at
>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>>>>>         at
>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>>>>>         at
>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>>>>>         at
>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>>>>>         at
>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>>>>>         at
>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>>>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>>> --
>>>>>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>>>>>         at
>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>>>>>         at
>>>>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>>>>>         at
>>>>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>>>>>         at
>>>>> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>>>>>         at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>>>>>         at
>>>>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>>>>>         at
>>>>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>>>>>         at
>>>>> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>>>>>         at
>>>>> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>>>>> 2013-10-20 03:13:58,753 WARN [main]
>>>>> org.apache.hadoop.mapred.YarnChild: Exception running child :
>>>>> java.nio.channels.ClosedChannelException
>>>>>         at
>>>>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>>>>>         at
>>>>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>>>>>         at
>>>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>>>>>         at java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>>>         at
>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>>>>>         at
>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>>>>>         at
>>>>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>>>>>         at
>>>>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>>>>         at
>>>>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>>>>         at
>>>>> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>>>
>>>>>> It looks like many of your reduce tasks were killed.  Do you know
>>>>>> why?  Also, MR2 doesn't have JVM reuse, so it might make sense to compare
>>>>>> it to MR1 with JVM reuse turned off.
>>>>>>
>>>>>> -Sandy
>>>>>>
>>>>>>
>>>>>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <
>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>
>>>>>>> The Terasort output for MR2 is as follows.
>>>>>>>
>>>>>>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job (main):
>>>>>>> Counters: 46
>>>>>>>         File System Counters
>>>>>>>                 FILE: Number of bytes read=456102049355
>>>>>>>                 FILE: Number of bytes written=897246250517
>>>>>>>                 FILE: Number of read operations=0
>>>>>>>                 FILE: Number of large read operations=0
>>>>>>>                 FILE: Number of write operations=0
>>>>>>>                 HDFS: Number of bytes read=1000000851200
>>>>>>>                 HDFS: Number of bytes written=1000000000000
>>>>>>>                 HDFS: Number of read operations=32131
>>>>>>>                 HDFS: Number of large read operations=0
>>>>>>>                 HDFS: Number of write operations=224
>>>>>>>         Job Counters
>>>>>>>                 Killed map tasks=1
>>>>>>>                 Killed reduce tasks=20
>>>>>>>                 Launched map tasks=7601
>>>>>>>                 Launched reduce tasks=132
>>>>>>>                 Data-local map tasks=7591
>>>>>>>                 Rack-local map tasks=10
>>>>>>>                 Total time spent by all maps in occupied slots
>>>>>>> (ms)=1696141311
>>>>>>>                 Total time spent by all reduces in occupied slots
>>>>>>> (ms)=2664045096
>>>>>>>         Map-Reduce Framework
>>>>>>>                 Map input records=10000000000
>>>>>>>                 Map output records=10000000000
>>>>>>>                 Map output bytes=1020000000000
>>>>>>>                 Map output materialized bytes=440486356802
>>>>>>>                 Input split bytes=851200
>>>>>>>                 Combine input records=0
>>>>>>>                 Combine output records=0
>>>>>>>                 Reduce input groups=10000000000
>>>>>>>                 Reduce shuffle bytes=440486356802
>>>>>>>                 Reduce input records=10000000000
>>>>>>>                 Reduce output records=10000000000
>>>>>>>                 Spilled Records=20000000000
>>>>>>>                 Shuffled Maps =851200
>>>>>>>                 Failed Shuffles=61
>>>>>>>                 Merged Map outputs=851200
>>>>>>>                 GC time elapsed (ms)=4215666
>>>>>>>                 CPU time spent (ms)=192433000
>>>>>>>                 Physical memory (bytes) snapshot=3349356380160
>>>>>>>                 Virtual memory (bytes) snapshot=9665208745984
>>>>>>>                 Total committed heap usage (bytes)=3636854259712
>>>>>>>         Shuffle Errors
>>>>>>>                 BAD_ID=0
>>>>>>>                 CONNECTION=0
>>>>>>>                 IO_ERROR=4
>>>>>>>                 WRONG_LENGTH=0
>>>>>>>                 WRONG_MAP=0
>>>>>>>                 WRONG_REDUCE=0
>>>>>>>         File Input Format Counters
>>>>>>>                 Bytes Read=1000000000000
>>>>>>>         File Output Format Counters
>>>>>>>                 Bytes Written=1000000000000
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> John
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop 1.0.3
>>>>>>>> and it turned out that the terasort for MR2 is 2 times slower than that in
>>>>>>>> MR1. I cannot really believe it.
>>>>>>>>
>>>>>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0
>>>>>>>> cluster configurations are as follows.
>>>>>>>>
>>>>>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>>>>>         mapreduce.map.memory.mb = "768";
>>>>>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>>>>>
>>>>>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>>>>>
>>>>>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>>>>>         mapreduce.task.io.sort.factor = "48";
>>>>>>>>         mapreduce.task.io.sort.mb = "200";
>>>>>>>>         mapreduce.map.speculative = "true";
>>>>>>>>         mapreduce.reduce.speculative = "true";
>>>>>>>>         mapreduce.framework.name = "yarn";
>>>>>>>>         yarn.app.mapreduce.am.job.task.listener.thread-count = "60";
>>>>>>>>         mapreduce.map.cpu.vcores = "1";
>>>>>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>>>>>
>>>>>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>>>>>         mapreduce.map.output.compress = "true";
>>>>>>>>         mapreduce.map.output.compress.codec =
>>>>>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>>>>>
>>>>>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>>>>>         yarn.resourcemanager.scheduler.client.thread-count = "64";
>>>>>>>>         yarn.resourcemanager.resource-tracker.client.thread-count =
>>>>>>>> "64";
>>>>>>>>         yarn.resourcemanager.scheduler.class =
>>>>>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>>>>>         yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>>>>>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>>>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>>>>>         yarn.nodemanager.container-executor.class =
>>>>>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>>>>>         yarn.nodemanager.container-manager.thread-count = "64";
>>>>>>>>         yarn.nodemanager.localizer.client.thread-count = "20";
>>>>>>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>>>>>>>
>>>>>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost the same
>>>>>>>> other configurations. In MR1, the 1TB terasort took about 45 minutes, but
>>>>>>>> it took around 90 minutes in MR2.
>>>>>>>>
>>>>>>>> Does anyone know what is wrong here? Or do I need some special
>>>>>>>> configurations for terasort to work better in MR2?
>>>>>>>>
>>>>>>>> Thanks in advance,
>>>>>>>>
>>>>>>>> John
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>>>>>> michael_segel@hotmail.com> wrote:
>>>>>>>>
>>>>>>>>> Sam,
>>>>>>>>> I think your cluster is too small for any meaningful conclusions
>>>>>>>>> to be made.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>>>
>>>>>>>>> Mike Segel
>>>>>>>>>
>>>>>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Hi Harsh,
>>>>>>>>>
>>>>>>>>> Thanks for your detailed response! Now, the efficiency of my Yarn
>>>>>>>>> cluster improved a lot after increasing the reducer
>>>>>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>>>>>
>>>>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and reduce
>>>>>>>>> task together, with a typical process(map > shuffle > reduce). In Yarn, as
>>>>>>>>> I know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1
>>>>>>>>> job has special execution process(map > shuffle > reduce) in Hadoop 1.x,
>>>>>>>>> and how Yarn execute a MRv1 job? still include some special MR steps in
>>>>>>>>> Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>>>> - What's the general process for ApplicationMaster of Yarn to
>>>>>>>>> execute a job?
>>>>>>>>>
>>>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>>>> - For Yarn, above tow parameter do not work any more, as yarn uses
>>>>>>>>> container instead, right?
>>>>>>>>> - For Yarn, we can set the whole physical mem for a NodeManager
>>>>>>>>> using 'yarn.nodemanager.resource.memory-mb'. But how to set the
>>>>>>>>> default size of physical mem of a container?
>>>>>>>>> - How to set the maximum size of physical mem of a container? By
>>>>>>>>> the parameter of 'mapred.child.java.opts'?
>>>>>>>>>
>>>>>>>>> Thanks as always!
>>>>>>>>>
>>>>>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>>>>>
>>>>>>>>>> Hi Sam,
>>>>>>>>>>
>>>>>>>>>> > - How to know the container number? Why you say it will be 22
>>>>>>>>>> containers due to a 22 GB memory?
>>>>>>>>>>
>>>>>>>>>> The MR2's default configuration requests 1 GB resource each for
>>>>>>>>>> Map
>>>>>>>>>> and Reduce containers. It requests 1.5 GB for the AM container
>>>>>>>>>> that
>>>>>>>>>> runs the job, additionally. This is tunable using the properties
>>>>>>>>>> Sandy's mentioned in his post.
>>>>>>>>>>
>>>>>>>>>> > - My machine has 32 GB memory, how many memory is proper to be
>>>>>>>>>> assigned to containers?
>>>>>>>>>>
>>>>>>>>>> This is a general question. You may use the same process you took
>>>>>>>>>> to
>>>>>>>>>> decide optimal number of slots in MR1 to decide this here. Every
>>>>>>>>>> container is a new JVM, and you're limited by the CPUs you have
>>>>>>>>>> there
>>>>>>>>>> (if not the memory). Either increase memory requests from jobs, to
>>>>>>>>>> lower # of concurrent containers at a given time (runtime
>>>>>>>>>> change), or
>>>>>>>>>> lower NM's published memory resources to control the same (config
>>>>>>>>>> change).
>>>>>>>>>>
>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to
>>>>>>>>>> be 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>>>>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>
>>>>>>>>>> Yes, all of these properties will still work. Old properties
>>>>>>>>>> specific
>>>>>>>>>> to JobTracker or TaskTracker (usually found as a keyword in the
>>>>>>>>>> config
>>>>>>>>>> name) will not apply anymore.
>>>>>>>>>>
>>>>>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <sa...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>> > Hi Harsh,
>>>>>>>>>> >
>>>>>>>>>> > According to above suggestions, I removed the duplication of
>>>>>>>>>> setting, and
>>>>>>>>>> > reduce the value of 'yarn.nodemanager.resource.cpu-cores',
>>>>>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000. Ant
>>>>>>>>>> then, the
>>>>>>>>>> > efficiency improved about 18%.  I have questions:
>>>>>>>>>> >
>>>>>>>>>> > - How to know the container number? Why you say it will be 22
>>>>>>>>>> containers due
>>>>>>>>>> > to a 22 GB memory?
>>>>>>>>>> > - My machine has 32 GB memory, how many memory is proper to be
>>>>>>>>>> assigned to
>>>>>>>>>> > containers?
>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to
>>>>>>>>>> be 'yarn', will
>>>>>>>>>> > other parameters for mapred-site.xml still work in yarn
>>>>>>>>>> framework? Like
>>>>>>>>>> > 'mapreduce.task.io.sort.mb' and
>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>> >
>>>>>>>>>> > Thanks!
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>>>>>> >>
>>>>>>>>>> >> Hey Sam,
>>>>>>>>>> >>
>>>>>>>>>> >> Did you get a chance to retry with Sandy's suggestions? The
>>>>>>>>>> config
>>>>>>>>>> >> appears to be asking NMs to use roughly 22 total containers (as
>>>>>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB memory
>>>>>>>>>> >> resource. This could impact much, given the CPU is still the
>>>>>>>>>> same for
>>>>>>>>>> >> both test runs.
>>>>>>>>>> >>
>>>>>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>>>>>> sandy.ryza@cloudera.com>
>>>>>>>>>> >> wrote:
>>>>>>>>>> >> > Hey Sam,
>>>>>>>>>> >> >
>>>>>>>>>> >> > Thanks for sharing your results.  I'm definitely curious
>>>>>>>>>> about what's
>>>>>>>>>> >> > causing the difference.
>>>>>>>>>> >> >
>>>>>>>>>> >> > A couple observations:
>>>>>>>>>> >> > It looks like you've got yarn.nodemanager.resource.memory-mb
>>>>>>>>>> in there
>>>>>>>>>> >> > twice
>>>>>>>>>> >> > with two different values.
>>>>>>>>>> >> >
>>>>>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close to
>>>>>>>>>> the default
>>>>>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of your
>>>>>>>>>> tasks getting
>>>>>>>>>> >> > killed for running over resource limits?
>>>>>>>>>> >> >
>>>>>>>>>> >> > -Sandy
>>>>>>>>>> >> >
>>>>>>>>>> >> >
>>>>>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <
>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> The terasort execution log shows that reduce spent about
>>>>>>>>>> 5.5 mins from
>>>>>>>>>> >> >> 33%
>>>>>>>>>> >> >> to 35% as below.
>>>>>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce 31%
>>>>>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce 32%
>>>>>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce 33%
>>>>>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce 35%
>>>>>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce 40%
>>>>>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce 43%
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> Any way, below are my configurations for your reference.
>>>>>>>>>> Thanks!
>>>>>>>>>> >> >> (A) core-site.xml
>>>>>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> (B) hdfs-site.xml
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>dfs.replication</name>
>>>>>>>>>> >> >>     <value>1</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>>>>>> >> >>
>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>>>>>> >> >>
>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>>>>>> >> >>     <value>10</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> (C) mapred-site.xml
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>>>>>> >> >>
>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>
>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> <property>
>>>>>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>>>>>> >> >> </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> <property>
>>>>>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>>>>>> >> >>     <value>yarn</value>
>>>>>>>>>> >> >>    </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>  <property>
>>>>>>>>>> >> >>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>>>>>> >> >>     <value>8</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>>>>>> >> >>     <value>4</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>>>>>> >> >>     <value>true</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> (D) yarn-site.xml
>>>>>>>>>> >> >>  <property>
>>>>>>>>>> >> >>
>>>>>>>>>> <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>>>>>> >> >>     <value>node1:18025</value>
>>>>>>>>>> >> >>     <description>host is the hostname of the resource
>>>>>>>>>> manager and
>>>>>>>>>> >> >>     port is the port on which the NodeManagers contact the
>>>>>>>>>> Resource
>>>>>>>>>> >> >> Manager.
>>>>>>>>>> >> >>     </description>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <description>The address of the RM web
>>>>>>>>>> application.</description>
>>>>>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>>>>>> >> >>     <value>node1:18088</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>>>>>>>>>> >> >>     <value>node1:18030</value>
>>>>>>>>>> >> >>     <description>host is the hostname of the
>>>>>>>>>> resourcemanager and port
>>>>>>>>>> >> >> is
>>>>>>>>>> >> >> the port
>>>>>>>>>> >> >>     on which the Applications in the cluster talk to the
>>>>>>>>>> Resource
>>>>>>>>>> >> >> Manager.
>>>>>>>>>> >> >>     </description>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>>>>>> >> >>     <value>node1:18040</value>
>>>>>>>>>> >> >>     <description>the host is the hostname of the
>>>>>>>>>> ResourceManager and
>>>>>>>>>> >> >> the
>>>>>>>>>> >> >> port is the port on
>>>>>>>>>> >> >>     which the clients can talk to the Resource Manager.
>>>>>>>>>> </description>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>
>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>>>>>> >> >>     <description>the local directories used by the
>>>>>>>>>> >> >> nodemanager</description>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>>>>>> port</description>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>> >> >>     <value>10240</value>
>>>>>>>>>> >> >>     <description>the amount of memory on the NodeManager in
>>>>>>>>>> >> >> GB</description>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>>>>> >> >>
>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>>>>> >> >>     <description>directory on hdfs where the application
>>>>>>>>>> logs are moved
>>>>>>>>>> >> >> to
>>>>>>>>>> >> >> </description>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>    <property>
>>>>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>>>>> >> >>
>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>>>>> >> >>     <description>the directories used by Nodemanagers as log
>>>>>>>>>> >> >> directories</description>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>>>>> >> >>     <description>shuffle service that needs to be set for
>>>>>>>>>> Map Reduce to
>>>>>>>>>> >> >> run </description>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>   <property>
>>>>>>>>>> >> >>     <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>  <property>
>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>>>>> >> >>     <value>24</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> <property>
>>>>>>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>>>>> >> >>     <value>3</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>  <property>
>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>> >> >>     <value>22000</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>  <property>
>>>>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>>>>> >> >>     <value>2.1</value>
>>>>>>>>>> >> >>   </property>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>>>>>> >> >>>
>>>>>>>>>> >> >>> Not tuning configurations at all is wrong. YARN uses
>>>>>>>>>> memory resource
>>>>>>>>>> >> >>> based scheduling and hence MR2 would be requesting 1 GB
>>>>>>>>>> minimum by
>>>>>>>>>> >> >>> default, causing, on base configs, to max out at 8 (due to
>>>>>>>>>> 8 GB NM
>>>>>>>>>> >> >>> memory resource config) total containers. Do share your
>>>>>>>>>> configs as at
>>>>>>>>>> >> >>> this point none of us can tell what it is.
>>>>>>>>>> >> >>>
>>>>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for users
>>>>>>>>>> and to not
>>>>>>>>>> >> >>> care about such things :)
>>>>>>>>>> >> >>>
>>>>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>>>>> samliuhadoop@gmail.com>
>>>>>>>>>> >> >>> wrote:
>>>>>>>>>> >> >>> > At the begining, I just want to do a fast comparision of
>>>>>>>>>> MRv1 and
>>>>>>>>>> >> >>> > Yarn.
>>>>>>>>>> >> >>> > But
>>>>>>>>>> >> >>> > they have many differences, and to be fair for
>>>>>>>>>> comparison I did not
>>>>>>>>>> >> >>> > tune
>>>>>>>>>> >> >>> > their configurations at all.  So I got above test
>>>>>>>>>> results. After
>>>>>>>>>> >> >>> > analyzing
>>>>>>>>>> >> >>> > the test result, no doubt, I will configure them and do
>>>>>>>>>> comparison
>>>>>>>>>> >> >>> > again.
>>>>>>>>>> >> >>> >
>>>>>>>>>> >> >>> > Do you have any idea on current test result? I think, to
>>>>>>>>>> compare
>>>>>>>>>> >> >>> > with
>>>>>>>>>> >> >>> > MRv1,
>>>>>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but worse on
>>>>>>>>>> Reduce
>>>>>>>>>> >> >>> > phase(terasort test).
>>>>>>>>>> >> >>> > And any detailed suggestions/comments/materials on Yarn
>>>>>>>>>> performance
>>>>>>>>>> >> >>> > tunning?
>>>>>>>>>> >> >>> >
>>>>>>>>>> >> >>> > Thanks!
>>>>>>>>>> >> >>> >
>>>>>>>>>> >> >>> >
>>>>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>>>>> marcosluis2186@gmail.com>
>>>>>>>>>> >> >>> >>
>>>>>>>>>> >> >>> >> Why not to tune the configurations?
>>>>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>>>>>>>>> >> >>> >>
>>>>>>>>>> >> >>> >>
>>>>>>>>>> >> >>> >>
>>>>>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>>>>>> >> >>> >>>
>>>>>>>>>> >> >>> >>> Hi Experts,
>>>>>>>>>> >> >>> >>>
>>>>>>>>>> >> >>> >>> We are thinking about whether to use Yarn or not in
>>>>>>>>>> the near
>>>>>>>>>> >> >>> >>> future,
>>>>>>>>>> >> >>> >>> and
>>>>>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for comprison.
>>>>>>>>>> >> >>> >>>
>>>>>>>>>> >> >>> >>> My env is three nodes cluster, and each node has
>>>>>>>>>> similar hardware:
>>>>>>>>>> >> >>> >>> 2
>>>>>>>>>> >> >>> >>> cpu(4 core), 32 mem. Both Yarn and MRv1 cluster are
>>>>>>>>>> set on the
>>>>>>>>>> >> >>> >>> same
>>>>>>>>>> >> >>> >>> env. To
>>>>>>>>>> >> >>> >>> be fair, I did not make any performance tuning on their
>>>>>>>>>> >> >>> >>> configurations, but
>>>>>>>>>> >> >>> >>> use the default configuration values.
>>>>>>>>>> >> >>> >>>
>>>>>>>>>> >> >>> >>> Before testing, I think Yarn will be much better than
>>>>>>>>>> MRv1, if
>>>>>>>>>> >> >>> >>> they
>>>>>>>>>> >> >>> >>> all
>>>>>>>>>> >> >>> >>> use default configuration, because Yarn is a better
>>>>>>>>>> framework than
>>>>>>>>>> >> >>> >>> MRv1.
>>>>>>>>>> >> >>> >>> However, the test result shows some differences:
>>>>>>>>>> >> >>> >>>
>>>>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>>>>> >> >>> >>>
>>>>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>>>>> >> >>> >>>
>>>>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>>>>> >> >>> >>>
>>>>>>>>>> >> >>> >>> After a fast analysis, I think the direct cause might
>>>>>>>>>> be that Yarn
>>>>>>>>>> >> >>> >>> is
>>>>>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much worse on
>>>>>>>>>> Reduce
>>>>>>>>>> >> >>> >>> phase.
>>>>>>>>>> >> >>> >>>
>>>>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1 for
>>>>>>>>>> terasort?
>>>>>>>>>> >> >>> >>> - What's the stratage for tuning Yarn performance? Is
>>>>>>>>>> any
>>>>>>>>>> >> >>> >>> materials?
>>>>>>>>>> >> >>> >>>
>>>>>>>>>> >> >>> >>> Thanks!
>>>>>>>>>> >> >>> >>
>>>>>>>>>> >> >>> >>
>>>>>>>>>> >> >>> >>
>>>>>>>>>> >> >>> >>
>>>>>>>>>> >> >>> >> --
>>>>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>>>>> >> >>> >>
>>>>>>>>>> >> >>> >
>>>>>>>>>> >> >>>
>>>>>>>>>> >> >>>
>>>>>>>>>> >> >>>
>>>>>>>>>> >> >>> --
>>>>>>>>>> >> >>> Harsh J
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >
>>>>>>>>>> >>
>>>>>>>>>> >>
>>>>>>>>>> >>
>>>>>>>>>> >> --
>>>>>>>>>> >> Harsh J
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Harsh J
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Sandy Ryza <sa...@cloudera.com>.
Based on SLOTS_MILLIS_MAPS, it looks like your map tasks are taking about
three times as long in MR2 as they are in MR1.  This is probably because
you allow twice as many map tasks to run at a time in MR2 (12288/768 = 16).
 Being able to use all the containers isn't necessarily a good thing if you
are oversubscribing your node's resources.  Because of the different way
that MR1 and MR2 view resources, I think it's better to test with
mapred.reduce.slowstart.completed.maps=.99 so that the map and reduce
phases will run separately.

On the other side, it looks like your MR1 run has more spilled records than
MR2. You should set io.sort.record.percent in MR1 to .13; this should improve
MR1 performance and make the comparison fairer (MR2 does this tuning for you
automatically).
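
To make those two changes concrete, a minimal sketch of the corresponding MR1
mapred-site.xml entries (the .13 value follows from terasort's 100-byte
records: the per-record metadata is 16 bytes, so the metadata fraction is
roughly 16 / (100 + 16), about 0.14; io.sort.record.percent has no MR2
equivalent because MR2 sizes the record metadata automatically):

  <property>
    <name>mapred.reduce.slowstart.completed.maps</name>
    <value>0.99</value>
    <description>Start reducers only after 99% of the maps have completed,
    so the map and reduce phases do not overlap.</description>
  </property>

  <property>
    <name>io.sort.record.percent</name>
    <value>0.13</value>
    <description>Fraction of io.sort.mb reserved for per-record accounting
    metadata (16 bytes per record); sizing it to the record length reduces
    map-side spills for terasort's small records.</description>
  </property>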

-Sandy


On Wed, Oct 23, 2013 at 9:22 AM, Jian Fang <ji...@gmail.com>wrote:

> The number of map slots and reduce slots on each data node for MR1 are 8
> and 3, respectively. Since MR2 could use all containers for either map or
> reduce, I would expect that MR2 is faster.
>
>
> On Wed, Oct 23, 2013 at 8:17 AM, Sandy Ryza <sa...@cloudera.com>wrote:
>
>> How many map and reduce slots are you using per tasktracker in MR1?  How
>> do the average map times compare? (MR2 reports this directly on the web UI,
>> but you can also get a sense in MR1 by scrolling through the map tasks
>> page).  Can you share the counters for MR1?
>>
>> -Sandy
>>
>>
>> On Wed, Oct 23, 2013 at 12:23 AM, Jian Fang <
>> jian.fang.subscribe@gmail.com> wrote:
>>
>>> Unfortunately, turning off JVM reuse still got the same result, i.e.,
>>> about 90 minutes for MR2. I don't think the killed reduces could contribute
>>> to 2 times slowness. There should be something very wrong either in
>>> configuration or code. Any hints?
>>>
>>>
>>>
>>> On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <
>>> jian.fang.subscribe@gmail.com> wrote:
>>>
>>>> Thanks Sandy. I will try to turn JVM resue off and see what happens.
>>>>
>>>> Yes, I saw quite some exceptions in the task attempts. For instance.
>>>>
>>>>
>>>> 2013-10-20 03:13:58,751 ERROR [main]
>>>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>>>> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
>>>> 2013-10-20 03:13:58,752 ERROR [Thread-6]
>>>> org.apache.hadoop.hdfs.DFSClient: Failed to close file
>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
>>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>>>> No lease on
>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
>>>> File does not exist. Holder
>>>> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
>>>> any open files.
>>>>         at
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>>>>         at
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>>>>         at
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>>>>         at
>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>>>>         at
>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>>>>         at
>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>>>>         at
>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>> --
>>>>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>>>>         at
>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>>>>         at
>>>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>>>>         at
>>>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>>>>         at
>>>> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>>>>         at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>>>>         at
>>>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>>>>         at
>>>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>>>>         at
>>>> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>>>>         at
>>>> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>>>> 2013-10-20 03:13:58,753 WARN [main] org.apache.hadoop.mapred.YarnChild:
>>>> Exception running child : java.nio.channels.ClosedChannelException
>>>>         at
>>>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>>>>         at
>>>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>>>>         at
>>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>>>>         at java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>>         at
>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>>>>         at
>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>>>>         at
>>>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>>>>         at
>>>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>>>         at
>>>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>>>         at
>>>> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>>>>
>>>>
>>>>
>>>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>>
>>>>> It looks like many of your reduce tasks were killed.  Do you know why?
>>>>>  Also, MR2 doesn't have JVM reuse, so it might make sense to compare it to
>>>>> MR1 with JVM reuse turned off.
>>>>>
>>>>> -Sandy
>>>>>
>>>>>
>>>>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <
>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>
>>>>>> The Terasort output for MR2 is as follows.
>>>>>>
>>>>>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job (main):
>>>>>> Counters: 46
>>>>>>         File System Counters
>>>>>>                 FILE: Number of bytes read=456102049355
>>>>>>                 FILE: Number of bytes written=897246250517
>>>>>>                 FILE: Number of read operations=0
>>>>>>                 FILE: Number of large read operations=0
>>>>>>                 FILE: Number of write operations=0
>>>>>>                 HDFS: Number of bytes read=1000000851200
>>>>>>                 HDFS: Number of bytes written=1000000000000
>>>>>>                 HDFS: Number of read operations=32131
>>>>>>                 HDFS: Number of large read operations=0
>>>>>>                 HDFS: Number of write operations=224
>>>>>>         Job Counters
>>>>>>                 Killed map tasks=1
>>>>>>                 Killed reduce tasks=20
>>>>>>                 Launched map tasks=7601
>>>>>>                 Launched reduce tasks=132
>>>>>>                 Data-local map tasks=7591
>>>>>>                 Rack-local map tasks=10
>>>>>>                 Total time spent by all maps in occupied slots
>>>>>> (ms)=1696141311
>>>>>>                 Total time spent by all reduces in occupied slots
>>>>>> (ms)=2664045096
>>>>>>         Map-Reduce Framework
>>>>>>                 Map input records=10000000000
>>>>>>                 Map output records=10000000000
>>>>>>                 Map output bytes=1020000000000
>>>>>>                 Map output materialized bytes=440486356802
>>>>>>                 Input split bytes=851200
>>>>>>                 Combine input records=0
>>>>>>                 Combine output records=0
>>>>>>                 Reduce input groups=10000000000
>>>>>>                 Reduce shuffle bytes=440486356802
>>>>>>                 Reduce input records=10000000000
>>>>>>                 Reduce output records=10000000000
>>>>>>                 Spilled Records=20000000000
>>>>>>                 Shuffled Maps =851200
>>>>>>                 Failed Shuffles=61
>>>>>>                 Merged Map outputs=851200
>>>>>>                 GC time elapsed (ms)=4215666
>>>>>>                 CPU time spent (ms)=192433000
>>>>>>                 Physical memory (bytes) snapshot=3349356380160
>>>>>>                 Virtual memory (bytes) snapshot=9665208745984
>>>>>>                 Total committed heap usage (bytes)=3636854259712
>>>>>>         Shuffle Errors
>>>>>>                 BAD_ID=0
>>>>>>                 CONNECTION=0
>>>>>>                 IO_ERROR=4
>>>>>>                 WRONG_LENGTH=0
>>>>>>                 WRONG_MAP=0
>>>>>>                 WRONG_REDUCE=0
>>>>>>         File Input Format Counters
>>>>>>                 Bytes Read=1000000000000
>>>>>>         File Output Format Counters
>>>>>>                 Bytes Written=1000000000000
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> John
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop 1.0.3
>>>>>>> and it turned out that the terasort for MR2 is 2 times slower than that in
>>>>>>> MR1. I cannot really believe it.
>>>>>>>
>>>>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0
>>>>>>> cluster configurations are as follows.
>>>>>>>
>>>>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>>>>         mapreduce.map.memory.mb = "768";
>>>>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>>>>
>>>>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>>>>
>>>>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>>>>         mapreduce.task.io.sort.factor = "48";
>>>>>>>         mapreduce.task.io.sort.mb = "200";
>>>>>>>         mapreduce.map.speculative = "true";
>>>>>>>         mapreduce.reduce.speculative = "true";
>>>>>>>         mapreduce.framework.name = "yarn";
>>>>>>>         yarn.app.mapreduce.am.job.task.listener.thread-count = "60";
>>>>>>>         mapreduce.map.cpu.vcores = "1";
>>>>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>>>>
>>>>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>>>>         mapreduce.map.output.compress = "true";
>>>>>>>         mapreduce.map.output.compress.codec =
>>>>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>>>>
>>>>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>>>>         yarn.resourcemanager.scheduler.client.thread-count = "64";
>>>>>>>         yarn.resourcemanager.resource-tracker.client.thread-count =
>>>>>>> "64";
>>>>>>>         yarn.resourcemanager.scheduler.class =
>>>>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>>>>         yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>>>>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>>>>         yarn.nodemanager.container-executor.class =
>>>>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>>>>         yarn.nodemanager.container-manager.thread-count = "64";
>>>>>>>         yarn.nodemanager.localizer.client.thread-count = "20";
>>>>>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>>>>>>
>>>>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost the same
>>>>>>> other configurations. In MR1, the 1TB terasort took about 45 minutes, but
>>>>>>> it took around 90 minutes in MR2.
>>>>>>>
>>>>>>> Does anyone know what is wrong here? Or do I need some special
>>>>>>> configurations for terasort to work better in MR2?
>>>>>>>
>>>>>>> Thanks in advance,
>>>>>>>
>>>>>>> John
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>>>>> michael_segel@hotmail.com> wrote:
>>>>>>>
>>>>>>>> Sam,
>>>>>>>> I think your cluster is too small for any meaningful conclusions to
>>>>>>>> be made.
>>>>>>>>
>>>>>>>>
>>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>>
>>>>>>>> Mike Segel
>>>>>>>>
>>>>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hi Harsh,
>>>>>>>>
>>>>>>>> Thanks for your detailed response! Now, the efficiency of my Yarn
>>>>>>>> cluster improved a lot after increasing the reducer
>>>>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>>>>
>>>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and reduce task
>>>>>>>> together, with a typical process(map > shuffle > reduce). In Yarn, as I
>>>>>>>> know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1
>>>>>>>> job has special execution process(map > shuffle > reduce) in Hadoop 1.x,
>>>>>>>> and how Yarn execute a MRv1 job? still include some special MR steps in
>>>>>>>> Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>>> - What's the general process for ApplicationMaster of Yarn to
>>>>>>>> execute a job?
>>>>>>>>
>>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>>> - For Yarn, above tow parameter do not work any more, as yarn uses
>>>>>>>> container instead, right?
>>>>>>>> - For Yarn, we can set the whole physical mem for a NodeManager
>>>>>>>> using 'yarn.nodemanager.resource.memory-mb'. But how to set the
>>>>>>>> default size of physical mem of a container?
>>>>>>>> - How to set the maximum size of physical mem of a container? By
>>>>>>>> the parameter of 'mapred.child.java.opts'?
>>>>>>>>
>>>>>>>> Thanks as always!
>>>>>>>>
>>>>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>>>>
>>>>>>>>> Hi Sam,
>>>>>>>>>
>>>>>>>>> > - How to know the container number? Why you say it will be 22
>>>>>>>>> containers due to a 22 GB memory?
>>>>>>>>>
>>>>>>>>> The MR2's default configuration requests 1 GB resource each for Map
>>>>>>>>> and Reduce containers. It requests 1.5 GB for the AM container that
>>>>>>>>> runs the job, additionally. This is tunable using the properties
>>>>>>>>> Sandy's mentioned in his post.
>>>>>>>>>
>>>>>>>>> > - My machine has 32 GB memory, how many memory is proper to be
>>>>>>>>> assigned to containers?
>>>>>>>>>
>>>>>>>>> This is a general question. You may use the same process you took
>>>>>>>>> to
>>>>>>>>> decide optimal number of slots in MR1 to decide this here. Every
>>>>>>>>> container is a new JVM, and you're limited by the CPUs you have
>>>>>>>>> there
>>>>>>>>> (if not the memory). Either increase memory requests from jobs, to
>>>>>>>>> lower # of concurrent containers at a given time (runtime change),
>>>>>>>>> or
>>>>>>>>> lower NM's published memory resources to control the same (config
>>>>>>>>> change).
>>>>>>>>>
>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>>>>>>>> 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>>>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>
>>>>>>>>> Yes, all of these properties will still work. Old properties
>>>>>>>>> specific
>>>>>>>>> to JobTracker or TaskTracker (usually found as a keyword in the
>>>>>>>>> config
>>>>>>>>> name) will not apply anymore.
>>>>>>>>>
>>>>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <sa...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>> > Hi Harsh,
>>>>>>>>> >
>>>>>>>>> > According to above suggestions, I removed the duplication of
>>>>>>>>> setting, and
>>>>>>>>> > reduce the value of 'yarn.nodemanager.resource.cpu-cores',
>>>>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000. Ant
>>>>>>>>> then, the
>>>>>>>>> > efficiency improved about 18%.  I have questions:
>>>>>>>>> >
>>>>>>>>> > - How to know the container number? Why you say it will be 22
>>>>>>>>> containers due
>>>>>>>>> > to a 22 GB memory?
>>>>>>>>> > - My machine has 32 GB memory, how many memory is proper to be
>>>>>>>>> assigned to
>>>>>>>>> > containers?
>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>>>>>>>> 'yarn', will
>>>>>>>>> > other parameters for mapred-site.xml still work in yarn
>>>>>>>>> framework? Like
>>>>>>>>> > 'mapreduce.task.io.sort.mb' and
>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>> >
>>>>>>>>> > Thanks!
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>>>>> >>
>>>>>>>>> >> Hey Sam,
>>>>>>>>> >>
>>>>>>>>> >> Did you get a chance to retry with Sandy's suggestions? The
>>>>>>>>> config
>>>>>>>>> >> appears to be asking NMs to use roughly 22 total containers (as
>>>>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB memory
>>>>>>>>> >> resource. This could impact much, given the CPU is still the
>>>>>>>>> same for
>>>>>>>>> >> both test runs.
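
A rough back-of-the-envelope check of that number: with
yarn.nodemanager.resource.memory-mb = 22000 from the yarn-site.xml quoted below and
the default request of about 1 GB per task container, each NodeManager can hold
roughly 22000 / 1024, i.e. 21 or 22, concurrent containers, versus the 8 map + 4
reduce = 12 slots configured for MR1.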
>>>>>>>>> >>
>>>>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>>>>> sandy.ryza@cloudera.com>
>>>>>>>>> >> wrote:
>>>>>>>>> >> > Hey Sam,
>>>>>>>>> >> >
>>>>>>>>> >> > Thanks for sharing your results.  I'm definitely curious
>>>>>>>>> about what's
>>>>>>>>> >> > causing the difference.
>>>>>>>>> >> >
>>>>>>>>> >> > A couple observations:
>>>>>>>>> >> > It looks like you've got yarn.nodemanager.resource.memory-mb
>>>>>>>>> in there
>>>>>>>>> >> > twice
>>>>>>>>> >> > with two different values.
>>>>>>>>> >> >
>>>>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close to the
>>>>>>>>> default
>>>>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of your
>>>>>>>>> tasks getting
>>>>>>>>> >> > killed for running over resource limits?
>>>>>>>>> >> >
>>>>>>>>> >> > -Sandy
>>>>>>>>> >> >
>>>>>>>>> >> >
>>>>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <
>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>> >> >>
>>>>>>>>> >> >> The terasort execution log shows that reduce spent about 5.5
>>>>>>>>> mins from
>>>>>>>>> >> >> 33%
>>>>>>>>> >> >> to 35% as below.
>>>>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce 31%
>>>>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce 32%
>>>>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce 33%
>>>>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce 35%
>>>>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce 40%
>>>>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce 43%
>>>>>>>>> >> >>
>>>>>>>>> >> >> Anyway, below are my configurations for your reference.
>>>>>>>>> Thanks!
>>>>>>>>> >> >> (A) core-site.xml
>>>>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>>>>> >> >>
>>>>>>>>> >> >> (B) hdfs-site.xml
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>dfs.replication</name>
>>>>>>>>> >> >>     <value>1</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>>>>> >> >>
>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>>>>> >> >>
>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>>>>> >> >>     <value>10</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >> (C) mapred-site.xml
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>>>>> >> >>
>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>>>>> >> >>
>>>>>>>>> >> >>
>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >> <property>
>>>>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>>>>> >> >> </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >> <property>
>>>>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>>>>> >> >>     <value>yarn</value>
>>>>>>>>> >> >>    </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>  <property>
>>>>>>>>> >> >>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>>>>> >> >>     <value>8</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>>>>> >> >>     <value>4</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>>>>> >> >>     <value>true</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >> (D) yarn-site.xml
>>>>>>>>> >> >>  <property>
>>>>>>>>> >> >>
>>>>>>>>> <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>>>>> >> >>     <value>node1:18025</value>
>>>>>>>>> >> >>     <description>host is the hostname of the resource
>>>>>>>>> manager and
>>>>>>>>> >> >>     port is the port on which the NodeManagers contact the
>>>>>>>>> Resource
>>>>>>>>> >> >> Manager.
>>>>>>>>> >> >>     </description>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <description>The address of the RM web
>>>>>>>>> application.</description>
>>>>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>>>>> >> >>     <value>node1:18088</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>>>>>>>>> >> >>     <value>node1:18030</value>
>>>>>>>>> >> >>     <description>host is the hostname of the resourcemanager
>>>>>>>>> and port
>>>>>>>>> >> >> is
>>>>>>>>> >> >> the port
>>>>>>>>> >> >>     on which the Applications in the cluster talk to the
>>>>>>>>> Resource
>>>>>>>>> >> >> Manager.
>>>>>>>>> >> >>     </description>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>>>>> >> >>     <value>node1:18040</value>
>>>>>>>>> >> >>     <description>the host is the hostname of the
>>>>>>>>> ResourceManager and
>>>>>>>>> >> >> the
>>>>>>>>> >> >> port is the port on
>>>>>>>>> >> >>     which the clients can talk to the Resource Manager.
>>>>>>>>> </description>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>>>> >> >>
>>>>>>>>> >> >>
>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>>>>> >> >>     <description>the local directories used by the
>>>>>>>>> >> >> nodemanager</description>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>>>>> port</description>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>> >> >>     <value>10240</value>
>>>>>>>>> >> >>     <description>the amount of memory on the NodeManager in
>>>>>>>>> >> >> GB</description>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>>>> >> >>
>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>>>> >> >>     <description>directory on hdfs where the application
>>>>>>>>> logs are moved
>>>>>>>>> >> >> to
>>>>>>>>> >> >> </description>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>    <property>
>>>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>>>> >> >>
>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>>>> >> >>     <description>the directories used by Nodemanagers as log
>>>>>>>>> >> >> directories</description>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>>>> >> >>     <description>shuffle service that needs to be set for
>>>>>>>>> Map Reduce to
>>>>>>>>> >> >> run </description>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>  <property>
>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>>>> >> >>     <value>24</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >> <property>
>>>>>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>>>> >> >>     <value>3</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>  <property>
>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>> >> >>     <value>22000</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>  <property>
>>>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>>>> >> >>     <value>2.1</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>
>>>>>>>>> >> >>
>>>>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>>>>> >> >>>
>>>>>>>>> >> >>> Not tuning configurations at all is wrong. YARN uses memory-
>>>>>>>>> resource-based
>>>>>>>>> >> >>> scheduling and hence MR2 would be requesting 1 GB
>>>>>>>>> minimum by
>>>>>>>>> >> >>> default, causing it, on base configs, to max out at 8 total
>>>>>>>>> containers (due to the 8 GB NM
>>>>>>>>> >> >>> memory resource config). Do share your
>>>>>>>>> configs, as at
>>>>>>>>> >> >>> this point none of us can tell what the issue is.
>>>>>>>>> >> >>>
>>>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for users
>>>>>>>>> and to not
>>>>>>>>> >> >>> care about such things :)
>>>>>>>>> >> >>>
>>>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>>>> samliuhadoop@gmail.com>
>>>>>>>>> >> >>> wrote:
>>>>>>>>> >> >>> > At the beginning, I just wanted to do a fast comparison of
>>>>>>>>> MRv1 and
>>>>>>>>> >> >>> > Yarn.
>>>>>>>>> >> >>> > But
>>>>>>>>> >> >>> > they have many differences, and to be fair for comparison
>>>>>>>>> I did not
>>>>>>>>> >> >>> > tune
>>>>>>>>> >> >>> > their configurations at all.  So I got the above test
>>>>>>>>> results. After
>>>>>>>>> >> >>> > analyzing
>>>>>>>>> >> >>> > the test result, no doubt, I will configure them and do
>>>>>>>>> comparison
>>>>>>>>> >> >>> > again.
>>>>>>>>> >> >>> >
>>>>>>>>> >> >>> > Do you have any idea about the current test result? I think,
>>>>>>>>> compared
>>>>>>>>> >> >>> > with
>>>>>>>>> >> >>> > MRv1,
>>>>>>>>> >> >>> > Yarn is better in the Map phase (teragen test), but worse in the
>>>>>>>>> Reduce
>>>>>>>>> >> >>> > phase (terasort test).
>>>>>>>>> >> >>> > And are there any detailed suggestions/comments/materials on Yarn
>>>>>>>>> performance
>>>>>>>>> >> >>> > tuning?
>>>>>>>>> >> >>> >
>>>>>>>>> >> >>> > Thanks!
>>>>>>>>> >> >>> >
>>>>>>>>> >> >>> >
>>>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>>>> marcosluis2186@gmail.com>
>>>>>>>>> >> >>> >>
>>>>>>>>> >> >>> >> Why not tune the configurations?
>>>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>>>>>>>> >> >>> >>
>>>>>>>>> >> >>> >>
>>>>>>>>> >> >>> >>
>>>>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>>>>> >> >>> >>>
>>>>>>>>> >> >>> >>> Hi Experts,
>>>>>>>>> >> >>> >>>
>>>>>>>>> >> >>> >>> We are thinking about whether to use Yarn or not in the
>>>>>>>>> near
>>>>>>>>> >> >>> >>> future,
>>>>>>>>> >> >>> >>> and
>>>>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for comparison.
>>>>>>>>> >> >>> >>>
>>>>>>>>> >> >>> >>> My env is a three-node cluster, and each node has
>>>>>>>>> similar hardware:
>>>>>>>>> >> >>> >>> 2 CPUs (4 cores) and 32 GB memory. Both the Yarn and MRv1 clusters are set
>>>>>>>>> on the
>>>>>>>>> >> >>> >>> same
>>>>>>>>> >> >>> >>> env. To
>>>>>>>>> >> >>> >>> be fair, I did not make any performance tuning on their
>>>>>>>>> configurations, but
>>>>>>>>> >> >>> >>> used the default configuration values.
>>>>>>>>> >> >>> >>>
>>>>>>>>> >> >>> >>> Before testing, I think Yarn will be much better than
>>>>>>>>> MRv1, if
>>>>>>>>> >> >>> >>> they
>>>>>>>>> >> >>> >>> all
>>>>>>>>> >> >>> >>> use default configuration, because Yarn is a better
>>>>>>>>> framework than
>>>>>>>>> >> >>> >>> MRv1.
>>>>>>>>> >> >>> >>> However, the test result shows some differences:
>>>>>>>>> >> >>> >>>
>>>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>>>> >> >>> >>>
>>>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>>>> >> >>> >>>
>>>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>>>> >> >>> >>>
>>>>>>>>> >> >>> >>> After a fast analysis, I think the direct cause might
>>>>>>>>> be that Yarn
>>>>>>>>> >> >>> >>> is
>>>>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much worse on
>>>>>>>>> Reduce
>>>>>>>>> >> >>> >>> phase.
>>>>>>>>> >> >>> >>>
>>>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>>>> >> >>> >>> - Why do my tests show Yarn is worse than MRv1 for
>>>>>>>>> terasort?
>>>>>>>>> >> >>> >>> - What's the strategy for tuning Yarn performance? Are
>>>>>>>>> there any
>>>>>>>>> >> >>> >>> materials?
>>>>>>>>> >> >>> >>>
>>>>>>>>> >> >>> >>> Thanks!
>>>>>>>>> >> >>> >>
>>>>>>>>> >> >>> >>
>>>>>>>>> >> >>> >>
>>>>>>>>> >> >>> >>
>>>>>>>>> >> >>> >> --
>>>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>>>> >> >>> >>
>>>>>>>>> >> >>> >
>>>>>>>>> >> >>>
>>>>>>>>> >> >>>
>>>>>>>>> >> >>>
>>>>>>>>> >> >>> --
>>>>>>>>> >> >>> Harsh J
>>>>>>>>> >> >>
>>>>>>>>> >> >>
>>>>>>>>> >> >
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >> --
>>>>>>>>> >> Harsh J
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Harsh J
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Sandy Ryza <sa...@cloudera.com>.
Based on SLOTS_MILLIS_MAPS, it looks like your map tasks are taking about
three times as long in MR2 as they are in MR1.  This is probably because
you allow twice as many map tasks to run at a time in MR2 (12288/768 = 16).
 Being able to use all the containers isn't necessarily a good thing if you
are oversubscribing your node's resources.  Because of the different way
that MR1 and MR2 view resources, I think it's better to test with
mapred.reduce.slowstart.completed.maps=.99 so that the map and reduce
phases will run separately.

On the other side, it looks like your MR1 job has more spilled records than
MR2. Setting io.sort.record.percent in MR1 to .13 should improve MR1
performance and provide a fairer comparison (MR2 does this tuning
automatically for you).

-Sandy
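
As a rough sketch, the two knobs suggested above look like this in mapred-site.xml.
The first property name is the Hadoop 2.x spelling of the
mapred.reduce.slowstart.completed.maps setting mentioned above (the old name still
works as a deprecated alias); the values are the ones suggested in this thread, not
verified recommendations for this particular cluster:

  <!-- MR2: hold reducers back until the maps are essentially done -->
  <property>
    <name>mapreduce.job.reduce.slowstart.completedmaps</name>
    <value>0.99</value>
  </property>

  <!-- MR1: give record metadata a larger share of io.sort.mb, as suggested above -->
  <property>
    <name>io.sort.record.percent</name>
    <value>0.13</value>
  </property>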


On Wed, Oct 23, 2013 at 9:22 AM, Jian Fang <ji...@gmail.com>wrote:

> The number of map slots and reduce slots on each data node for MR1 are 8
> and 3, respectively. Since MR2 could use all containers for either map or
> reduce, I would expect that MR2 is faster.
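
As a rough illustration of the difference in concurrency: with
yarn.nodemanager.resource.memory-mb = 12288 and mapreduce.map.memory.mb = 768 as
configured above, each node can run 12288 / 768 = 16 map containers at once, so
roughly 16 x 19 = 304 concurrent maps across the 19 data nodes, versus 8 x 19 = 152
concurrent map tasks under MR1; that is the oversubscription Sandy describes above.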
>
>
> On Wed, Oct 23, 2013 at 8:17 AM, Sandy Ryza <sa...@cloudera.com>wrote:
>
>> How many map and reduce slots are you using per tasktracker in MR1?  How
>> do the average map times compare? (MR2 reports this directly on the web UI,
>> but you can also get a sense in MR1 by scrolling through the map tasks
>> page).  Can you share the counters for MR1?
>>
>> -Sandy
>>
>>
>> On Wed, Oct 23, 2013 at 12:23 AM, Jian Fang <
>> jian.fang.subscribe@gmail.com> wrote:
>>
>>> Unfortunately, turning off JVM reuse still gave the same result, i.e.,
>>> about 90 minutes for MR2. I don't think the killed reduces could account
>>> for a 2x slowdown. There must be something very wrong either in the
>>> configuration or in the code. Any hints?
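
For clarity, a minimal sketch of what turning JVM reuse off means on the MR1 side
of the comparison, using the Hadoop 1.x property (MR2 has no JVM reuse at all, so
the mapreduce.job.jvm.numtasks = "20" entry in the configuration posted in this
thread has no effect there, as Sandy points out further down in the quoted thread):

  <!-- MR1 mapred-site.xml: run exactly one task per JVM, i.e. no reuse -->
  <property>
    <name>mapred.job.reuse.jvm.num.tasks</name>
    <value>1</value>
  </property>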
>>>
>>>
>>>
>>> On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <
>>> jian.fang.subscribe@gmail.com> wrote:
>>>
>>>> Thanks Sandy. I will try to turn JVM reuse off and see what happens.
>>>>
>>>> Yes, I saw quite a few exceptions in the task attempts. For instance:
>>>>
>>>>
>>>> 2013-10-20 03:13:58,751 ERROR [main]
>>>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>>>> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
>>>> 2013-10-20 03:13:58,752 ERROR [Thread-6]
>>>> org.apache.hadoop.hdfs.DFSClient: Failed to close file
>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
>>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>>>> No lease on
>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
>>>> File does not exist. Holder
>>>> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
>>>> any open files.
>>>>         at
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>>>>         at
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>>>>         at
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>>>>         at
>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>>>>         at
>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>>>>         at
>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>>>>         at
>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>> --
>>>>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>>>>         at
>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>>>>         at
>>>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>>>>         at
>>>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>>>>         at
>>>> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>>>>         at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>>>>         at
>>>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>>>>         at
>>>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>>>>         at
>>>> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>>>>         at
>>>> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>>>> 2013-10-20 03:13:58,753 WARN [main] org.apache.hadoop.mapred.YarnChild:
>>>> Exception running child : java.nio.channels.ClosedChannelException
>>>>         at
>>>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>>>>         at
>>>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>>>>         at
>>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>>>>         at java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>>         at
>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>>>>         at
>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>>>>         at
>>>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>>>>         at
>>>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>>>         at
>>>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>>>         at
>>>> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>>>>
>>>>
>>>>
>>>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>>
>>>>> It looks like many of your reduce tasks were killed.  Do you know why?
>>>>>  Also, MR2 doesn't have JVM reuse, so it might make sense to compare it to
>>>>> MR1 with JVM reuse turned off.
>>>>>
>>>>> -Sandy
>>>>>
>>>>>
>>>>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <
>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>
>>>>>> The Terasort output for MR2 is as follows.
>>>>>>
>>>>>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job (main):
>>>>>> Counters: 46
>>>>>>         File System Counters
>>>>>>                 FILE: Number of bytes read=456102049355
>>>>>>                 FILE: Number of bytes written=897246250517
>>>>>>                 FILE: Number of read operations=0
>>>>>>                 FILE: Number of large read operations=0
>>>>>>                 FILE: Number of write operations=0
>>>>>>                 HDFS: Number of bytes read=1000000851200
>>>>>>                 HDFS: Number of bytes written=1000000000000
>>>>>>                 HDFS: Number of read operations=32131
>>>>>>                 HDFS: Number of large read operations=0
>>>>>>                 HDFS: Number of write operations=224
>>>>>>         Job Counters
>>>>>>                 Killed map tasks=1
>>>>>>                 Killed reduce tasks=20
>>>>>>                 Launched map tasks=7601
>>>>>>                 Launched reduce tasks=132
>>>>>>                 Data-local map tasks=7591
>>>>>>                 Rack-local map tasks=10
>>>>>>                 Total time spent by all maps in occupied slots
>>>>>> (ms)=1696141311
>>>>>>                 Total time spent by all reduces in occupied slots
>>>>>> (ms)=2664045096
>>>>>>         Map-Reduce Framework
>>>>>>                 Map input records=10000000000
>>>>>>                 Map output records=10000000000
>>>>>>                 Map output bytes=1020000000000
>>>>>>                 Map output materialized bytes=440486356802
>>>>>>                 Input split bytes=851200
>>>>>>                 Combine input records=0
>>>>>>                 Combine output records=0
>>>>>>                 Reduce input groups=10000000000
>>>>>>                 Reduce shuffle bytes=440486356802
>>>>>>                 Reduce input records=10000000000
>>>>>>                 Reduce output records=10000000000
>>>>>>                 Spilled Records=20000000000
>>>>>>                 Shuffled Maps =851200
>>>>>>                 Failed Shuffles=61
>>>>>>                 Merged Map outputs=851200
>>>>>>                 GC time elapsed (ms)=4215666
>>>>>>                 CPU time spent (ms)=192433000
>>>>>>                 Physical memory (bytes) snapshot=3349356380160
>>>>>>                 Virtual memory (bytes) snapshot=9665208745984
>>>>>>                 Total committed heap usage (bytes)=3636854259712
>>>>>>         Shuffle Errors
>>>>>>                 BAD_ID=0
>>>>>>                 CONNECTION=0
>>>>>>                 IO_ERROR=4
>>>>>>                 WRONG_LENGTH=0
>>>>>>                 WRONG_MAP=0
>>>>>>                 WRONG_REDUCE=0
>>>>>>         File Input Format Counters
>>>>>>                 Bytes Read=1000000000000
>>>>>>         File Output Format Counters
>>>>>>                 Bytes Written=1000000000000
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> John
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop 1.0.3
>>>>>>> and it turned out that the terasort for MR2 is 2 times slower than that in
>>>>>>> MR1. I cannot really believe it.
>>>>>>>
>>>>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0
>>>>>>> cluster configurations are as follows.
>>>>>>>
>>>>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>>>>         mapreduce.map.memory.mb = "768";
>>>>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>>>>
>>>>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>>>>
>>>>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>>>>         mapreduce.task.io.sort.factor = "48";
>>>>>>>         mapreduce.task.io.sort.mb = "200";
>>>>>>>         mapreduce.map.speculative = "true";
>>>>>>>         mapreduce.reduce.speculative = "true";
>>>>>>>         mapreduce.framework.name = "yarn";
>>>>>>>         yarn.app.mapreduce.am.job.task.listener.thread-count = "60";
>>>>>>>         mapreduce.map.cpu.vcores = "1";
>>>>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>>>>
>>>>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>>>>         mapreduce.map.output.compress = "true";
>>>>>>>         mapreduce.map.output.compress.codec =
>>>>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>>>>
>>>>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>>>>         yarn.resourcemanager.scheduler.client.thread-count = "64";
>>>>>>>         yarn.resourcemanager.resource-tracker.client.thread-count =
>>>>>>> "64";
>>>>>>>         yarn.resourcemanager.scheduler.class =
>>>>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>>>>         yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>>>>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>>>>         yarn.nodemanager.container-executor.class =
>>>>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>>>>         yarn.nodemanager.container-manager.thread-count = "64";
>>>>>>>         yarn.nodemanager.localizer.client.thread-count = "20";
>>>>>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>>>>>>
>>>>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost the same
>>>>>>> other configurations. In MR1, the 1TB terasort took about 45 minutes, but
>>>>>>> it took around 90 minutes in MR2.
>>>>>>>
>>>>>>> Does anyone know what is wrong here? Or do I need some special
>>>>>>> configurations for terasort to work better in MR2?
>>>>>>>
>>>>>>> Thanks in advance,
>>>>>>>
>>>>>>> John
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>>>>> michael_segel@hotmail.com> wrote:
>>>>>>>
>>>>>>>> Sam,
>>>>>>>> I think your cluster is too small for any meaningful conclusions to
>>>>>>>> be made.
>>>>>>>>
>>>>>>>>
>>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>>
>>>>>>>> Mike Segel
>>>>>>>>
>>>>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hi Harsh,
>>>>>>>>
>>>>>>>> Thanks for your detailed response! Now, the efficiency of my Yarn
>>>>>>>> cluster improved a lot after increasing the reducer
>>>>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>>>>
>>>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and reduce task
>>>>>>>> together, with a typical process(map > shuffle > reduce). In Yarn, as I
>>>>>>>> know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1
>>>>>>>> job has special execution process(map > shuffle > reduce) in Hadoop 1.x,
>>>>>>>> so how does Yarn execute an MRv1 job? Does it still include the special MR
>>>>>>>> steps from Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>>> - What's the general process for ApplicationMaster of Yarn to
>>>>>>>> execute a job?
>>>>>>>>
>>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>>> - For Yarn, the above two parameters do not work any more, as Yarn uses
>>>>>>>> containers instead, right?
>>>>>>>> - For Yarn, we can set the total physical memory for a NodeManager
>>>>>>>> using 'yarn.nodemanager.resource.memory-mb'. But how do we set the
>>>>>>>> default physical memory size of a container?
>>>>>>>> - How do we set the maximum physical memory size of a container? Via
>>>>>>>> the parameter 'mapred.child.java.opts'?
>>>>>>>>
>>>>>>>> Thanks as always!
>>>>>>>>
>>>>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>>>>
>>>>>>>>> Hi Sam,
>>>>>>>>>
>>>>>>>>> > - How to know the container number? Why you say it will be 22
>>>>>>>>> containers due to a 22 GB memory?
>>>>>>>>>
>>>>>>>>> The MR2's default configuration requests 1 GB resource each for Map
>>>>>>>>> and Reduce containers. It requests 1.5 GB for the AM container that
>>>>>>>>> runs the job, additionally. This is tunable using the properties
>>>>>>>>> Sandy's mentioned in his post.
>>>>>>>>>
>>>>>>>>> > - My machine has 32 GB memory, how many memory is proper to be
>>>>>>>>> assigned to containers?
>>>>>>>>>
>>>>>>>>> This is a general question. You may use the same process you took
>>>>>>>>> to
>>>>>>>>> decide optimal number of slots in MR1 to decide this here. Every
>>>>>>>>> container is a new JVM, and you're limited by the CPUs you have
>>>>>>>>> there
>>>>>>>>> (if not the memory). Either increase memory requests from jobs, to
>>>>>>>>> lower # of concurrent containers at a given time (runtime change),
>>>>>>>>> or
>>>>>>>>> lower NM's published memory resources to control the same (config
>>>>>>>>> change).
>>>>>>>>>
>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>>>>>>>> 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>>>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>
>>>>>>>>> Yes, all of these properties will still work. Old properties
>>>>>>>>> specific
>>>>>>>>> to JobTracker or TaskTracker (usually found as a keyword in the
>>>>>>>>> config
>>>>>>>>> name) will not apply anymore.
>>>>>>>>>
>>>>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <sa...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>> > Hi Harsh,
>>>>>>>>> >
>>>>>>>>> > According to the above suggestions, I removed the duplicated
>>>>>>>>> setting, and
>>>>>>>>> > reduced the values of 'yarn.nodemanager.resource.cpu-cores',
>>>>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000. And
>>>>>>>>> then, the
>>>>>>>>> > efficiency improved by about 18%.  I have some questions:
>>>>>>>>> >
>>>>>>>>> > - How can I know the container number? Why do you say it will be 22
>>>>>>>>> containers due
>>>>>>>>> > to 22 GB of memory?
>>>>>>>>> > - My machine has 32 GB of memory; how much memory is proper to be
>>>>>>>>> assigned to
>>>>>>>>> > containers?
>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>>>>>>>> 'yarn', will
>>>>>>>>> > other parameters for mapred-site.xml still work in yarn
>>>>>>>>> framework? Like
>>>>>>>>> > 'mapreduce.task.io.sort.mb' and
>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>> >
>>>>>>>>> > Thanks!
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>>>>> >>
>>>>>>>>> >> Hey Sam,
>>>>>>>>> >>
>>>>>>>>> >> Did you get a chance to retry with Sandy's suggestions? The
>>>>>>>>> config
>>>>>>>>> >> appears to be asking NMs to use roughly 22 total containers (as
>>>>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB memory
>>>>>>>>> >> resource. This could impact much, given the CPU is still the
>>>>>>>>> same for
>>>>>>>>> >> both test runs.
>>>>>>>>> >>
>>>>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>>>>> sandy.ryza@cloudera.com>
>>>>>>>>> >> wrote:
>>>>>>>>> >> > Hey Sam,
>>>>>>>>> >> >
>>>>>>>>> >> > Thanks for sharing your results.  I'm definitely curious
>>>>>>>>> about what's
>>>>>>>>> >> > causing the difference.
>>>>>>>>> >> >
>>>>>>>>> >> > A couple observations:
>>>>>>>>> >> > It looks like you've got yarn.nodemanager.resource.memory-mb
>>>>>>>>> in there
>>>>>>>>> >> > twice
>>>>>>>>> >> > with two different values.
>>>>>>>>> >> >
>>>>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close to the
>>>>>>>>> default
>>>>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of your
>>>>>>>>> tasks getting
>>>>>>>>> >> > killed for running over resource limits?
>>>>>>>>> >> >
>>>>>>>>> >> > -Sandy
>>>>>>>>> >> >
>>>>>>>>> >> >
>>>>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <
>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>> >> >>
>>>>>>>>> >> >> The terasort execution log shows that reduce spent about 5.5
>>>>>>>>> mins from
>>>>>>>>> >> >> 33%
>>>>>>>>> >> >> to 35% as below.
>>>>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce 31%
>>>>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce 32%
>>>>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce 33%
>>>>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce 35%
>>>>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce 40%
>>>>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce 43%
>>>>>>>>> >> >>
>>>>>>>>> >> >> Anyway, below are my configurations for your reference.
>>>>>>>>> Thanks!
>>>>>>>>> >> >> (A) core-site.xml
>>>>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>>>>> >> >>
>>>>>>>>> >> >> (B) hdfs-site.xml
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>dfs.replication</name>
>>>>>>>>> >> >>     <value>1</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>>>>> >> >>
>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>>>>> >> >>
>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>>>>> >> >>     <value>10</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >> (C) mapred-site.xml
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>>>>> >> >>
>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>>>>> >> >>
>>>>>>>>> >> >>
>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >> <property>
>>>>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>>>>> >> >> </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >> <property>
>>>>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>>>>> >> >>     <value>yarn</value>
>>>>>>>>> >> >>    </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>  <property>
>>>>>>>>> >> >>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>>>>> >> >>     <value>8</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>>>>> >> >>     <value>4</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>>>>> >> >>     <value>true</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >> (D) yarn-site.xml
>>>>>>>>> >> >>  <property>
>>>>>>>>> >> >>
>>>>>>>>> <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>>>>> >> >>     <value>node1:18025</value>
>>>>>>>>> >> >>     <description>host is the hostname of the resource
>>>>>>>>> manager and
>>>>>>>>> >> >>     port is the port on which the NodeManagers contact the
>>>>>>>>> Resource
>>>>>>>>> >> >> Manager.
>>>>>>>>> >> >>     </description>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <description>The address of the RM web
>>>>>>>>> application.</description>
>>>>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>>>>> >> >>     <value>node1:18088</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>>>>>>>>> >> >>     <value>node1:18030</value>
>>>>>>>>> >> >>     <description>host is the hostname of the resourcemanager
>>>>>>>>> and port
>>>>>>>>> >> >> is
>>>>>>>>> >> >> the port
>>>>>>>>> >> >>     on which the Applications in the cluster talk to the
>>>>>>>>> Resource
>>>>>>>>> >> >> Manager.
>>>>>>>>> >> >>     </description>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>>>>> >> >>     <value>node1:18040</value>
>>>>>>>>> >> >>     <description>the host is the hostname of the
>>>>>>>>> ResourceManager and
>>>>>>>>> >> >> the
>>>>>>>>> >> >> port is the port on
>>>>>>>>> >> >>     which the clients can talk to the Resource Manager.
>>>>>>>>> </description>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>>>> >> >>
>>>>>>>>> >> >>
>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>>>>> >> >>     <description>the local directories used by the
>>>>>>>>> >> >> nodemanager</description>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>>>>> port</description>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>> >> >>     <value>10240</value>
>>>>>>>>> >> >>     <description>the amount of memory on the NodeManager in
>>>>>>>>> >> >> GB</description>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>>>> >> >>
>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>>>> >> >>     <description>directory on hdfs where the application
>>>>>>>>> logs are moved
>>>>>>>>> >> >> to
>>>>>>>>> >> >> </description>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>    <property>
>>>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>>>> >> >>
>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>>>> >> >>     <description>the directories used by Nodemanagers as log
>>>>>>>>> >> >> directories</description>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>>>> >> >>     <description>shuffle service that needs to be set for
>>>>>>>>> Map Reduce to
>>>>>>>>> >> >> run </description>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>  <property>
>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>>>> >> >>     <value>24</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >> <property>
>>>>>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>>>> >> >>     <value>3</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>  <property>
>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>> >> >>     <value>22000</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>  <property>
>>>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>>>> >> >>     <value>2.1</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>
>>>>>>>>> >> >>
>>>>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>>>>> >> >>>
>>>>>>>>> >> >>> Not tuning configurations at all is wrong. YARN uses memory-
>>>>>>>>> resource-based
>>>>>>>>> >> >>> scheduling and hence MR2 would be requesting 1 GB
>>>>>>>>> minimum by
>>>>>>>>> >> >>> default, causing it, on base configs, to max out at 8 total
>>>>>>>>> containers (due to the 8 GB NM
>>>>>>>>> >> >>> memory resource config). Do share your
>>>>>>>>> configs, as at
>>>>>>>>> >> >>> this point none of us can tell what the issue is.
>>>>>>>>> >> >>>
>>>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for users
>>>>>>>>> and to not
>>>>>>>>> >> >>> care about such things :)
>>>>>>>>> >> >>>
>>>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>>>> samliuhadoop@gmail.com>
>>>>>>>>> >> >>> wrote:
>>>>>>>>> >> >>> > At the beginning, I just wanted to do a fast comparison of
>>>>>>>>> MRv1 and
>>>>>>>>> >> >>> > Yarn.
>>>>>>>>> >> >>> > But
>>>>>>>>> >> >>> > they have many differences, and to be fair for comparison
>>>>>>>>> I did not
>>>>>>>>> >> >>> > tune
>>>>>>>>> >> >>> > their configurations at all.  So I got the above test
>>>>>>>>> results. After
>>>>>>>>> >> >>> > analyzing
>>>>>>>>> >> >>> > the test result, no doubt, I will configure them and do
>>>>>>>>> comparison
>>>>>>>>> >> >>> > again.
>>>>>>>>> >> >>> >
>>>>>>>>> >> >>> > Do you have any idea about the current test result? I think,
>>>>>>>>> compared
>>>>>>>>> >> >>> > with
>>>>>>>>> >> >>> > MRv1,
>>>>>>>>> >> >>> > Yarn is better in the Map phase (teragen test), but worse in the
>>>>>>>>> Reduce
>>>>>>>>> >> >>> > phase (terasort test).
>>>>>>>>> >> >>> > And are there any detailed suggestions/comments/materials on Yarn
>>>>>>>>> performance
>>>>>>>>> >> >>> > tuning?
>>>>>>>>> >> >>> >
>>>>>>>>> >> >>> > Thanks!
>>>>>>>>> >> >>> >
>>>>>>>>> >> >>> >
>>>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>>>> marcosluis2186@gmail.com>
>>>>>>>>> >> >>> >>
>>>>>>>>> >> >>> >> Why not tune the configurations?
>>>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>>>>>>>> >> >>> >>
>>>>>>>>> >> >>> >>
>>>>>>>>> >> >>> >>
>>>>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>>>>> >> >>> >>>
>>>>>>>>> >> >>> >>> Hi Experts,
>>>>>>>>> >> >>> >>>
>>>>>>>>> >> >>> >>> We are thinking about whether to use Yarn or not in the
>>>>>>>>> near
>>>>>>>>> >> >>> >>> future,
>>>>>>>>> >> >>> >>> and
>>>>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for comparison.
>>>>>>>>> >> >>> >>>
>>>>>>>>> >> >>> >>> My env is a three-node cluster, and each node has
>>>>>>>>> similar hardware:
>>>>>>>>> >> >>> >>> 2 CPUs (4 cores) and 32 GB memory. Both the Yarn and MRv1 clusters are set
>>>>>>>>> on the
>>>>>>>>> >> >>> >>> same
>>>>>>>>> >> >>> >>> env. To
>>>>>>>>> >> >>> >>> be fair, I did not make any performance tuning on their
>>>>>>>>> configurations, but
>>>>>>>>> >> >>> >>> used the default configuration values.
>>>>>>>>> >> >>> >>>
>>>>>>>>> >> >>> >>> Before testing, I think Yarn will be much better than
>>>>>>>>> MRv1, if
>>>>>>>>> >> >>> >>> they
>>>>>>>>> >> >>> >>> all
>>>>>>>>> >> >>> >>> use default configuration, because Yarn is a better
>>>>>>>>> framework than
>>>>>>>>> >> >>> >>> MRv1.
>>>>>>>>> >> >>> >>> However, the test result shows some differences:
>>>>>>>>> >> >>> >>>
>>>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>>>> >> >>> >>>
>>>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>>>> >> >>> >>>
>>>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>>>> >> >>> >>>
>>>>>>>>> >> >>> >>> After a fast analysis, I think the direct cause might
>>>>>>>>> be that Yarn
>>>>>>>>> >> >>> >>> is
>>>>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much worse on
>>>>>>>>> Reduce
>>>>>>>>> >> >>> >>> phase.
>>>>>>>>> >> >>> >>>
>>>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>>>> >> >>> >>> - Why do my tests show Yarn is worse than MRv1 for
>>>>>>>>> terasort?
>>>>>>>>> >> >>> >>> - What's the strategy for tuning Yarn performance? Are
>>>>>>>>> there any
>>>>>>>>> >> >>> >>> materials?
>>>>>>>>> >> >>> >>>
>>>>>>>>> >> >>> >>> Thanks!
>>>>>>>>> >> >>> >>
>>>>>>>>> >> >>> >>
>>>>>>>>> >> >>> >>
>>>>>>>>> >> >>> >>
>>>>>>>>> >> >>> >> --
>>>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>>>> >> >>> >>
>>>>>>>>> >> >>> >
>>>>>>>>> >> >>>
>>>>>>>>> >> >>>
>>>>>>>>> >> >>>
>>>>>>>>> >> >>> --
>>>>>>>>> >> >>> Harsh J
>>>>>>>>> >> >>
>>>>>>>>> >> >>
>>>>>>>>> >> >
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >> --
>>>>>>>>> >> Harsh J
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Harsh J
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Sandy Ryza <sa...@cloudera.com>.
Based on SLOTS_MILLIS_MAPS, it looks like your map tasks are taking about
three times as long in MR2 as they are in MR1.  This is probably because
you allow twice as many map tasks to run at a time in MR2 (12288/768 = 16).
 Being able to use all the containers isn't necessarily a good thing if you
are oversubscribing your node's resources.  Because of the different way
that MR1 and MR2 view resources, I think it's better to test with
mapred.reduce.slowstart.completed.maps=.99 so that the map and reduce
phases will run separately.

On the other side, it looks like your MR1 job has more spilled records than
MR2. Setting io.sort.record.percent in MR1 to .13 should improve MR1
performance and provide a fairer comparison (MR2 does this tuning
automatically for you).

-Sandy


On Wed, Oct 23, 2013 at 9:22 AM, Jian Fang <ji...@gmail.com>wrote:

> The number of map slots and reduce slots on each data node for MR1 are 8
> and 3, respectively. Since MR2 could use all containers for either map or
> reduce, I would expect that MR2 is faster.
>
>
> On Wed, Oct 23, 2013 at 8:17 AM, Sandy Ryza <sa...@cloudera.com>wrote:
>
>> How many map and reduce slots are you using per tasktracker in MR1?  How
>> do the average map times compare? (MR2 reports this directly on the web UI,
>> but you can also get a sense in MR1 by scrolling through the map tasks
>> page).  Can you share the counters for MR1?
>>
>> -Sandy
>>
>>
>> On Wed, Oct 23, 2013 at 12:23 AM, Jian Fang <
>> jian.fang.subscribe@gmail.com> wrote:
>>
>>> Unfortunately, turning off JVM reuse still gave the same result, i.e.,
>>> about 90 minutes for MR2. I don't think the killed reduces could account
>>> for a 2x slowdown. There must be something very wrong either in the
>>> configuration or in the code. Any hints?
>>>
>>>
>>>
>>> On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <
>>> jian.fang.subscribe@gmail.com> wrote:
>>>
>>>> Thanks Sandy. I will try to turn JVM reuse off and see what happens.
>>>>
>>>> Yes, I saw quite a few exceptions in the task attempts. For instance:
>>>>
>>>>
>>>> 2013-10-20 03:13:58,751 ERROR [main]
>>>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>>>> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
>>>> 2013-10-20 03:13:58,752 ERROR [Thread-6]
>>>> org.apache.hadoop.hdfs.DFSClient: Failed to close file
>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
>>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>>>> No lease on
>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
>>>> File does not exist. Holder
>>>> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
>>>> any open files.
>>>>         at
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>>>>         at
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>>>>         at
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>>>>         at
>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>>>>         at
>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>>>>         at
>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>>>>         at
>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>> --
>>>>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>>>>         at
>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>>>>         at
>>>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>>>>         at
>>>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>>>>         at
>>>> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>>>>         at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>>>>         at
>>>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>>>>         at
>>>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>>>>         at
>>>> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>>>>         at
>>>> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>>>> 2013-10-20 03:13:58,753 WARN [main] org.apache.hadoop.mapred.YarnChild:
>>>> Exception running child : java.nio.channels.ClosedChannelException
>>>>         at
>>>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>>>>         at
>>>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>>>>         at
>>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>>>>         at java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>>         at
>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>>>>         at
>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>>>>         at
>>>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>>>>         at
>>>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>>>         at
>>>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>>>         at
>>>> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>>>>
>>>>
>>>>
>>>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>>
>>>>> It looks like many of your reduce tasks were killed.  Do you know why?
>>>>>  Also, MR2 doesn't have JVM reuse, so it might make sense to compare it to
>>>>> MR1 with JVM reuse turned off.
>>>>>
>>>>> -Sandy
>>>>>
>>>>>
>>>>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <
>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>
>>>>>> The Terasort output for MR2 is as follows.
>>>>>>
>>>>>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job (main):
>>>>>> Counters: 46
>>>>>>         File System Counters
>>>>>>                 FILE: Number of bytes read=456102049355
>>>>>>                 FILE: Number of bytes written=897246250517
>>>>>>                 FILE: Number of read operations=0
>>>>>>                 FILE: Number of large read operations=0
>>>>>>                 FILE: Number of write operations=0
>>>>>>                 HDFS: Number of bytes read=1000000851200
>>>>>>                 HDFS: Number of bytes written=1000000000000
>>>>>>                 HDFS: Number of read operations=32131
>>>>>>                 HDFS: Number of large read operations=0
>>>>>>                 HDFS: Number of write operations=224
>>>>>>         Job Counters
>>>>>>                 Killed map tasks=1
>>>>>>                 Killed reduce tasks=20
>>>>>>                 Launched map tasks=7601
>>>>>>                 Launched reduce tasks=132
>>>>>>                 Data-local map tasks=7591
>>>>>>                 Rack-local map tasks=10
>>>>>>                 Total time spent by all maps in occupied slots
>>>>>> (ms)=1696141311
>>>>>>                 Total time spent by all reduces in occupied slots
>>>>>> (ms)=2664045096
>>>>>>         Map-Reduce Framework
>>>>>>                 Map input records=10000000000
>>>>>>                 Map output records=10000000000
>>>>>>                 Map output bytes=1020000000000
>>>>>>                 Map output materialized bytes=440486356802
>>>>>>                 Input split bytes=851200
>>>>>>                 Combine input records=0
>>>>>>                 Combine output records=0
>>>>>>                 Reduce input groups=10000000000
>>>>>>                 Reduce shuffle bytes=440486356802
>>>>>>                 Reduce input records=10000000000
>>>>>>                 Reduce output records=10000000000
>>>>>>                 Spilled Records=20000000000
>>>>>>                 Shuffled Maps =851200
>>>>>>                 Failed Shuffles=61
>>>>>>                 Merged Map outputs=851200
>>>>>>                 GC time elapsed (ms)=4215666
>>>>>>                 CPU time spent (ms)=192433000
>>>>>>                 Physical memory (bytes) snapshot=3349356380160
>>>>>>                 Virtual memory (bytes) snapshot=9665208745984
>>>>>>                 Total committed heap usage (bytes)=3636854259712
>>>>>>         Shuffle Errors
>>>>>>                 BAD_ID=0
>>>>>>                 CONNECTION=0
>>>>>>                 IO_ERROR=4
>>>>>>                 WRONG_LENGTH=0
>>>>>>                 WRONG_MAP=0
>>>>>>                 WRONG_REDUCE=0
>>>>>>         File Input Format Counters
>>>>>>                 Bytes Read=1000000000000
>>>>>>         File Output Format Counters
>>>>>>                 Bytes Written=1000000000000
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> John
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop 1.0.3
>>>>>>> and it turned out that the terasort for MR2 is 2 times slower than that in
>>>>>>> MR1. I cannot really believe it.
>>>>>>>
>>>>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0
>>>>>>> cluster configurations are as follows.
>>>>>>>
>>>>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>>>>         mapreduce.map.memory.mb = "768";
>>>>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>>>>
>>>>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>>>>
>>>>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>>>>         mapreduce.task.io.sort.factor = "48";
>>>>>>>         mapreduce.task.io.sort.mb = "200";
>>>>>>>         mapreduce.map.speculative = "true";
>>>>>>>         mapreduce.reduce.speculative = "true";
>>>>>>>         mapreduce.framework.name = "yarn";
>>>>>>>         yarn.app.mapreduce.am.job.task.listener.thread-count = "60";
>>>>>>>         mapreduce.map.cpu.vcores = "1";
>>>>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>>>>
>>>>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>>>>         mapreduce.map.output.compress = "true";
>>>>>>>         mapreduce.map.output.compress.codec =
>>>>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>>>>
>>>>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>>>>         yarn.resourcemanager.scheduler.client.thread-count = "64";
>>>>>>>         yarn.resourcemanager.resource-tracker.client.thread-count =
>>>>>>> "64";
>>>>>>>         yarn.resourcemanager.scheduler.class =
>>>>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>>>>         yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>>>>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>>>>         yarn.nodemanager.container-executor.class =
>>>>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>>>>         yarn.nodemanager.container-manager.thread-count = "64";
>>>>>>>         yarn.nodemanager.localizer.client.thread-count = "20";
>>>>>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>>>>>>
>>>>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost the same
>>>>>>> other configurations. In MR1, the 1TB terasort took about 45 minutes, but
>>>>>>> it took around 90 minutes in MR2.
>>>>>>>
>>>>>>> Does anyone know what is wrong here? Or do I need some special
>>>>>>> configurations for terasort to work better in MR2?
>>>>>>>
>>>>>>> Thanks in advance,
>>>>>>>
>>>>>>> John
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>>>>> michael_segel@hotmail.com> wrote:
>>>>>>>
>>>>>>>> Sam,
>>>>>>>> I think your cluster is too small for any meaningful conclusions to
>>>>>>>> be made.
>>>>>>>>
>>>>>>>>
>>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>>
>>>>>>>> Mike Segel
>>>>>>>>
>>>>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hi Harsh,
>>>>>>>>
>>>>>>>> Thanks for your detailed response! Now, the efficiency of my Yarn
>>>>>>>> cluster improved a lot after increasing the reducer
>>>>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>>>>
>>>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and reduce task
>>>>>>>> together, with a typical process(map > shuffle > reduce). In Yarn, as I
>>>>>>>> know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1
>>>>>>>> job has special execution process(map > shuffle > reduce) in Hadoop 1.x,
>>>>>>>> and how Yarn execute a MRv1 job? still include some special MR steps in
>>>>>>>> Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>>> - What's the general process for ApplicationMaster of Yarn to
>>>>>>>> execute a job?
>>>>>>>>
>>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>>> - For Yarn, above tow parameter do not work any more, as yarn uses
>>>>>>>> container instead, right?
>>>>>>>> - For Yarn, we can set the whole physical mem for a NodeManager
>>>>>>>> using 'yarn.nodemanager.resource.memory-mb'. But how to set the
>>>>>>>> default size of physical mem of a container?
>>>>>>>> - How to set the maximum size of physical mem of a container? By
>>>>>>>> the parameter of 'mapred.child.java.opts'?
>>>>>>>>
>>>>>>>> Thanks as always!
>>>>>>>>
>>>>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>>>>
>>>>>>>>> Hi Sam,
>>>>>>>>>
>>>>>>>>> > - How to know the container number? Why you say it will be 22
>>>>>>>>> containers due to a 22 GB memory?
>>>>>>>>>
>>>>>>>>> The MR2's default configuration requests 1 GB resource each for Map
>>>>>>>>> and Reduce containers. It requests 1.5 GB for the AM container that
>>>>>>>>> runs the job, additionally. This is tunable using the properties
>>>>>>>>> Sandy's mentioned in his post.
>>>>>>>>>
>>>>>>>>> > - My machine has 32 GB memory, how many memory is proper to be
>>>>>>>>> assigned to containers?
>>>>>>>>>
>>>>>>>>> This is a general question. You may use the same process you took
>>>>>>>>> to
>>>>>>>>> decide optimal number of slots in MR1 to decide this here. Every
>>>>>>>>> container is a new JVM, and you're limited by the CPUs you have
>>>>>>>>> there
>>>>>>>>> (if not the memory). Either increase memory requests from jobs, to
>>>>>>>>> lower # of concurrent containers at a given time (runtime change),
>>>>>>>>> or
>>>>>>>>> lower NM's published memory resources to control the same (config
>>>>>>>>> change).
>>>>>>>>>
>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>>>>>>>> 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>>>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>
>>>>>>>>> Yes, all of these properties will still work. Old properties
>>>>>>>>> specific
>>>>>>>>> to JobTracker or TaskTracker (usually found as a keyword in the
>>>>>>>>> config
>>>>>>>>> name) will not apply anymore.
>>>>>>>>>
>>>>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <sa...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>> > Hi Harsh,
>>>>>>>>> >
>>>>>>>>> > According to above suggestions, I removed the duplication of
>>>>>>>>> setting, and
>>>>>>>>> > reduce the value of 'yarn.nodemanager.resource.cpu-cores',
>>>>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000. Ant
>>>>>>>>> then, the
>>>>>>>>> > efficiency improved about 18%.  I have questions:
>>>>>>>>> >
>>>>>>>>> > - How to know the container number? Why you say it will be 22
>>>>>>>>> containers due
>>>>>>>>> > to a 22 GB memory?
>>>>>>>>> > - My machine has 32 GB memory, how many memory is proper to be
>>>>>>>>> assigned to
>>>>>>>>> > containers?
>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>>>>>>>> 'yarn', will
>>>>>>>>> > other parameters for mapred-site.xml still work in yarn
>>>>>>>>> framework? Like
>>>>>>>>> > 'mapreduce.task.io.sort.mb' and
>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>> >
>>>>>>>>> > Thanks!
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>>>>> >>
>>>>>>>>> >> Hey Sam,
>>>>>>>>> >>
>>>>>>>>> >> Did you get a chance to retry with Sandy's suggestions? The
>>>>>>>>> config
>>>>>>>>> >> appears to be asking NMs to use roughly 22 total containers (as
>>>>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB memory
>>>>>>>>> >> resource. This could impact much, given the CPU is still the
>>>>>>>>> same for
>>>>>>>>> >> both test runs.
>>>>>>>>> >>
>>>>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>>>>> sandy.ryza@cloudera.com>
>>>>>>>>> >> wrote:
>>>>>>>>> >> > Hey Sam,
>>>>>>>>> >> >
>>>>>>>>> >> > Thanks for sharing your results.  I'm definitely curious
>>>>>>>>> about what's
>>>>>>>>> >> > causing the difference.
>>>>>>>>> >> >
>>>>>>>>> >> > A couple observations:
>>>>>>>>> >> > It looks like you've got yarn.nodemanager.resource.memory-mb
>>>>>>>>> in there
>>>>>>>>> >> > twice
>>>>>>>>> >> > with two different values.
>>>>>>>>> >> >
>>>>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close to the
>>>>>>>>> default
>>>>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of your
>>>>>>>>> tasks getting
>>>>>>>>> >> > killed for running over resource limits?
>>>>>>>>> >> >
>>>>>>>>> >> > -Sandy
>>>>>>>>> >> >
>>>>>>>>> >> >
>>>>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <
>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>> >> >>
>>>>>>>>> >> >> The terasort execution log shows that reduce spent about 5.5
>>>>>>>>> mins from
>>>>>>>>> >> >> 33%
>>>>>>>>> >> >> to 35% as below.
>>>>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce 31%
>>>>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce 32%
>>>>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce 33%
>>>>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce 35%
>>>>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce 40%
>>>>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce 43%
>>>>>>>>> >> >>
>>>>>>>>> >> >> Any way, below are my configurations for your reference.
>>>>>>>>> Thanks!
>>>>>>>>> >> >> (A) core-site.xml
>>>>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>>>>> >> >>
>>>>>>>>> >> >> (B) hdfs-site.xml
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>dfs.replication</name>
>>>>>>>>> >> >>     <value>1</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>>>>> >> >>
>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>>>>> >> >>
>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>>>>> >> >>     <value>10</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >> (C) mapred-site.xml
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>>>>> >> >>
>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>>>>> >> >>
>>>>>>>>> >> >>
>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >> <property>
>>>>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>>>>> >> >> </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >> <property>
>>>>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>>>>> >> >>     <value>yarn</value>
>>>>>>>>> >> >>    </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>  <property>
>>>>>>>>> >> >>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>>>>> >> >>     <value>8</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>>>>> >> >>     <value>4</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>>>>> >> >>     <value>true</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >> (D) yarn-site.xml
>>>>>>>>> >> >>  <property>
>>>>>>>>> >> >>
>>>>>>>>> <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>>>>> >> >>     <value>node1:18025</value>
>>>>>>>>> >> >>     <description>host is the hostname of the resource
>>>>>>>>> manager and
>>>>>>>>> >> >>     port is the port on which the NodeManagers contact the
>>>>>>>>> Resource
>>>>>>>>> >> >> Manager.
>>>>>>>>> >> >>     </description>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <description>The address of the RM web
>>>>>>>>> application.</description>
>>>>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>>>>> >> >>     <value>node1:18088</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>>>>>>>>> >> >>     <value>node1:18030</value>
>>>>>>>>> >> >>     <description>host is the hostname of the resourcemanager
>>>>>>>>> and port
>>>>>>>>> >> >> is
>>>>>>>>> >> >> the port
>>>>>>>>> >> >>     on which the Applications in the cluster talk to the
>>>>>>>>> Resource
>>>>>>>>> >> >> Manager.
>>>>>>>>> >> >>     </description>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>>>>> >> >>     <value>node1:18040</value>
>>>>>>>>> >> >>     <description>the host is the hostname of the
>>>>>>>>> ResourceManager and
>>>>>>>>> >> >> the
>>>>>>>>> >> >> port is the port on
>>>>>>>>> >> >>     which the clients can talk to the Resource Manager.
>>>>>>>>> </description>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>>>> >> >>
>>>>>>>>> >> >>
>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>>>>> >> >>     <description>the local directories used by the
>>>>>>>>> >> >> nodemanager</description>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>>>>> port</description>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>> >> >>     <value>10240</value>
>>>>>>>>> >> >>     <description>the amount of memory on the NodeManager in
>>>>>>>>> >> >> GB</description>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>>>> >> >>
>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>>>> >> >>     <description>directory on hdfs where the application
>>>>>>>>> logs are moved
>>>>>>>>> >> >> to
>>>>>>>>> >> >> </description>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>    <property>
>>>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>>>> >> >>
>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>>>> >> >>     <description>the directories used by Nodemanagers as log
>>>>>>>>> >> >> directories</description>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>>>> >> >>     <description>shuffle service that needs to be set for
>>>>>>>>> Map Reduce to
>>>>>>>>> >> >> run </description>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>  <property>
>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>>>> >> >>     <value>24</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >> <property>
>>>>>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>>>> >> >>     <value>3</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>  <property>
>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>> >> >>     <value>22000</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>  <property>
>>>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>>>> >> >>     <value>2.1</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>
>>>>>>>>> >> >>
>>>>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>>>>> >> >>>
>>>>>>>>> >> >>> Not tuning configurations at all is wrong. YARN uses memory
>>>>>>>>> resource
>>>>>>>>> >> >>> based scheduling and hence MR2 would be requesting 1 GB
>>>>>>>>> minimum by
>>>>>>>>> >> >>> default, causing, on base configs, to max out at 8 (due to
>>>>>>>>> 8 GB NM
>>>>>>>>> >> >>> memory resource config) total containers. Do share your
>>>>>>>>> configs as at
>>>>>>>>> >> >>> this point none of us can tell what it is.
>>>>>>>>> >> >>>
>>>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for users
>>>>>>>>> and to not
>>>>>>>>> >> >>> care about such things :)
>>>>>>>>> >> >>>
>>>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>>>> samliuhadoop@gmail.com>
>>>>>>>>> >> >>> wrote:
>>>>>>>>> >> >>> > At the begining, I just want to do a fast comparision of
>>>>>>>>> MRv1 and
>>>>>>>>> >> >>> > Yarn.
>>>>>>>>> >> >>> > But
>>>>>>>>> >> >>> > they have many differences, and to be fair for comparison
>>>>>>>>> I did not
>>>>>>>>> >> >>> > tune
>>>>>>>>> >> >>> > their configurations at all.  So I got above test
>>>>>>>>> results. After
>>>>>>>>> >> >>> > analyzing
>>>>>>>>> >> >>> > the test result, no doubt, I will configure them and do
>>>>>>>>> comparison
>>>>>>>>> >> >>> > again.
>>>>>>>>> >> >>> >
>>>>>>>>> >> >>> > Do you have any idea on current test result? I think, to
>>>>>>>>> compare
>>>>>>>>> >> >>> > with
>>>>>>>>> >> >>> > MRv1,
>>>>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but worse on
>>>>>>>>> Reduce
>>>>>>>>> >> >>> > phase(terasort test).
>>>>>>>>> >> >>> > And any detailed suggestions/comments/materials on Yarn
>>>>>>>>> performance
>>>>>>>>> >> >>> > tunning?
>>>>>>>>> >> >>> >
>>>>>>>>> >> >>> > Thanks!
>>>>>>>>> >> >>> >
>>>>>>>>> >> >>> >
>>>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>>>> marcosluis2186@gmail.com>
>>>>>>>>> >> >>> >>
>>>>>>>>> >> >>> >> Why not to tune the configurations?
>>>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>>>>>>>> >> >>> >>
>>>>>>>>> >> >>> >>
>>>>>>>>> >> >>> >>
>>>>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>>>>> >> >>> >>>
>>>>>>>>> >> >>> >>> Hi Experts,
>>>>>>>>> >> >>> >>>
>>>>>>>>> >> >>> >>> We are thinking about whether to use Yarn or not in the
>>>>>>>>> near
>>>>>>>>> >> >>> >>> future,
>>>>>>>>> >> >>> >>> and
>>>>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for comprison.
>>>>>>>>> >> >>> >>>
>>>>>>>>> >> >>> >>> My env is three nodes cluster, and each node has
>>>>>>>>> similar hardware:
>>>>>>>>> >> >>> >>> 2
>>>>>>>>> >> >>> >>> cpu(4 core), 32 mem. Both Yarn and MRv1 cluster are set
>>>>>>>>> on the
>>>>>>>>> >> >>> >>> same
>>>>>>>>> >> >>> >>> env. To
>>>>>>>>> >> >>> >>> be fair, I did not make any performance tuning on their
>>>>>>>>> >> >>> >>> configurations, but
>>>>>>>>> >> >>> >>> use the default configuration values.
>>>>>>>>> >> >>> >>>
>>>>>>>>> >> >>> >>> Before testing, I think Yarn will be much better than
>>>>>>>>> MRv1, if
>>>>>>>>> >> >>> >>> they
>>>>>>>>> >> >>> >>> all
>>>>>>>>> >> >>> >>> use default configuration, because Yarn is a better
>>>>>>>>> framework than
>>>>>>>>> >> >>> >>> MRv1.
>>>>>>>>> >> >>> >>> However, the test result shows some differences:
>>>>>>>>> >> >>> >>>
>>>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>>>> >> >>> >>>
>>>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>>>> >> >>> >>>
>>>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>>>> >> >>> >>>
>>>>>>>>> >> >>> >>> After a fast analysis, I think the direct cause might
>>>>>>>>> be that Yarn
>>>>>>>>> >> >>> >>> is
>>>>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much worse on
>>>>>>>>> Reduce
>>>>>>>>> >> >>> >>> phase.
>>>>>>>>> >> >>> >>>
>>>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1 for
>>>>>>>>> terasort?
>>>>>>>>> >> >>> >>> - What's the stratage for tuning Yarn performance? Is
>>>>>>>>> any
>>>>>>>>> >> >>> >>> materials?
>>>>>>>>> >> >>> >>>
>>>>>>>>> >> >>> >>> Thanks!
>>>>>>>>> >> >>> >>
>>>>>>>>> >> >>> >>
>>>>>>>>> >> >>> >>
>>>>>>>>> >> >>> >>
>>>>>>>>> >> >>> >> --
>>>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>>>> >> >>> >>
>>>>>>>>> >> >>> >
>>>>>>>>> >> >>>
>>>>>>>>> >> >>>
>>>>>>>>> >> >>>
>>>>>>>>> >> >>> --
>>>>>>>>> >> >>> Harsh J
>>>>>>>>> >> >>
>>>>>>>>> >> >>
>>>>>>>>> >> >
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >> --
>>>>>>>>> >> Harsh J
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Harsh J
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Sandy Ryza <sa...@cloudera.com>.
Based on SLOTS_MILLIS_MAPS, it looks like your map tasks are taking about
three times as long in MR2 as they are in MR1.  This is probably because
you allow twice as many map tasks to run at a time in MR2 (12288/768 = 16
concurrent map containers per node, versus the 8 map slots per tasktracker
in MR1).  Being able to use all the containers isn't necessarily a good
thing if you are oversubscribing your node's resources.  Because of the
different way that MR1 and MR2 view resources, I think it's better to test
with mapred.reduce.slowstart.completed.maps=.99 so that the map and reduce
phases will run separately.
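
For example, a minimal sketch of that change in mapred-site.xml (in 2.x the
property is named mapreduce.job.reduce.slowstart.completedmaps; the older
mapred.reduce.slowstart.completed.maps key should still be accepted as a
deprecated alias):

  <property>
    <name>mapreduce.job.reduce.slowstart.completedmaps</name>
    <value>0.99</value>
    <description>Hold reduce launch until roughly all maps have finished,
    so the map and reduce phases can be timed separately.</description>
  </property>

The same value can also be passed per job with -D on the terasort command
line, which avoids changing the cluster-wide configuration.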

On the other hand, it looks like your MR1 run has more spilled records than
MR2.  You should set io.sort.record.percent in MR1 to .13; this should
improve MR1 performance and give a fairer comparison (MR2 does this tuning
automatically for you).
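
In MR1's mapred-site.xml that would look roughly like the following (a
sketch; this knob exists only in MR1, where it defaults to 0.05):

  <property>
    <name>io.sort.record.percent</name>
    <value>0.13</value>
    <description>Fraction of io.sort.mb reserved for per-record metadata;
    raising it from 0.05 leaves room for terasort's small 100-byte records
    and should cut down on spills.</description>
  </property>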

-Sandy


On Wed, Oct 23, 2013 at 9:22 AM, Jian Fang <ji...@gmail.com>wrote:

> The number of map slots and reduce slots on each data node for MR1 are 8
> and 3, respectively. Since MR2 could use all containers for either map or
> reduce, I would expect that MR2 is faster.
>
>
> On Wed, Oct 23, 2013 at 8:17 AM, Sandy Ryza <sa...@cloudera.com>wrote:
>
>> How many map and reduce slots are you using per tasktracker in MR1?  How
>> do the average map times compare? (MR2 reports this directly on the web UI,
>> but you can also get a sense in MR1 by scrolling through the map tasks
>> page).  Can you share the counters for MR1?
>>
>> -Sandy
>>
>>
>> On Wed, Oct 23, 2013 at 12:23 AM, Jian Fang <
>> jian.fang.subscribe@gmail.com> wrote:
>>
>>> Unfortunately, turning off JVM reuse still got the same result, i.e.,
>>> about 90 minutes for MR2. I don't think the killed reduces could contribute
>>> to 2 times slowness. There should be something very wrong either in
>>> configuration or code. Any hints?
>>>
>>>
>>>
>>> On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <
>>> jian.fang.subscribe@gmail.com> wrote:
>>>
>>>> Thanks Sandy. I will try to turn JVM resue off and see what happens.
>>>>
>>>> Yes, I saw quite some exceptions in the task attempts. For instance.
>>>>
>>>>
>>>> 2013-10-20 03:13:58,751 ERROR [main]
>>>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>>>> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
>>>> 2013-10-20 03:13:58,752 ERROR [Thread-6]
>>>> org.apache.hadoop.hdfs.DFSClient: Failed to close file
>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
>>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>>>> No lease on
>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
>>>> File does not exist. Holder
>>>> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
>>>> any open files.
>>>>         at
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>>>>         at
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>>>>         at
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>>>>         at
>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>>>>         at
>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>>>>         at
>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>>>>         at
>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>> --
>>>>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>>>>         at
>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>>>>         at
>>>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>>>>         at
>>>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>>>>         at
>>>> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>>>>         at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>>>>         at
>>>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>>>>         at
>>>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>>>>         at
>>>> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>>>>         at
>>>> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>>>> 2013-10-20 03:13:58,753 WARN [main] org.apache.hadoop.mapred.YarnChild:
>>>> Exception running child : java.nio.channels.ClosedChannelException
>>>>         at
>>>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>>>>         at
>>>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>>>>         at
>>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>>>>         at java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>>         at
>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>>>>         at
>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>>>>         at
>>>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>>>>         at
>>>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>>>         at
>>>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>>>         at
>>>> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>>>>
>>>>
>>>>
>>>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>>
>>>>> It looks like many of your reduce tasks were killed.  Do you know why?
>>>>>  Also, MR2 doesn't have JVM reuse, so it might make sense to compare it to
>>>>> MR1 with JVM reuse turned off.
>>>>>
>>>>> -Sandy
>>>>>
>>>>>
>>>>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <
>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>
>>>>>> The Terasort output for MR2 is as follows.
>>>>>>
>>>>>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job (main):
>>>>>> Counters: 46
>>>>>>         File System Counters
>>>>>>                 FILE: Number of bytes read=456102049355
>>>>>>                 FILE: Number of bytes written=897246250517
>>>>>>                 FILE: Number of read operations=0
>>>>>>                 FILE: Number of large read operations=0
>>>>>>                 FILE: Number of write operations=0
>>>>>>                 HDFS: Number of bytes read=1000000851200
>>>>>>                 HDFS: Number of bytes written=1000000000000
>>>>>>                 HDFS: Number of read operations=32131
>>>>>>                 HDFS: Number of large read operations=0
>>>>>>                 HDFS: Number of write operations=224
>>>>>>         Job Counters
>>>>>>                 Killed map tasks=1
>>>>>>                 Killed reduce tasks=20
>>>>>>                 Launched map tasks=7601
>>>>>>                 Launched reduce tasks=132
>>>>>>                 Data-local map tasks=7591
>>>>>>                 Rack-local map tasks=10
>>>>>>                 Total time spent by all maps in occupied slots
>>>>>> (ms)=1696141311
>>>>>>                 Total time spent by all reduces in occupied slots
>>>>>> (ms)=2664045096
>>>>>>         Map-Reduce Framework
>>>>>>                 Map input records=10000000000
>>>>>>                 Map output records=10000000000
>>>>>>                 Map output bytes=1020000000000
>>>>>>                 Map output materialized bytes=440486356802
>>>>>>                 Input split bytes=851200
>>>>>>                 Combine input records=0
>>>>>>                 Combine output records=0
>>>>>>                 Reduce input groups=10000000000
>>>>>>                 Reduce shuffle bytes=440486356802
>>>>>>                 Reduce input records=10000000000
>>>>>>                 Reduce output records=10000000000
>>>>>>                 Spilled Records=20000000000
>>>>>>                 Shuffled Maps =851200
>>>>>>                 Failed Shuffles=61
>>>>>>                 Merged Map outputs=851200
>>>>>>                 GC time elapsed (ms)=4215666
>>>>>>                 CPU time spent (ms)=192433000
>>>>>>                 Physical memory (bytes) snapshot=3349356380160
>>>>>>                 Virtual memory (bytes) snapshot=9665208745984
>>>>>>                 Total committed heap usage (bytes)=3636854259712
>>>>>>         Shuffle Errors
>>>>>>                 BAD_ID=0
>>>>>>                 CONNECTION=0
>>>>>>                 IO_ERROR=4
>>>>>>                 WRONG_LENGTH=0
>>>>>>                 WRONG_MAP=0
>>>>>>                 WRONG_REDUCE=0
>>>>>>         File Input Format Counters
>>>>>>                 Bytes Read=1000000000000
>>>>>>         File Output Format Counters
>>>>>>                 Bytes Written=1000000000000
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> John
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop 1.0.3
>>>>>>> and it turned out that the terasort for MR2 is 2 times slower than that in
>>>>>>> MR1. I cannot really believe it.
>>>>>>>
>>>>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0
>>>>>>> cluster configurations are as follows.
>>>>>>>
>>>>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>>>>         mapreduce.map.memory.mb = "768";
>>>>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>>>>
>>>>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>>>>
>>>>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>>>>         mapreduce.task.io.sort.factor = "48";
>>>>>>>         mapreduce.task.io.sort.mb = "200";
>>>>>>>         mapreduce.map.speculative = "true";
>>>>>>>         mapreduce.reduce.speculative = "true";
>>>>>>>         mapreduce.framework.name = "yarn";
>>>>>>>         yarn.app.mapreduce.am.job.task.listener.thread-count = "60";
>>>>>>>         mapreduce.map.cpu.vcores = "1";
>>>>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>>>>
>>>>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>>>>         mapreduce.map.output.compress = "true";
>>>>>>>         mapreduce.map.output.compress.codec =
>>>>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>>>>
>>>>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>>>>         yarn.resourcemanager.scheduler.client.thread-count = "64";
>>>>>>>         yarn.resourcemanager.resource-tracker.client.thread-count =
>>>>>>> "64";
>>>>>>>         yarn.resourcemanager.scheduler.class =
>>>>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>>>>         yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>>>>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>>>>         yarn.nodemanager.container-executor.class =
>>>>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>>>>         yarn.nodemanager.container-manager.thread-count = "64";
>>>>>>>         yarn.nodemanager.localizer.client.thread-count = "20";
>>>>>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>>>>>>
>>>>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost the same
>>>>>>> other configurations. In MR1, the 1TB terasort took about 45 minutes, but
>>>>>>> it took around 90 minutes in MR2.
>>>>>>>
>>>>>>> Does anyone know what is wrong here? Or do I need some special
>>>>>>> configurations for terasort to work better in MR2?
>>>>>>>
>>>>>>> Thanks in advance,
>>>>>>>
>>>>>>> John
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>>>>> michael_segel@hotmail.com> wrote:
>>>>>>>
>>>>>>>> Sam,
>>>>>>>> I think your cluster is too small for any meaningful conclusions to
>>>>>>>> be made.
>>>>>>>>
>>>>>>>>
>>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>>
>>>>>>>> Mike Segel
>>>>>>>>
>>>>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hi Harsh,
>>>>>>>>
>>>>>>>> Thanks for your detailed response! Now, the efficiency of my Yarn
>>>>>>>> cluster improved a lot after increasing the reducer
>>>>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>>>>
>>>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and reduce task
>>>>>>>> together, with a typical process(map > shuffle > reduce). In Yarn, as I
>>>>>>>> know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1
>>>>>>>> job has special execution process(map > shuffle > reduce) in Hadoop 1.x,
>>>>>>>> and how Yarn execute a MRv1 job? still include some special MR steps in
>>>>>>>> Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>>> - What's the general process for ApplicationMaster of Yarn to
>>>>>>>> execute a job?
>>>>>>>>
>>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>>> - For Yarn, above tow parameter do not work any more, as yarn uses
>>>>>>>> container instead, right?
>>>>>>>> - For Yarn, we can set the whole physical mem for a NodeManager
>>>>>>>> using 'yarn.nodemanager.resource.memory-mb'. But how to set the
>>>>>>>> default size of physical mem of a container?
>>>>>>>> - How to set the maximum size of physical mem of a container? By
>>>>>>>> the parameter of 'mapred.child.java.opts'?
>>>>>>>>
>>>>>>>> Thanks as always!
>>>>>>>>
>>>>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>>>>
>>>>>>>>> Hi Sam,
>>>>>>>>>
>>>>>>>>> > - How to know the container number? Why you say it will be 22
>>>>>>>>> containers due to a 22 GB memory?
>>>>>>>>>
>>>>>>>>> The MR2's default configuration requests 1 GB resource each for Map
>>>>>>>>> and Reduce containers. It requests 1.5 GB for the AM container that
>>>>>>>>> runs the job, additionally. This is tunable using the properties
>>>>>>>>> Sandy's mentioned in his post.
>>>>>>>>>
>>>>>>>>> > - My machine has 32 GB memory, how many memory is proper to be
>>>>>>>>> assigned to containers?
>>>>>>>>>
>>>>>>>>> This is a general question. You may use the same process you took
>>>>>>>>> to
>>>>>>>>> decide optimal number of slots in MR1 to decide this here. Every
>>>>>>>>> container is a new JVM, and you're limited by the CPUs you have
>>>>>>>>> there
>>>>>>>>> (if not the memory). Either increase memory requests from jobs, to
>>>>>>>>> lower # of concurrent containers at a given time (runtime change),
>>>>>>>>> or
>>>>>>>>> lower NM's published memory resources to control the same (config
>>>>>>>>> change).
>>>>>>>>>
>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>>>>>>>> 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>>>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>
>>>>>>>>> Yes, all of these properties will still work. Old properties
>>>>>>>>> specific
>>>>>>>>> to JobTracker or TaskTracker (usually found as a keyword in the
>>>>>>>>> config
>>>>>>>>> name) will not apply anymore.
>>>>>>>>>
>>>>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <sa...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>> > Hi Harsh,
>>>>>>>>> >
>>>>>>>>> > According to above suggestions, I removed the duplication of
>>>>>>>>> setting, and
>>>>>>>>> > reduce the value of 'yarn.nodemanager.resource.cpu-cores',
>>>>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000. Ant
>>>>>>>>> then, the
>>>>>>>>> > efficiency improved about 18%.  I have questions:
>>>>>>>>> >
>>>>>>>>> > - How to know the container number? Why you say it will be 22
>>>>>>>>> containers due
>>>>>>>>> > to a 22 GB memory?
>>>>>>>>> > - My machine has 32 GB memory, how many memory is proper to be
>>>>>>>>> assigned to
>>>>>>>>> > containers?
>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>>>>>>>> 'yarn', will
>>>>>>>>> > other parameters for mapred-site.xml still work in yarn
>>>>>>>>> framework? Like
>>>>>>>>> > 'mapreduce.task.io.sort.mb' and
>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>> >
>>>>>>>>> > Thanks!
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>>>>> >>
>>>>>>>>> >> Hey Sam,
>>>>>>>>> >>
>>>>>>>>> >> Did you get a chance to retry with Sandy's suggestions? The
>>>>>>>>> config
>>>>>>>>> >> appears to be asking NMs to use roughly 22 total containers (as
>>>>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB memory
>>>>>>>>> >> resource. This could impact much, given the CPU is still the
>>>>>>>>> same for
>>>>>>>>> >> both test runs.
>>>>>>>>> >>
>>>>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>>>>> sandy.ryza@cloudera.com>
>>>>>>>>> >> wrote:
>>>>>>>>> >> > Hey Sam,
>>>>>>>>> >> >
>>>>>>>>> >> > Thanks for sharing your results.  I'm definitely curious
>>>>>>>>> about what's
>>>>>>>>> >> > causing the difference.
>>>>>>>>> >> >
>>>>>>>>> >> > A couple observations:
>>>>>>>>> >> > It looks like you've got yarn.nodemanager.resource.memory-mb
>>>>>>>>> in there
>>>>>>>>> >> > twice
>>>>>>>>> >> > with two different values.
>>>>>>>>> >> >
>>>>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close to the
>>>>>>>>> default
>>>>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of your
>>>>>>>>> tasks getting
>>>>>>>>> >> > killed for running over resource limits?
>>>>>>>>> >> >
>>>>>>>>> >> > -Sandy
>>>>>>>>> >> >
>>>>>>>>> >> >
>>>>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <
>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>> >> >>
>>>>>>>>> >> >> The terasort execution log shows that reduce spent about 5.5
>>>>>>>>> mins from
>>>>>>>>> >> >> 33%
>>>>>>>>> >> >> to 35% as below.
>>>>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce 31%
>>>>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce 32%
>>>>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce 33%
>>>>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce 35%
>>>>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce 40%
>>>>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce 43%
>>>>>>>>> >> >>
>>>>>>>>> >> >> Any way, below are my configurations for your reference.
>>>>>>>>> Thanks!
>>>>>>>>> >> >> (A) core-site.xml
>>>>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>>>>> >> >>
>>>>>>>>> >> >> (B) hdfs-site.xml
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>dfs.replication</name>
>>>>>>>>> >> >>     <value>1</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>>>>> >> >>
>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>>>>> >> >>
>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>>>>> >> >>     <value>10</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >> (C) mapred-site.xml
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>>>>> >> >>
>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>>>>> >> >>
>>>>>>>>> >> >>
>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >> <property>
>>>>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>>>>> >> >> </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >> <property>
>>>>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>>>>> >> >>     <value>yarn</value>
>>>>>>>>> >> >>    </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>  <property>
>>>>>>>>> >> >>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>>>>> >> >>     <value>8</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>>>>> >> >>     <value>4</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>>>>> >> >>     <value>true</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >> (D) yarn-site.xml
>>>>>>>>> >> >>  <property>
>>>>>>>>> >> >>
>>>>>>>>> <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>>>>> >> >>     <value>node1:18025</value>
>>>>>>>>> >> >>     <description>host is the hostname of the resource
>>>>>>>>> manager and
>>>>>>>>> >> >>     port is the port on which the NodeManagers contact the
>>>>>>>>> Resource
>>>>>>>>> >> >> Manager.
>>>>>>>>> >> >>     </description>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <description>The address of the RM web
>>>>>>>>> application.</description>
>>>>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>>>>> >> >>     <value>node1:18088</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>>>>>>>>> >> >>     <value>node1:18030</value>
>>>>>>>>> >> >>     <description>host is the hostname of the resourcemanager
>>>>>>>>> and port
>>>>>>>>> >> >> is
>>>>>>>>> >> >> the port
>>>>>>>>> >> >>     on which the Applications in the cluster talk to the
>>>>>>>>> Resource
>>>>>>>>> >> >> Manager.
>>>>>>>>> >> >>     </description>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>>>>> >> >>     <value>node1:18040</value>
>>>>>>>>> >> >>     <description>the host is the hostname of the
>>>>>>>>> ResourceManager and
>>>>>>>>> >> >> the
>>>>>>>>> >> >> port is the port on
>>>>>>>>> >> >>     which the clients can talk to the Resource Manager.
>>>>>>>>> </description>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>>>> >> >>
>>>>>>>>> >> >>
>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>>>>> >> >>     <description>the local directories used by the
>>>>>>>>> >> >> nodemanager</description>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>>>>> port</description>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>> >> >>     <value>10240</value>
>>>>>>>>> >> >>     <description>the amount of memory on the NodeManager in
>>>>>>>>> >> >> GB</description>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>>>> >> >>
>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>>>> >> >>     <description>directory on hdfs where the application
>>>>>>>>> logs are moved
>>>>>>>>> >> >> to
>>>>>>>>> >> >> </description>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>    <property>
>>>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>>>> >> >>
>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>>>> >> >>     <description>the directories used by Nodemanagers as log
>>>>>>>>> >> >> directories</description>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>>>> >> >>     <description>shuffle service that needs to be set for
>>>>>>>>> Map Reduce to
>>>>>>>>> >> >> run </description>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>   <property>
>>>>>>>>> >> >>     <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>  <property>
>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>>>> >> >>     <value>24</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >> <property>
>>>>>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>>>> >> >>     <value>3</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>  <property>
>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>> >> >>     <value>22000</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>  <property>
>>>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>>>> >> >>     <value>2.1</value>
>>>>>>>>> >> >>   </property>
>>>>>>>>> >> >>
>>>>>>>>> >> >>
>>>>>>>>> >> >>
>>>>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>>>>> >> >>>
>>>>>>>>> >> >>> Not tuning configurations at all is wrong. YARN uses memory
>>>>>>>>> resource
>>>>>>>>> >> >>> based scheduling and hence MR2 would be requesting 1 GB
>>>>>>>>> minimum by
>>>>>>>>> >> >>> default, causing, on base configs, to max out at 8 (due to
>>>>>>>>> 8 GB NM
>>>>>>>>> >> >>> memory resource config) total containers. Do share your
>>>>>>>>> configs as at
>>>>>>>>> >> >>> this point none of us can tell what it is.
>>>>>>>>> >> >>>
>>>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for users
>>>>>>>>> and to not
>>>>>>>>> >> >>> care about such things :)
>>>>>>>>> >> >>>
>>>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>>>> samliuhadoop@gmail.com>
>>>>>>>>> >> >>> wrote:
>>>>>>>>> >> >>> > At the begining, I just want to do a fast comparision of
>>>>>>>>> MRv1 and
>>>>>>>>> >> >>> > Yarn.
>>>>>>>>> >> >>> > But
>>>>>>>>> >> >>> > they have many differences, and to be fair for comparison
>>>>>>>>> I did not
>>>>>>>>> >> >>> > tune
>>>>>>>>> >> >>> > their configurations at all.  So I got above test
>>>>>>>>> results. After
>>>>>>>>> >> >>> > analyzing
>>>>>>>>> >> >>> > the test result, no doubt, I will configure them and do
>>>>>>>>> comparison
>>>>>>>>> >> >>> > again.
>>>>>>>>> >> >>> >
>>>>>>>>> >> >>> > Do you have any idea on current test result? I think, to
>>>>>>>>> compare
>>>>>>>>> >> >>> > with
>>>>>>>>> >> >>> > MRv1,
>>>>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but worse on
>>>>>>>>> Reduce
>>>>>>>>> >> >>> > phase(terasort test).
>>>>>>>>> >> >>> > And any detailed suggestions/comments/materials on Yarn
>>>>>>>>> performance
>>>>>>>>> >> >>> > tunning?
>>>>>>>>> >> >>> >
>>>>>>>>> >> >>> > Thanks!
>>>>>>>>> >> >>> >
>>>>>>>>> >> >>> >
>>>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>>>> marcosluis2186@gmail.com>
>>>>>>>>> >> >>> >>
>>>>>>>>> >> >>> >> Why not to tune the configurations?
>>>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>>>>>>>> >> >>> >>
>>>>>>>>> >> >>> >>
>>>>>>>>> >> >>> >>
>>>>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>>>>> >> >>> >>>
>>>>>>>>> >> >>> >>> Hi Experts,
>>>>>>>>> >> >>> >>>
>>>>>>>>> >> >>> >>> We are thinking about whether to use Yarn or not in the
>>>>>>>>> near
>>>>>>>>> >> >>> >>> future,
>>>>>>>>> >> >>> >>> and
>>>>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for comprison.
>>>>>>>>> >> >>> >>>
>>>>>>>>> >> >>> >>> My env is three nodes cluster, and each node has
>>>>>>>>> similar hardware:
>>>>>>>>> >> >>> >>> 2
>>>>>>>>> >> >>> >>> cpu(4 core), 32 mem. Both Yarn and MRv1 cluster are set
>>>>>>>>> on the
>>>>>>>>> >> >>> >>> same
>>>>>>>>> >> >>> >>> env. To
>>>>>>>>> >> >>> >>> be fair, I did not make any performance tuning on their
>>>>>>>>> >> >>> >>> configurations, but
>>>>>>>>> >> >>> >>> use the default configuration values.
>>>>>>>>> >> >>> >>>
>>>>>>>>> >> >>> >>> Before testing, I think Yarn will be much better than
>>>>>>>>> MRv1, if
>>>>>>>>> >> >>> >>> they
>>>>>>>>> >> >>> >>> all
>>>>>>>>> >> >>> >>> use default configuration, because Yarn is a better
>>>>>>>>> framework than
>>>>>>>>> >> >>> >>> MRv1.
>>>>>>>>> >> >>> >>> However, the test result shows some differences:
>>>>>>>>> >> >>> >>>
>>>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>>>> >> >>> >>>
>>>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>>>> >> >>> >>>
>>>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>>>> >> >>> >>>
>>>>>>>>> >> >>> >>> After a fast analysis, I think the direct cause might
>>>>>>>>> be that Yarn
>>>>>>>>> >> >>> >>> is
>>>>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much worse on
>>>>>>>>> Reduce
>>>>>>>>> >> >>> >>> phase.
>>>>>>>>> >> >>> >>>
>>>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1 for
>>>>>>>>> terasort?
>>>>>>>>> >> >>> >>> - What's the stratage for tuning Yarn performance? Is
>>>>>>>>> any
>>>>>>>>> >> >>> >>> materials?
>>>>>>>>> >> >>> >>>
>>>>>>>>> >> >>> >>> Thanks!
>>>>>>>>> >> >>> >>
>>>>>>>>> >> >>> >>
>>>>>>>>> >> >>> >>
>>>>>>>>> >> >>> >>
>>>>>>>>> >> >>> >> --
>>>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>>>> >> >>> >>
>>>>>>>>> >> >>> >
>>>>>>>>> >> >>>
>>>>>>>>> >> >>>
>>>>>>>>> >> >>>
>>>>>>>>> >> >>> --
>>>>>>>>> >> >>> Harsh J
>>>>>>>>> >> >>
>>>>>>>>> >> >>
>>>>>>>>> >> >
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >> --
>>>>>>>>> >> Harsh J
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Harsh J
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Jian Fang <ji...@gmail.com>.
The numbers of map slots and reduce slots on each data node for MR1 are 8
and 3, respectively. Since MR2 can use all of its containers for either map
or reduce, I would expect MR2 to be faster.
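
To put rough numbers on that, based on the configurations posted earlier in
this thread: MR1 runs at most 8 maps + 3 reduces = 11 task JVMs per node,
while MR2, with yarn.nodemanager.resource.memory-mb = 12288, can pack up to
12288/768 = 16 map containers, or 12288/2048 = 6 reduce containers, onto a
node at the same time.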


On Wed, Oct 23, 2013 at 8:17 AM, Sandy Ryza <sa...@cloudera.com> wrote:

> How many map and reduce slots are you using per tasktracker in MR1?  How
> do the average map times compare? (MR2 reports this directly on the web UI,
> but you can also get a sense in MR1 by scrolling through the map tasks
> page).  Can you share the counters for MR1?
>
> -Sandy
>
>
> On Wed, Oct 23, 2013 at 12:23 AM, Jian Fang <jian.fang.subscribe@gmail.com
> > wrote:
>
>> Unfortunately, turning off JVM reuse still got the same result, i.e.,
>> about 90 minutes for MR2. I don't think the killed reduces could contribute
>> to 2 times slowness. There should be something very wrong either in
>> configuration or code. Any hints?
>>
>>
>>
>> On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <jian.fang.subscribe@gmail.com
>> > wrote:
>>
>>> Thanks Sandy. I will try to turn JVM resue off and see what happens.
>>>
>>> Yes, I saw quite some exceptions in the task attempts. For instance.
>>>
>>>
>>> 2013-10-20 03:13:58,751 ERROR [main]
>>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>>> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
>>> 2013-10-20 03:13:58,752 ERROR [Thread-6]
>>> org.apache.hadoop.hdfs.DFSClient: Failed to close file
>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>>> No lease on
>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
>>> File does not exist. Holder
>>> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
>>> any open files.
>>>         at
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>>>         at
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>>>         at
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>>>         at
>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>>>         at
>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>>>         at
>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>>>         at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>> --
>>>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>>>         at
>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>>>         at
>>> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>>>         at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>>>         at
>>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>>>         at
>>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>>>         at
>>> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>>>         at
>>> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>>> 2013-10-20 03:13:58,753 WARN [main] org.apache.hadoop.mapred.YarnChild:
>>> Exception running child : java.nio.channels.ClosedChannelException
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>>>         at
>>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>>>         at
>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>>>         at java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>         at
>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>>>         at
>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>>>         at
>>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>>>         at
>>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>>         at
>>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>>         at
>>> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>>>
>>>
>>>
>>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>
>>>> It looks like many of your reduce tasks were killed.  Do you know why?
>>>>  Also, MR2 doesn't have JVM reuse, so it might make sense to compare it to
>>>> MR1 with JVM reuse turned off.
>>>>
>>>> -Sandy
>>>>
>>>>
>>>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <
>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>
>>>>> The Terasort output for MR2 is as follows.
>>>>>
>>>>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job (main):
>>>>> Counters: 46
>>>>>         File System Counters
>>>>>                 FILE: Number of bytes read=456102049355
>>>>>                 FILE: Number of bytes written=897246250517
>>>>>                 FILE: Number of read operations=0
>>>>>                 FILE: Number of large read operations=0
>>>>>                 FILE: Number of write operations=0
>>>>>                 HDFS: Number of bytes read=1000000851200
>>>>>                 HDFS: Number of bytes written=1000000000000
>>>>>                 HDFS: Number of read operations=32131
>>>>>                 HDFS: Number of large read operations=0
>>>>>                 HDFS: Number of write operations=224
>>>>>         Job Counters
>>>>>                 Killed map tasks=1
>>>>>                 Killed reduce tasks=20
>>>>>                 Launched map tasks=7601
>>>>>                 Launched reduce tasks=132
>>>>>                 Data-local map tasks=7591
>>>>>                 Rack-local map tasks=10
>>>>>                 Total time spent by all maps in occupied slots
>>>>> (ms)=1696141311
>>>>>                 Total time spent by all reduces in occupied slots
>>>>> (ms)=2664045096
>>>>>         Map-Reduce Framework
>>>>>                 Map input records=10000000000
>>>>>                 Map output records=10000000000
>>>>>                 Map output bytes=1020000000000
>>>>>                 Map output materialized bytes=440486356802
>>>>>                 Input split bytes=851200
>>>>>                 Combine input records=0
>>>>>                 Combine output records=0
>>>>>                 Reduce input groups=10000000000
>>>>>                 Reduce shuffle bytes=440486356802
>>>>>                 Reduce input records=10000000000
>>>>>                 Reduce output records=10000000000
>>>>>                 Spilled Records=20000000000
>>>>>                 Shuffled Maps =851200
>>>>>                 Failed Shuffles=61
>>>>>                 Merged Map outputs=851200
>>>>>                 GC time elapsed (ms)=4215666
>>>>>                 CPU time spent (ms)=192433000
>>>>>                 Physical memory (bytes) snapshot=3349356380160
>>>>>                 Virtual memory (bytes) snapshot=9665208745984
>>>>>                 Total committed heap usage (bytes)=3636854259712
>>>>>         Shuffle Errors
>>>>>                 BAD_ID=0
>>>>>                 CONNECTION=0
>>>>>                 IO_ERROR=4
>>>>>                 WRONG_LENGTH=0
>>>>>                 WRONG_MAP=0
>>>>>                 WRONG_REDUCE=0
>>>>>         File Input Format Counters
>>>>>                 Bytes Read=1000000000000
>>>>>         File Output Format Counters
>>>>>                 Bytes Written=1000000000000
>>>>>
>>>>> Thanks,
>>>>>
>>>>> John
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop 1.0.3
>>>>>> and it turned out that the terasort for MR2 is 2 times slower than that in
>>>>>> MR1. I cannot really believe it.
>>>>>>
>>>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0 cluster
>>>>>> configurations are as follows.
>>>>>>
>>>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>>>         mapreduce.map.memory.mb = "768";
>>>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>>>
>>>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>>>
>>>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>>>         mapreduce.task.io.sort.factor = "48";
>>>>>>         mapreduce.task.io.sort.mb = "200";
>>>>>>         mapreduce.map.speculative = "true";
>>>>>>         mapreduce.reduce.speculative = "true";
>>>>>>         mapreduce.framework.name = "yarn";
>>>>>>         yarn.app.mapreduce.am.job.task.listener.thread-count = "60";
>>>>>>         mapreduce.map.cpu.vcores = "1";
>>>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>>>
>>>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>>>         mapreduce.map.output.compress = "true";
>>>>>>         mapreduce.map.output.compress.codec =
>>>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>>>
>>>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>>>         yarn.resourcemanager.scheduler.client.thread-count = "64";
>>>>>>         yarn.resourcemanager.resource-tracker.client.thread-count =
>>>>>> "64";
>>>>>>         yarn.resourcemanager.scheduler.class =
>>>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>>>         yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>>>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>>>         yarn.nodemanager.container-executor.class =
>>>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>>>         yarn.nodemanager.container-manager.thread-count = "64";
>>>>>>         yarn.nodemanager.localizer.client.thread-count = "20";
>>>>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>>>>>
>>>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost the same
>>>>>> other configurations. In MR1, the 1TB terasort took about 45 minutes, but
>>>>>> it took around 90 minutes in MR2.
>>>>>>
>>>>>> Does anyone know what is wrong here? Or do I need some special
>>>>>> configurations for terasort to work better in MR2?
>>>>>>
>>>>>> Thanks in advance,
>>>>>>
>>>>>> John
>>>>>>
>>>>>>
>>>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>>>> michael_segel@hotmail.com> wrote:
>>>>>>
>>>>>>> Sam,
>>>>>>> I think your cluster is too small for any meaningful conclusions to
>>>>>>> be made.
>>>>>>>
>>>>>>>
>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>
>>>>>>> Mike Segel
>>>>>>>
>>>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi Harsh,
>>>>>>>
>>>>>>> Thanks for your detailed response! Now, the efficiency of my Yarn
>>>>>>> cluster improved a lot after increasing the reducer
>>>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>>>
>>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and reduce task
>>>>>>> together, with a typical process(map > shuffle > reduce). In Yarn, as I
>>>>>>> know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1 job
>>>>>>> has special execution process(map > shuffle > reduce) in Hadoop 1.x, and
>>>>>>> how Yarn execute a MRv1 job? still include some special MR steps in Hadoop
>>>>>>> 1.x, like map, sort, merge, combine and shuffle?
>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>> - What's the general process for ApplicationMaster of Yarn to
>>>>>>> execute a job?
>>>>>>>
>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>> - For Yarn, the above two parameters do not work any more, as YARN
>>>>>>> uses containers instead, right?
>>>>>>> - For Yarn, we can set the whole physical mem for a NodeManager
>>>>>>> using 'yarn.nodemanager.resource.memory-mb'. But how to set the
>>>>>>> default size of physical mem of a container?
>>>>>>> - How to set the maximum size of physical mem of a container? By the
>>>>>>> parameter of 'mapred.child.java.opts'?
>>>>>>>
>>>>>>> Thanks as always!
>>>>>>>
>>>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>>>
>>>>>>>> Hi Sam,
>>>>>>>>
>>>>>>>> > - How to know the container number? Why you say it will be 22
>>>>>>>> containers due to a 22 GB memory?
>>>>>>>>
>>>>>>>> The MR2's default configuration requests 1 GB resource each for Map
>>>>>>>> and Reduce containers. It requests 1.5 GB for the AM container that
>>>>>>>> runs the job, additionally. This is tunable using the properties
>>>>>>>> Sandy's mentioned in his post.
>>>>>>>>
>>>>>>>> > - My machine has 32 GB memory, how much memory is proper to be
>>>>>>>> assigned to containers?
>>>>>>>>
>>>>>>>> This is a general question. You may use the same process you took to
>>>>>>>> decide optimal number of slots in MR1 to decide this here. Every
>>>>>>>> container is a new JVM, and you're limited by the CPUs you have
>>>>>>>> there
>>>>>>>> (if not the memory). Either increase memory requests from jobs, to
>>>>>>>> lower # of concurrent containers at a given time (runtime change),
>>>>>>>> or
>>>>>>>> lower NM's published memory resources to control the same (config
>>>>>>>> change).
>>>>>>>>
>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>>>>>>> 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>
>>>>>>>> Yes, all of these properties will still work. Old properties
>>>>>>>> specific
>>>>>>>> to JobTracker or TaskTracker (usually found as a keyword in the
>>>>>>>> config
>>>>>>>> name) will not apply anymore.
>>>>>>>>
>>>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <sa...@gmail.com>
>>>>>>>> wrote:
>>>>>>>> > Hi Harsh,
>>>>>>>> >
>>>>>>>> > According to the above suggestions, I removed the duplicated setting,
>>>>>>>> > and reduced the values of 'yarn.nodemanager.resource.cpu-cores',
>>>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000. And then,
>>>>>>>> > the efficiency improved by about 18%.  I have some questions:
>>>>>>>> >
>>>>>>>> > - How to know the container number? Why you say it will be 22
>>>>>>>> containers due
>>>>>>>> > to a 22 GB memory?
>>>>>>>> > - My machine has 32 GB memory, how much memory is proper to be
>>>>>>>> assigned to
>>>>>>>> > containers?
>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>>>>>>> 'yarn', will
>>>>>>>> > other parameters for mapred-site.xml still work in yarn
>>>>>>>> framework? Like
>>>>>>>> > 'mapreduce.task.io.sort.mb' and 'mapreduce.map.sort.spill.percent'
>>>>>>>> >
>>>>>>>> > Thanks!
>>>>>>>> >
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>>>> >>
>>>>>>>> >> Hey Sam,
>>>>>>>> >>
>>>>>>>> >> Did you get a chance to retry with Sandy's suggestions? The
>>>>>>>> config
>>>>>>>> >> appears to be asking NMs to use roughly 22 total containers (as
>>>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB memory
>>>>>>>> >> resource. This could impact much, given the CPU is still the
>>>>>>>> same for
>>>>>>>> >> both test runs.
>>>>>>>> >>
>>>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>>>> sandy.ryza@cloudera.com>
>>>>>>>> >> wrote:
>>>>>>>> >> > Hey Sam,
>>>>>>>> >> >
>>>>>>>> >> > Thanks for sharing your results.  I'm definitely curious about
>>>>>>>> what's
>>>>>>>> >> > causing the difference.
>>>>>>>> >> >
>>>>>>>> >> > A couple observations:
>>>>>>>> >> > It looks like you've got yarn.nodemanager.resource.memory-mb
>>>>>>>> in there
>>>>>>>> >> > twice
>>>>>>>> >> > with two different values.
>>>>>>>> >> >
>>>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close to the
>>>>>>>> default
>>>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of your
>>>>>>>> tasks getting
>>>>>>>> >> > killed for running over resource limits?
>>>>>>>> >> >
>>>>>>>> >> > -Sandy
>>>>>>>> >> >
>>>>>>>> >> >
>>>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <
>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>> >> >>
>>>>>>>> >> >> The terasort execution log shows that reduce spent about 5.5
>>>>>>>> mins from
>>>>>>>> >> >> 33%
>>>>>>>> >> >> to 35% as below.
>>>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce 31%
>>>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce 32%
>>>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce 33%
>>>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce 35%
>>>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce 40%
>>>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce 43%
>>>>>>>> >> >>
>>>>>>>> >> >> Anyway, below are my configurations for your reference.
>>>>>>>> Thanks!
>>>>>>>> >> >> (A) core-site.xml
>>>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>>>> >> >>
>>>>>>>> >> >> (B) hdfs-site.xml
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>dfs.replication</name>
>>>>>>>> >> >>     <value>1</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>>>> >> >>
>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>>>> >> >>
>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>>>> >> >>     <value>64</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>>>> >> >>     <value>10</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >> (C) mapred-site.xml
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>>>> >> >>
>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>>>> >> >>     <description>No description</description>
>>>>>>>> >> >>     <final>true</final>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>>>> >> >>     <description>No description</description>
>>>>>>>> >> >>     <final>true</final>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >> <property>
>>>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>>>> >> >> </property>
>>>>>>>> >> >>
>>>>>>>> >> >> <property>
>>>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>>>> >> >>     <value>yarn</value>
>>>>>>>> >> >>    </property>
>>>>>>>> >> >>
>>>>>>>> >> >>  <property>
>>>>>>>> >> >>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>>>> >> >>     <value>8</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>>>> >> >>     <value>4</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>>>> >> >>     <value>true</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >> (D) yarn-site.xml
>>>>>>>> >> >>  <property>
>>>>>>>> >> >>     <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>>>> >> >>     <value>node1:18025</value>
>>>>>>>> >> >>     <description>host is the hostname of the resource manager
>>>>>>>> and
>>>>>>>> >> >>     port is the port on which the NodeManagers contact the
>>>>>>>> Resource
>>>>>>>> >> >> Manager.
>>>>>>>> >> >>     </description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <description>The address of the RM web
>>>>>>>> application.</description>
>>>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>>>> >> >>     <value>node1:18088</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>>>>>>>> >> >>     <value>node1:18030</value>
>>>>>>>> >> >>     <description>host is the hostname of the resourcemanager
>>>>>>>> and port
>>>>>>>> >> >> is
>>>>>>>> >> >> the port
>>>>>>>> >> >>     on which the Applications in the cluster talk to the
>>>>>>>> Resource
>>>>>>>> >> >> Manager.
>>>>>>>> >> >>     </description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>>>> >> >>     <value>node1:18040</value>
>>>>>>>> >> >>     <description>the host is the hostname of the
>>>>>>>> ResourceManager and
>>>>>>>> >> >> the
>>>>>>>> >> >> port is the port on
>>>>>>>> >> >>     which the clients can talk to the Resource Manager.
>>>>>>>> </description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>>>> >> >>     <description>the local directories used by the
>>>>>>>> >> >> nodemanager</description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>>>> port</description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>> >> >>     <value>10240</value>
>>>>>>>> >> >>     <description>the amount of memory on the NodeManager in
>>>>>>>> >> >> MB</description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>>> >> >>
>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>>> >> >>     <description>directory on hdfs where the application logs
>>>>>>>> are moved
>>>>>>>> >> >> to
>>>>>>>> >> >> </description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>    <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>>> >> >>
>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>>> >> >>     <description>the directories used by Nodemanagers as log
>>>>>>>> >> >> directories</description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>>> >> >>     <description>shuffle service that needs to be set for Map
>>>>>>>> Reduce to
>>>>>>>> >> >> run </description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>>> >> >>     <value>64</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>  <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>>> >> >>     <value>24</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >> <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>>> >> >>     <value>3</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>  <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>> >> >>     <value>22000</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>  <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>>> >> >>     <value>2.1</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>>>> >> >>>
>>>>>>>> >> >>> Not tuning configurations at all is wrong. YARN uses memory
>>>>>>>> resource
>>>>>>>> >> >>> based scheduling and hence MR2 would be requesting 1 GB
>>>>>>>> minimum by
>>>>>>>> >> >>> default, causing, on base configs, to max out at 8 (due to 8
>>>>>>>> GB NM
>>>>>>>> >> >>> memory resource config) total containers. Do share your
>>>>>>>> configs as at
>>>>>>>> >> >>> this point none of us can tell what it is.
>>>>>>>> >> >>>
>>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for users
>>>>>>>> and to not
>>>>>>>> >> >>> care about such things :)
>>>>>>>> >> >>>
>>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>>> samliuhadoop@gmail.com>
>>>>>>>> >> >>> wrote:
>>>>>>>> >> >>> > At the beginning, I just want to do a fast comparison of
>>>>>>>> MRv1 and
>>>>>>>> >> >>> > Yarn.
>>>>>>>> >> >>> > But
>>>>>>>> >> >>> > they have many differences, and to be fair for comparison
>>>>>>>> I did not
>>>>>>>> >> >>> > tune
>>>>>>>> >> >>> > their configurations at all.  So I got above test results.
>>>>>>>> After
>>>>>>>> >> >>> > analyzing
>>>>>>>> >> >>> > the test result, no doubt, I will configure them and do
>>>>>>>> comparison
>>>>>>>> >> >>> > again.
>>>>>>>> >> >>> >
>>>>>>>> >> >>> > Do you have any idea on current test result? I think, to
>>>>>>>> compare
>>>>>>>> >> >>> > with
>>>>>>>> >> >>> > MRv1,
>>>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but worse on
>>>>>>>> Reduce
>>>>>>>> >> >>> > phase(terasort test).
>>>>>>>> >> >>> > And any detailed suggestions/comments/materials on Yarn
>>>>>>>> performance
>>>>>>>> >> >>> > tuning?
>>>>>>>> >> >>> >
>>>>>>>> >> >>> > Thanks!
>>>>>>>> >> >>> >
>>>>>>>> >> >>> >
>>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>>> marcosluis2186@gmail.com>
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >> Why not tune the configurations?
>>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> Hi Experts,
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> We are thinking about whether to use Yarn or not in the
>>>>>>>> near
>>>>>>>> >> >>> >>> future,
>>>>>>>> >> >>> >>> and
>>>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for comparison.
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> My env is a three-node cluster, and each node has similar
>>>>>>>> >> >>> >>> hardware: 2 CPUs (4 cores each) and 32 GB memory. Both the
>>>>>>>> >> >>> >>> Yarn and MRv1 clusters are set up on the same env. To be
>>>>>>>> >> >>> >>> fair, I did not do any performance tuning of their
>>>>>>>> >> >>> >>> configurations, but used the default configuration values.
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> Before testing, I think Yarn will be much better than
>>>>>>>> MRv1, if
>>>>>>>> >> >>> >>> they
>>>>>>>> >> >>> >>> all
>>>>>>>> >> >>> >>> use default configuration, because Yarn is a better
>>>>>>>> framework than
>>>>>>>> >> >>> >>> MRv1.
>>>>>>>> >> >>> >>> However, the test result shows some differences:
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> After a fast analysis, I think the direct cause might be
>>>>>>>> >> >>> >>> that Yarn is much faster than MRv1 on Map phase, but much
>>>>>>>> >> >>> >>> worse on Reduce phase.
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1 for terasort?
>>>>>>>> >> >>> >>> - What's the strategy for tuning Yarn performance? Are there
>>>>>>>> >> >>> >>> any materials?
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> Thanks!
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >> --
>>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >
>>>>>>>> >> >>>
>>>>>>>> >> >>>
>>>>>>>> >> >>>
>>>>>>>> >> >>> --
>>>>>>>> >> >>> Harsh J
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >> --
>>>>>>>> >> Harsh J
>>>>>>>> >
>>>>>>>> >
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Harsh J
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
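
To make the JVM-reuse comparison suggested above concrete: MR2 starts a fresh
JVM for every task, so the closest MR1 equivalent is to pin reuse to a single
task per child JVM on the Hadoop 1.x side. A minimal mapred-site.xml fragment
for that (a sketch using the stock Hadoop 1.x property; this is not taken from
the configurations posted in this thread):

  <!-- Hadoop 1.x: run at most one task per child JVM (the default);
       -1 would reuse the JVM for an unlimited number of tasks -->
  <property>
    <name>mapred.job.reuse.jvm.num.tasks</name>
    <value>1</value>
  </property>

The mapreduce.job.jvm.numtasks setting quoted earlier for the MR2 cluster
should have no effect there, since, as noted above, MR2 does not implement
JVM reuse.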

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Jian Fang <ji...@gmail.com>.
Here is one run for MR1. Thanks.

2013-10-22 08:59:43,799 INFO org.apache.hadoop.mapred.JobClient (main):
Counters: 31
2013-10-22 08:59:43,800 INFO org.apache.hadoop.mapred.JobClient (main):
Job Counters
2013-10-22 08:59:43,800 INFO org.apache.hadoop.mapred.JobClient (main):
Launched reduce tasks=127
2013-10-22 08:59:43,800 INFO org.apache.hadoop.mapred.JobClient (main):
SLOTS_MILLIS_MAPS=173095295
2013-10-22 08:59:43,800 INFO org.apache.hadoop.mapred.JobClient (main):
Total time spent by all reduces waiting after reserving slots (ms)=0
2013-10-22 08:59:43,800 INFO org.apache.hadoop.mapred.JobClient (main):
Total time spent by all maps waiting after reserving slots (ms)=0
2013-10-22 08:59:43,801 INFO org.apache.hadoop.mapred.JobClient (main):
Rack-local map tasks=13
2013-10-22 08:59:43,801 INFO org.apache.hadoop.mapred.JobClient (main):
Launched map tasks=7630
2013-10-22 08:59:43,801 INFO org.apache.hadoop.mapred.JobClient (main):
Data-local map tasks=7617
2013-10-22 08:59:43,801 INFO org.apache.hadoop.mapred.JobClient (main):
SLOTS_MILLIS_REDUCES=125929437
2013-10-22 08:59:43,801 INFO org.apache.hadoop.mapred.JobClient (main):
File Input Format Counters
2013-10-22 08:59:43,801 INFO org.apache.hadoop.mapred.JobClient (main):
Bytes Read=1000478157952
2013-10-22 08:59:43,801 INFO org.apache.hadoop.mapred.JobClient (main):
File Output Format Counters
2013-10-22 08:59:43,802 INFO org.apache.hadoop.mapred.JobClient (main):
Bytes Written=1000000000000
2013-10-22 08:59:43,802 INFO org.apache.hadoop.mapred.JobClient (main):
FileSystemCounters
2013-10-22 08:59:43,802 INFO org.apache.hadoop.mapred.JobClient (main):
FILE_BYTES_READ=550120627113
2013-10-22 08:59:43,802 INFO org.apache.hadoop.mapred.JobClient (main):
HDFS_BYTES_READ=1000478910352
2013-10-22 08:59:43,802 INFO org.apache.hadoop.mapred.JobClient (main):
FILE_BYTES_WRITTEN=725563889045
2013-10-22 08:59:43,802 INFO org.apache.hadoop.mapred.JobClient (main):
HDFS_BYTES_WRITTEN=1000000000000
2013-10-22 08:59:43,802 INFO org.apache.hadoop.mapred.JobClient (main):
Map-Reduce Framework
2013-10-22 08:59:43,803 INFO org.apache.hadoop.mapred.JobClient (main):
Map output materialized bytes=240948272514
2013-10-22 08:59:43,803 INFO org.apache.hadoop.mapred.JobClient (main):
Map input records=10000000000
2013-10-22 08:59:43,803 INFO org.apache.hadoop.mapred.JobClient (main):
Reduce shuffle bytes=240948272514
2013-10-22 08:59:43,803 INFO org.apache.hadoop.mapred.JobClient (main):
Spilled Records=30000000000
2013-10-22 08:59:43,803 INFO org.apache.hadoop.mapred.JobClient (main):
Map output bytes=1000000000000
2013-10-22 08:59:43,804 INFO org.apache.hadoop.mapred.JobClient (main):
Total committed heap usage (bytes)=4841182068736
2013-10-22 08:59:43,804 INFO org.apache.hadoop.mapred.JobClient (main):
CPU time spent (ms)=118779050
2013-10-22 08:59:43,804 INFO org.apache.hadoop.mapred.JobClient (main):
Map input bytes=1000000000000
2013-10-22 08:59:43,804 INFO org.apache.hadoop.mapred.JobClient (main):
SPLIT_RAW_BYTES=752400
2013-10-22 08:59:43,804 INFO org.apache.hadoop.mapred.JobClient (main):
Combine input records=0
2013-10-22 08:59:43,804 INFO org.apache.hadoop.mapred.JobClient (main):
Reduce input records=10000000000
2013-10-22 08:59:43,804 INFO org.apache.hadoop.mapred.JobClient (main):
Reduce input groups=4294967296
2013-10-22 08:59:43,804 INFO org.apache.hadoop.mapred.JobClient (main):
Combine output records=0
2013-10-22 08:59:43,804 INFO org.apache.hadoop.mapred.JobClient (main):
Physical memory (bytes) snapshot=5092578041856
2013-10-22 08:59:43,804 INFO org.apache.hadoop.mapred.JobClient (main):
Reduce output records=10000000000
2013-10-22 08:59:43,804 INFO org.apache.hadoop.mapred.JobClient (main):
Virtual memory (bytes) snapshot=9391598915584
2013-10-22 08:59:43,804 INFO org.apache.hadoop.mapred.JobClient (main):
Map output records=10000000000
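
Putting these counters next to the MR2 counters posted earlier in the thread:
both runs produced roughly 1 TB of map output, but the MR1 job materialized
and shuffled about 241 GB (240,948,272,514 bytes) against roughly 440 GB
(440,486,356,802 bytes) for MR2, i.e. the MR2 reducers pulled nearly twice as
much intermediate data. One thing worth ruling out is a difference in
map-output compression between the two setups; the Hadoop 1.x property names
differ from the mapreduce.* ones quoted above. A sketch of the 1.x
equivalents, mirroring the MR2 settings (stock 1.x property names; the actual
MR1 configuration was not posted in this thread):

  <!-- Hadoop 1.x map-output compression, matching the MR2 settings -->
  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
  </property>
  <property>
    <name>mapred.map.output.compression.codec</name>
    <value>org.apache.hadoop.io.compress.SnappyCodec</value>
  </property>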



On Wed, Oct 23, 2013 at 8:17 AM, Sandy Ryza <sa...@cloudera.com> wrote:

> How many map and reduce slots are you using per tasktracker in MR1?  How
> do the average map times compare? (MR2 reports this directly on the web UI,
> but you can also get a sense in MR1 by scrolling through the map tasks
> page).  Can you share the counters for MR1?
>
> -Sandy
>
>
> On Wed, Oct 23, 2013 at 12:23 AM, Jian Fang <jian.fang.subscribe@gmail.com
> > wrote:
>
>> Unfortunately, turning off JVM reuse still got the same result, i.e.,
>> about 90 minutes for MR2. I don't think the killed reduces could contribute
>> to 2 times slowness. There should be something very wrong either in
>> configuration or code. Any hints?
>>
>>
>>
>> On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <jian.fang.subscribe@gmail.com
>> > wrote:
>>
>>> Thanks Sandy. I will try to turn JVM reuse off and see what happens.
>>>
>>> Yes, I saw quite some exceptions in the task attempts. For instance.
>>>
>>>
>>> 2013-10-20 03:13:58,751 ERROR [main]
>>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>>> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
>>> 2013-10-20 03:13:58,752 ERROR [Thread-6]
>>> org.apache.hadoop.hdfs.DFSClient: Failed to close file
>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>>> No lease on
>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
>>> File does not exist. Holder
>>> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
>>> any open files.
>>>         at
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>>>         at
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>>>         at
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>>>         at
>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>>>         at
>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>>>         at
>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>>>         at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>> --
>>>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>>>         at
>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>>>         at
>>> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>>>         at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>>>         at
>>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>>>         at
>>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>>>         at
>>> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>>>         at
>>> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>>> 2013-10-20 03:13:58,753 WARN [main] org.apache.hadoop.mapred.YarnChild:
>>> Exception running child : java.nio.channels.ClosedChannelException
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>>>         at
>>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>>>         at
>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>>>         at java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>         at
>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>>>         at
>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>>>         at
>>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>>>         at
>>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>>         at
>>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>>         at
>>> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>>>
>>>
>>>
>>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>
>>>> It looks like many of your reduce tasks were killed.  Do you know why?
>>>>  Also, MR2 doesn't have JVM reuse, so it might make sense to compare it to
>>>> MR1 with JVM reuse turned off.
>>>>
>>>> -Sandy
>>>>
>>>>
>>>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <
>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>
>>>>> The Terasort output for MR2 is as follows.
>>>>>
>>>>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job (main):
>>>>> Counters: 46
>>>>>         File System Counters
>>>>>                 FILE: Number of bytes read=456102049355
>>>>>                 FILE: Number of bytes written=897246250517
>>>>>                 FILE: Number of read operations=0
>>>>>                 FILE: Number of large read operations=0
>>>>>                 FILE: Number of write operations=0
>>>>>                 HDFS: Number of bytes read=1000000851200
>>>>>                 HDFS: Number of bytes written=1000000000000
>>>>>                 HDFS: Number of read operations=32131
>>>>>                 HDFS: Number of large read operations=0
>>>>>                 HDFS: Number of write operations=224
>>>>>         Job Counters
>>>>>                 Killed map tasks=1
>>>>>                 Killed reduce tasks=20
>>>>>                 Launched map tasks=7601
>>>>>                 Launched reduce tasks=132
>>>>>                 Data-local map tasks=7591
>>>>>                 Rack-local map tasks=10
>>>>>                 Total time spent by all maps in occupied slots
>>>>> (ms)=1696141311
>>>>>                 Total time spent by all reduces in occupied slots
>>>>> (ms)=2664045096
>>>>>         Map-Reduce Framework
>>>>>                 Map input records=10000000000
>>>>>                 Map output records=10000000000
>>>>>                 Map output bytes=1020000000000
>>>>>                 Map output materialized bytes=440486356802
>>>>>                 Input split bytes=851200
>>>>>                 Combine input records=0
>>>>>                 Combine output records=0
>>>>>                 Reduce input groups=10000000000
>>>>>                 Reduce shuffle bytes=440486356802
>>>>>                 Reduce input records=10000000000
>>>>>                 Reduce output records=10000000000
>>>>>                 Spilled Records=20000000000
>>>>>                 Shuffled Maps =851200
>>>>>                 Failed Shuffles=61
>>>>>                 Merged Map outputs=851200
>>>>>                 GC time elapsed (ms)=4215666
>>>>>                 CPU time spent (ms)=192433000
>>>>>                 Physical memory (bytes) snapshot=3349356380160
>>>>>                 Virtual memory (bytes) snapshot=9665208745984
>>>>>                 Total committed heap usage (bytes)=3636854259712
>>>>>         Shuffle Errors
>>>>>                 BAD_ID=0
>>>>>                 CONNECTION=0
>>>>>                 IO_ERROR=4
>>>>>                 WRONG_LENGTH=0
>>>>>                 WRONG_MAP=0
>>>>>                 WRONG_REDUCE=0
>>>>>         File Input Format Counters
>>>>>                 Bytes Read=1000000000000
>>>>>         File Output Format Counters
>>>>>                 Bytes Written=1000000000000
>>>>>
>>>>> Thanks,
>>>>>
>>>>> John
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop 1.0.3
>>>>>> and it turned out that the terasort for MR2 is 2 times slower than that in
>>>>>> MR1. I cannot really believe it.
>>>>>>
>>>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0 cluster
>>>>>> configurations are as follows.
>>>>>>
>>>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>>>         mapreduce.map.memory.mb = "768";
>>>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>>>
>>>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>>>
>>>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>>>         mapreduce.task.io.sort.factor = "48";
>>>>>>         mapreduce.task.io.sort.mb = "200";
>>>>>>         mapreduce.map.speculative = "true";
>>>>>>         mapreduce.reduce.speculative = "true";
>>>>>>         mapreduce.framework.name = "yarn";
>>>>>>         yarn.app.mapreduce.am.job.task.listener.thread-count = "60";
>>>>>>         mapreduce.map.cpu.vcores = "1";
>>>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>>>
>>>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>>>         mapreduce.map.output.compress = "true";
>>>>>>         mapreduce.map.output.compress.codec =
>>>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>>>
>>>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>>>         yarn.resourcemanager.scheduler.client.thread-count = "64";
>>>>>>         yarn.resourcemanager.resource-tracker.client.thread-count =
>>>>>> "64";
>>>>>>         yarn.resourcemanager.scheduler.class =
>>>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>>>         yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>>>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>>>         yarn.nodemanager.container-executor.class =
>>>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>>>         yarn.nodemanager.container-manager.thread-count = "64";
>>>>>>         yarn.nodemanager.localizer.client.thread-count = "20";
>>>>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>>>>>
>>>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost the same
>>>>>> other configurations. In MR1, the 1TB terasort took about 45 minutes, but
>>>>>> it took around 90 minutes in MR2.
>>>>>>
>>>>>> Does anyone know what is wrong here? Or do I need some special
>>>>>> configurations for terasort to work better in MR2?
>>>>>>
>>>>>> Thanks in advance,
>>>>>>
>>>>>> John
>>>>>>
>>>>>>
>>>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>>>> michael_segel@hotmail.com> wrote:
>>>>>>
>>>>>>> Sam,
>>>>>>> I think your cluster is too small for any meaningful conclusions to
>>>>>>> be made.
>>>>>>>
>>>>>>>
>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>
>>>>>>> Mike Segel
>>>>>>>
>>>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi Harsh,
>>>>>>>
>>>>>>> Thanks for your detailed response! Now, the efficiency of my Yarn
>>>>>>> cluster improved a lot after increasing the reducer
>>>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>>>
>>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and reduce task
>>>>>>> together, with a typical process(map > shuffle > reduce). In Yarn, as I
>>>>>>> know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1 job
>>>>>>> has special execution process(map > shuffle > reduce) in Hadoop 1.x, and
>>>>>>> how Yarn execute a MRv1 job? still include some special MR steps in Hadoop
>>>>>>> 1.x, like map, sort, merge, combine and shuffle?
>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>> - What's the general process for ApplicationMaster of Yarn to
>>>>>>> execute a job?
>>>>>>>
>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>> - For Yarn, the above two parameters do not work any more, as YARN
>>>>>>> uses containers instead, right?
>>>>>>> - For Yarn, we can set the whole physical mem for a NodeManager
>>>>>>> using 'yarn.nodemanager.resource.memory-mb'. But how to set the
>>>>>>> default size of physical mem of a container?
>>>>>>> - How to set the maximum size of physical mem of a container? By the
>>>>>>> parameter of 'mapred.child.java.opts'?
>>>>>>>
>>>>>>> Thanks as always!
>>>>>>>
>>>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>>>
>>>>>>>> Hi Sam,
>>>>>>>>
>>>>>>>> > - How to know the container number? Why you say it will be 22
>>>>>>>> containers due to a 22 GB memory?
>>>>>>>>
>>>>>>>> The MR2's default configuration requests 1 GB resource each for Map
>>>>>>>> and Reduce containers. It requests 1.5 GB for the AM container that
>>>>>>>> runs the job, additionally. This is tunable using the properties
>>>>>>>> Sandy's mentioned in his post.
>>>>>>>>
>>>>>>>> > - My machine has 32 GB memory, how much memory is proper to be
>>>>>>>> assigned to containers?
>>>>>>>>
>>>>>>>> This is a general question. You may use the same process you took to
>>>>>>>> decide optimal number of slots in MR1 to decide this here. Every
>>>>>>>> container is a new JVM, and you're limited by the CPUs you have
>>>>>>>> there
>>>>>>>> (if not the memory). Either increase memory requests from jobs, to
>>>>>>>> lower # of concurrent containers at a given time (runtime change),
>>>>>>>> or
>>>>>>>> lower NM's published memory resources to control the same (config
>>>>>>>> change).
>>>>>>>>
>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>>>>>>> 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>
>>>>>>>> Yes, all of these properties will still work. Old properties
>>>>>>>> specific
>>>>>>>> to JobTracker or TaskTracker (usually found as a keyword in the
>>>>>>>> config
>>>>>>>> name) will not apply anymore.
>>>>>>>>
>>>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <sa...@gmail.com>
>>>>>>>> wrote:
>>>>>>>> > Hi Harsh,
>>>>>>>> >
>>>>>>>> > According to the above suggestions, I removed the duplicated setting,
>>>>>>>> > and reduced the values of 'yarn.nodemanager.resource.cpu-cores',
>>>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000. And then,
>>>>>>>> > the efficiency improved by about 18%.  I have some questions:
>>>>>>>> >
>>>>>>>> > - How to know the container number? Why you say it will be 22
>>>>>>>> containers due
>>>>>>>> > to a 22 GB memory?
>>>>>>>> > - My machine has 32 GB memory, how much memory is proper to be
>>>>>>>> assigned to
>>>>>>>> > containers?
>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>>>>>>> 'yarn', will
>>>>>>>> > other parameters for mapred-site.xml still work in yarn
>>>>>>>> framework? Like
>>>>>>>> > 'mapreduce.task.io.sort.mb' and 'mapreduce.map.sort.spill.percent'
>>>>>>>> >
>>>>>>>> > Thanks!
>>>>>>>> >
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>>>> >>
>>>>>>>> >> Hey Sam,
>>>>>>>> >>
>>>>>>>> >> Did you get a chance to retry with Sandy's suggestions? The
>>>>>>>> config
>>>>>>>> >> appears to be asking NMs to use roughly 22 total containers (as
>>>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB memory
>>>>>>>> >> resource. This could impact much, given the CPU is still the
>>>>>>>> same for
>>>>>>>> >> both test runs.
>>>>>>>> >>
>>>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>>>> sandy.ryza@cloudera.com>
>>>>>>>> >> wrote:
>>>>>>>> >> > Hey Sam,
>>>>>>>> >> >
>>>>>>>> >> > Thanks for sharing your results.  I'm definitely curious about
>>>>>>>> what's
>>>>>>>> >> > causing the difference.
>>>>>>>> >> >
>>>>>>>> >> > A couple observations:
>>>>>>>> >> > It looks like you've got yarn.nodemanager.resource.memory-mb
>>>>>>>> in there
>>>>>>>> >> > twice
>>>>>>>> >> > with two different values.
>>>>>>>> >> >
>>>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close to the
>>>>>>>> default
>>>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of your
>>>>>>>> tasks getting
>>>>>>>> >> > killed for running over resource limits?
>>>>>>>> >> >
>>>>>>>> >> > -Sandy
>>>>>>>> >> >
>>>>>>>> >> >
>>>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <
>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>> >> >>
>>>>>>>> >> >> The terasort execution log shows that reduce spent about 5.5
>>>>>>>> mins from
>>>>>>>> >> >> 33%
>>>>>>>> >> >> to 35% as below.
>>>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce 31%
>>>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce 32%
>>>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce 33%
>>>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce 35%
>>>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce 40%
>>>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce 43%
>>>>>>>> >> >>
>>>>>>>> >> >> Anyway, below are my configurations for your reference.
>>>>>>>> Thanks!
>>>>>>>> >> >> (A) core-site.xml
>>>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>>>> >> >>
>>>>>>>> >> >> (B) hdfs-site.xml
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>dfs.replication</name>
>>>>>>>> >> >>     <value>1</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>>>> >> >>
>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>>>> >> >>
>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>>>> >> >>     <value>64</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>>>> >> >>     <value>10</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >> (C) mapred-site.xml
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>>>> >> >>
>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>>>> >> >>     <description>No description</description>
>>>>>>>> >> >>     <final>true</final>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>>>> >> >>     <description>No description</description>
>>>>>>>> >> >>     <final>true</final>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >> <property>
>>>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>>>> >> >> </property>
>>>>>>>> >> >>
>>>>>>>> >> >> <property>
>>>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>>>> >> >>     <value>yarn</value>
>>>>>>>> >> >>    </property>
>>>>>>>> >> >>
>>>>>>>> >> >>  <property>
>>>>>>>> >> >>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>>>> >> >>     <value>8</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>>>> >> >>     <value>4</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>>>> >> >>     <value>true</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >> (D) yarn-site.xml
>>>>>>>> >> >>  <property>
>>>>>>>> >> >>     <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>>>> >> >>     <value>node1:18025</value>
>>>>>>>> >> >>     <description>host is the hostname of the resource manager
>>>>>>>> and
>>>>>>>> >> >>     port is the port on which the NodeManagers contact the
>>>>>>>> Resource
>>>>>>>> >> >> Manager.
>>>>>>>> >> >>     </description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <description>The address of the RM web
>>>>>>>> application.</description>
>>>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>>>> >> >>     <value>node1:18088</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>>>>>>>> >> >>     <value>node1:18030</value>
>>>>>>>> >> >>     <description>host is the hostname of the resourcemanager
>>>>>>>> and port
>>>>>>>> >> >> is
>>>>>>>> >> >> the port
>>>>>>>> >> >>     on which the Applications in the cluster talk to the
>>>>>>>> Resource
>>>>>>>> >> >> Manager.
>>>>>>>> >> >>     </description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>>>> >> >>     <value>node1:18040</value>
>>>>>>>> >> >>     <description>the host is the hostname of the
>>>>>>>> ResourceManager and
>>>>>>>> >> >> the
>>>>>>>> >> >> port is the port on
>>>>>>>> >> >>     which the clients can talk to the Resource Manager.
>>>>>>>> </description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>>>> >> >>     <description>the local directories used by the
>>>>>>>> >> >> nodemanager</description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>>>> port</description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>> >> >>     <value>10240</value>
>>>>>>>> >> >>     <description>the amount of memory on the NodeManager in
>>>>>>>> >> >> MB</description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>>> >> >>
>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>>> >> >>     <description>directory on hdfs where the application logs
>>>>>>>> are moved
>>>>>>>> >> >> to
>>>>>>>> >> >> </description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>    <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>>> >> >>
>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>>> >> >>     <description>the directories used by Nodemanagers as log
>>>>>>>> >> >> directories</description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>>> >> >>     <description>shuffle service that needs to be set for Map
>>>>>>>> Reduce to
>>>>>>>> >> >> run </description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>>> >> >>     <value>64</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>  <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>>> >> >>     <value>24</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >> <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>>> >> >>     <value>3</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>  <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>> >> >>     <value>22000</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>  <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>>> >> >>     <value>2.1</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>>>> >> >>>
>>>>>>>> >> >>> Not tuning configurations at all is wrong. YARN uses memory
>>>>>>>> resource
>>>>>>>> >> >>> based scheduling and hence MR2 would be requesting 1 GB
>>>>>>>> minimum by
>>>>>>>> >> >>> default, causing, on base configs, to max out at 8 (due to 8
>>>>>>>> GB NM
>>>>>>>> >> >>> memory resource config) total containers. Do share your
>>>>>>>> configs as at
>>>>>>>> >> >>> this point none of us can tell what it is.
>>>>>>>> >> >>>
>>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for users
>>>>>>>> and to not
>>>>>>>> >> >>> care about such things :)
>>>>>>>> >> >>>
>>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>>> samliuhadoop@gmail.com>
>>>>>>>> >> >>> wrote:
>>>>>>>> >> >>> > At the beginning, I just want to do a fast comparison of
>>>>>>>> MRv1 and
>>>>>>>> >> >>> > Yarn.
>>>>>>>> >> >>> > But
>>>>>>>> >> >>> > they have many differences, and to be fair for comparison
>>>>>>>> I did not
>>>>>>>> >> >>> > tune
>>>>>>>> >> >>> > their configurations at all.  So I got above test results.
>>>>>>>> After
>>>>>>>> >> >>> > analyzing
>>>>>>>> >> >>> > the test result, no doubt, I will configure them and do
>>>>>>>> comparison
>>>>>>>> >> >>> > again.
>>>>>>>> >> >>> >
>>>>>>>> >> >>> > Do you have any idea on current test result? I think, to
>>>>>>>> compare
>>>>>>>> >> >>> > with
>>>>>>>> >> >>> > MRv1,
>>>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but worse on
>>>>>>>> Reduce
>>>>>>>> >> >>> > phase(terasort test).
>>>>>>>> >> >>> > And any detailed suggestions/comments/materials on Yarn
>>>>>>>> performance
>>>>>>>> >> >>> > tuning?
>>>>>>>> >> >>> >
>>>>>>>> >> >>> > Thanks!
>>>>>>>> >> >>> >
>>>>>>>> >> >>> >
>>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>>> marcosluis2186@gmail.com>
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >> Why not tune the configurations?
>>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> Hi Experts,
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> We are thinking about whether to use Yarn or not in the
>>>>>>>> near
>>>>>>>> >> >>> >>> future,
>>>>>>>> >> >>> >>> and
>>>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for comparison.
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> My env is three nodes cluster, and each node has similar
>>>>>>>> hardware:
>>>>>>>> >> >>> >>> 2
>>>>>>>> >> >>> >>> cpu(4 core), 32 mem. Both Yarn and MRv1 cluster are set
>>>>>>>> on the
>>>>>>>> >> >>> >>> same
>>>>>>>> >> >>> >>> env. To
>>>>>>>> >> >>> >>> be fair, I did not make any performance tuning on their
>>>>>>>> >> >>> >>> configurations, but
>>>>>>>> >> >>> >>> use the default configuration values.
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> Before testing, I think Yarn will be much better than
>>>>>>>> MRv1, if
>>>>>>>> >> >>> >>> they
>>>>>>>> >> >>> >>> all
>>>>>>>> >> >>> >>> use default configuration, because Yarn is a better
>>>>>>>> framework than
>>>>>>>> >> >>> >>> MRv1.
>>>>>>>> >> >>> >>> However, the test result shows some differences:
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> After a fast analysis, I think the direct cause might be
>>>>>>>> that Yarn
>>>>>>>> >> >>> >>> is
>>>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much worse on
>>>>>>>> Reduce
>>>>>>>> >> >>> >>> phase.
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1 for
>>>>>>>> terasort?
>>>>>>>> >> >>> >>> - What's the strategy for tuning Yarn performance? Are there any
>>>>>>>> >> >>> >>> materials?
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> Thanks!
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >> --
>>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >
>>>>>>>> >> >>>
>>>>>>>> >> >>>
>>>>>>>> >> >>>
>>>>>>>> >> >>> --
>>>>>>>> >> >>> Harsh J
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >> --
>>>>>>>> >> Harsh J
>>>>>>>> >
>>>>>>>> >
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Harsh J
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Jian Fang <ji...@gmail.com>.
The number of map slots and reduce slots on each data node for MR1 are 8
and 3, respectively. Since MR2 could use all containers for either map or
reduce, I would expect that MR2 is faster.
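
For reference, the MR1 slot counts mentioned above would normally come from tasktracker
settings along these lines in mapred-site.xml; this is only a sketch of the assumed MR1
setup, not a copy of the actual cluster config:

  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>8</value>   <!-- 8 concurrent map slots per data node -->
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>3</value>   <!-- 3 concurrent reduce slots per data node -->
  </property>

With 19 data nodes that caps MR1 at roughly 152 concurrent maps and 57 concurrent reduces,
which is the baseline the MR2 container counts get compared against.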


On Wed, Oct 23, 2013 at 8:17 AM, Sandy Ryza <sa...@cloudera.com> wrote:

> How many map and reduce slots are you using per tasktracker in MR1?  How
> do the average map times compare? (MR2 reports this directly on the web UI,
> but you can also get a sense in MR1 by scrolling through the map tasks
> page).  Can you share the counters for MR1?
>
> -Sandy
>
>
> On Wed, Oct 23, 2013 at 12:23 AM, Jian Fang <jian.fang.subscribe@gmail.com
> > wrote:
>
>> Unfortunately, turning off JVM reuse still got the same result, i.e.,
>> about 90 minutes for MR2. I don't think the killed reduces could contribute
>> to 2 times slowness. There should be something very wrong either in
>> configuration or code. Any hints?
>>
>>
>>
>> On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <jian.fang.subscribe@gmail.com
>> > wrote:
>>
>>> Thanks Sandy. I will try to turn JVM reuse off and see what happens.
>>>
>>> Yes, I saw quite some exceptions in the task attempts. For instance.
>>>
>>>
>>> 2013-10-20 03:13:58,751 ERROR [main]
>>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>>> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
>>> 2013-10-20 03:13:58,752 ERROR [Thread-6]
>>> org.apache.hadoop.hdfs.DFSClient: Failed to close file
>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>>> No lease on
>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
>>> File does not exist. Holder
>>> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
>>> any open files.
>>>         at
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>>>         at
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>>>         at
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>>>         at
>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>>>         at
>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>>>         at
>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>>>         at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>> --
>>>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>>>         at
>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>>>         at
>>> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>>>         at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>>>         at
>>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>>>         at
>>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>>>         at
>>> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>>>         at
>>> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>>> 2013-10-20 03:13:58,753 WARN [main] org.apache.hadoop.mapred.YarnChild:
>>> Exception running child : java.nio.channels.ClosedChannelException
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>>>         at
>>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>>>         at
>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>>>         at java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>         at
>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>>>         at
>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>>>         at
>>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>>>         at
>>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>>         at
>>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>>         at
>>> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>>>
>>>
>>>
>>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>
>>>> It looks like many of your reduce tasks were killed.  Do you know why?
>>>>  Also, MR2 doesn't have JVM reuse, so it might make sense to compare it to
>>>> MR1 with JVM reuse turned off.
>>>>
>>>> -Sandy
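
A minimal sketch of what "JVM reuse turned off" means on the MR1 side, assuming the usual
Hadoop 1.x property in mapred-site.xml or the job config:

  <property>
    <name>mapred.job.reuse.jvm.num.tasks</name>
    <value>1</value>   <!-- 1 = a fresh JVM per task, i.e. no reuse; -1 = unlimited reuse -->
  </property>

The mapreduce.job.jvm.numtasks = "20" entry in the MR2 config quoted below appears to be the
renamed form of the same knob, but MR2 starts a new container JVM per task regardless, which
is why the comparison is fairer with reuse disabled on the MR1 side.
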
>>>>
>>>>
>>>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <
>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>
>>>>> The Terasort output for MR2 is as follows.
>>>>>
>>>>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job (main):
>>>>> Counters: 46
>>>>>         File System Counters
>>>>>                 FILE: Number of bytes read=456102049355
>>>>>                 FILE: Number of bytes written=897246250517
>>>>>                 FILE: Number of read operations=0
>>>>>                 FILE: Number of large read operations=0
>>>>>                 FILE: Number of write operations=0
>>>>>                 HDFS: Number of bytes read=1000000851200
>>>>>                 HDFS: Number of bytes written=1000000000000
>>>>>                 HDFS: Number of read operations=32131
>>>>>                 HDFS: Number of large read operations=0
>>>>>                 HDFS: Number of write operations=224
>>>>>         Job Counters
>>>>>                 Killed map tasks=1
>>>>>                 Killed reduce tasks=20
>>>>>                 Launched map tasks=7601
>>>>>                 Launched reduce tasks=132
>>>>>                 Data-local map tasks=7591
>>>>>                 Rack-local map tasks=10
>>>>>                 Total time spent by all maps in occupied slots
>>>>> (ms)=1696141311
>>>>>                 Total time spent by all reduces in occupied slots
>>>>> (ms)=2664045096
>>>>>         Map-Reduce Framework
>>>>>                 Map input records=10000000000
>>>>>                 Map output records=10000000000
>>>>>                 Map output bytes=1020000000000
>>>>>                 Map output materialized bytes=440486356802
>>>>>                 Input split bytes=851200
>>>>>                 Combine input records=0
>>>>>                 Combine output records=0
>>>>>                 Reduce input groups=10000000000
>>>>>                 Reduce shuffle bytes=440486356802
>>>>>                 Reduce input records=10000000000
>>>>>                 Reduce output records=10000000000
>>>>>                 Spilled Records=20000000000
>>>>>                 Shuffled Maps =851200
>>>>>                 Failed Shuffles=61
>>>>>                 Merged Map outputs=851200
>>>>>                 GC time elapsed (ms)=4215666
>>>>>                 CPU time spent (ms)=192433000
>>>>>                 Physical memory (bytes) snapshot=3349356380160
>>>>>                 Virtual memory (bytes) snapshot=9665208745984
>>>>>                 Total committed heap usage (bytes)=3636854259712
>>>>>         Shuffle Errors
>>>>>                 BAD_ID=0
>>>>>                 CONNECTION=0
>>>>>                 IO_ERROR=4
>>>>>                 WRONG_LENGTH=0
>>>>>                 WRONG_MAP=0
>>>>>                 WRONG_REDUCE=0
>>>>>         File Input Format Counters
>>>>>                 Bytes Read=1000000000000
>>>>>         File Output Format Counters
>>>>>                 Bytes Written=1000000000000
>>>>>
>>>>> Thanks,
>>>>>
>>>>> John
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop 1.0.3
>>>>>> and it turned out that the terasort for MR2 is 2 times slower than that in
>>>>>> MR1. I cannot really believe it.
>>>>>>
>>>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0 cluster
>>>>>> configurations are as follows.
>>>>>>
>>>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>>>         mapreduce.map.memory.mb = "768";
>>>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>>>
>>>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>>>
>>>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>>>         mapreduce.task.io.sort.factor = "48";
>>>>>>         mapreduce.task.io.sort.mb = "200";
>>>>>>         mapreduce.map.speculative = "true";
>>>>>>         mapreduce.reduce.speculative = "true";
>>>>>>         mapreduce.framework.name = "yarn";
>>>>>>         yarn.app.mapreduce.am.job.task.listener.thread-count = "60";
>>>>>>         mapreduce.map.cpu.vcores = "1";
>>>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>>>
>>>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>>>         mapreduce.map.output.compress = "true";
>>>>>>         mapreduce.map.output.compress.codec =
>>>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>>>
>>>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>>>         yarn.resourcemanager.scheduler.client.thread-count = "64";
>>>>>>         yarn.resourcemanager.resource-tracker.client.thread-count =
>>>>>> "64";
>>>>>>         yarn.resourcemanager.scheduler.class =
>>>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>>>         yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>>>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>>>         yarn.nodemanager.container-executor.class =
>>>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>>>         yarn.nodemanager.container-manager.thread-count = "64";
>>>>>>         yarn.nodemanager.localizer.client.thread-count = "20";
>>>>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
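
A back-of-envelope reading of the MR2 settings above, assuming the scheduler packs containers
purely by the memory requests shown: 12288 MB per NodeManager / 768 MB per map allows about
16 concurrent map containers per node, which also matches the 16 vcores at 1 vcore per map;
12288 MB / 2048 MB per reduce allows about 6 concurrent reduce containers per node. Across
the 19 data nodes that is on the order of 300 map or 115 reduce containers at a time, before
setting aside memory for the AM container.
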
>>>>>>
>>>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost the same
>>>>>> other configurations. In MR1, the 1TB terasort took about 45 minutes, but
>>>>>> it took around 90 minutes in MR2.
>>>>>>
>>>>>> Does anyone know what is wrong here? Or do I need some special
>>>>>> configurations for terasort to work better in MR2?
>>>>>>
>>>>>> Thanks in advance,
>>>>>>
>>>>>> John
>>>>>>
>>>>>>
>>>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>>>> michael_segel@hotmail.com> wrote:
>>>>>>
>>>>>>> Sam,
>>>>>>> I think your cluster is too small for any meaningful conclusions to
>>>>>>> be made.
>>>>>>>
>>>>>>>
>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>
>>>>>>> Mike Segel
>>>>>>>
>>>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi Harsh,
>>>>>>>
>>>>>>> Thanks for your detailed response! Now, the efficiency of my Yarn
>>>>>>> cluster improved a lot after increasing the reducer
>>>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>>>
>>>>>>> 1. In Hadoop 1.x, a job will be executed by map tasks and reduce tasks
>>>>>>> together, with a typical process (map > shuffle > reduce). In Yarn, as I
>>>>>>> know, an MRv1 job will be executed only by the ApplicationMaster.
>>>>>>> - Yarn could run multiple kinds of jobs (MR, MPI, ...), but an MRv1 job
>>>>>>> has a special execution process (map > shuffle > reduce) in Hadoop 1.x, so
>>>>>>> how does Yarn execute an MRv1 job? Does it still include the special MR
>>>>>>> steps from Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>> - What's the general process for ApplicationMaster of Yarn to
>>>>>>> execute a job?
>>>>>>>
>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>> - For Yarn, the above two parameters do not work any more, as Yarn uses
>>>>>>> containers instead, right?
>>>>>>> - For Yarn, we can set the whole physical mem for a NodeManager
>>>>>>> using 'yarn.nodemanager.resource.memory-mb'. But how to set the
>>>>>>> default size of physical mem of a container?
>>>>>>> - How to set the maximum size of physical mem of a container? By the
>>>>>>> parameter of 'mapred.child.java.opts'?
>>>>>>>
>>>>>>> Thanks as always!
>>>>>>>
>>>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>>>
>>>>>>>> Hi Sam,
>>>>>>>>
>>>>>>>> > - How to know the container number? Why you say it will be 22
>>>>>>>> containers due to a 22 GB memory?
>>>>>>>>
>>>>>>>> The MR2's default configuration requests 1 GB resource each for Map
>>>>>>>> and Reduce containers. It requests 1.5 GB for the AM container that
>>>>>>>> runs the job, additionally. This is tunable using the properties
>>>>>>>> Sandy's mentioned in his post.
>>>>>>>>
>>>>>>>> > - My machine has 32 GB memory, how much memory is proper to be
>>>>>>>> assigned to containers?
>>>>>>>>
>>>>>>>> This is a general question. You may use the same process you took to
>>>>>>>> decide optimal number of slots in MR1 to decide this here. Every
>>>>>>>> container is a new JVM, and you're limited by the CPUs you have
>>>>>>>> there
>>>>>>>> (if not the memory). Either increase memory requests from jobs, to
>>>>>>>> lower # of concurrent containers at a given time (runtime change),
>>>>>>>> or
>>>>>>>> lower NM's published memory resources to control the same (config
>>>>>>>> change).
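
To put those numbers in config form, these are the standard MR2/YARN property names with the
stock defaults described above, shown only as a sketch rather than a recommendation:

  <!-- per-job container requests (mapred-site.xml); defaults of roughly 1024 MB each,
       plus about 1536 MB for the MR ApplicationMaster -->
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.resource.mb</name>
    <value>1536</value>
  </property>

  <!-- node-side ceiling (yarn-site.xml); lowering this reduces how many such containers
       one NodeManager runs at a time -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
  </property>

Raising the per-job requests or lowering the NodeManager ceiling are the two ways described
above to shrink the number of concurrent containers.
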
>>>>>>>>
>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>>>>>>> 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>
>>>>>>>> Yes, all of these properties will still work. Old properties
>>>>>>>> specific
>>>>>>>> to JobTracker or TaskTracker (usually found as a keyword in the
>>>>>>>> config
>>>>>>>> name) will not apply anymore.
>>>>>>>>
>>>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <sa...@gmail.com>
>>>>>>>> wrote:
>>>>>>>> > Hi Harsh,
>>>>>>>> >
>>>>>>>> > According to the above suggestions, I removed the duplicated
>>>>>>>> setting, and
>>>>>>>> > reduced the value of 'yarn.nodemanager.resource.cpu-cores',
>>>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000. And
>>>>>>>> then, the
>>>>>>>> > efficiency improved about 18%.  I have questions:
>>>>>>>> >
>>>>>>>> > - How to know the container number? Why you say it will be 22
>>>>>>>> containers due
>>>>>>>> > to a 22 GB memory?
>>>>>>>> > - My machine has 32 GB memory, how much memory is proper to be
>>>>>>>> assigned to
>>>>>>>> > containers?
>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>>>>>>> 'yarn', will
>>>>>>>> > other parameters for mapred-site.xml still work in yarn
>>>>>>>> framework? Like
>>>>>>>> > 'mapreduce.task.io.sort.mb' and 'mapreduce.map.sort.spill.percent'
>>>>>>>> >
>>>>>>>> > Thanks!
>>>>>>>> >
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>>>> >>
>>>>>>>> >> Hey Sam,
>>>>>>>> >>
>>>>>>>> >> Did you get a chance to retry with Sandy's suggestions? The
>>>>>>>> config
>>>>>>>> >> appears to be asking NMs to use roughly 22 total containers (as
>>>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB memory
>>>>>>>> >> resource. This could impact much, given the CPU is still the
>>>>>>>> same for
>>>>>>>> >> both test runs.
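
The 22 presumably comes from arithmetic on the yarn-site.xml quoted further down: 22000 MB of
NodeManager memory divided by the default 1024 MB request is about 21 task containers, plus
the roughly 1536 MB AM somewhere on the cluster, versus 8 map + 4 reduce = 12 task slots per
node in the MR1-style mapred-site.xml settings, on the same CPUs.
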
>>>>>>>> >>
>>>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>>>> sandy.ryza@cloudera.com>
>>>>>>>> >> wrote:
>>>>>>>> >> > Hey Sam,
>>>>>>>> >> >
>>>>>>>> >> > Thanks for sharing your results.  I'm definitely curious about
>>>>>>>> what's
>>>>>>>> >> > causing the difference.
>>>>>>>> >> >
>>>>>>>> >> > A couple observations:
>>>>>>>> >> > It looks like you've got yarn.nodemanager.resource.memory-mb
>>>>>>>> in there
>>>>>>>> >> > twice
>>>>>>>> >> > with two different values.
>>>>>>>> >> >
>>>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close to the
>>>>>>>> default
>>>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of your
>>>>>>>> tasks getting
>>>>>>>> >> > killed for running over resource limits?
>>>>>>>> >> >
>>>>>>>> >> > -Sandy
>>>>>>>> >> >
>>>>>>>> >> >
>>>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <
>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>> >> >>
>>>>>>>> >> >> The terasort execution log shows that reduce spent about 5.5
>>>>>>>> mins from
>>>>>>>> >> >> 33%
>>>>>>>> >> >> to 35% as below.
>>>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce 31%
>>>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce 32%
>>>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce 33%
>>>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce 35%
>>>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce 40%
>>>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce 43%
>>>>>>>> >> >>
>>>>>>>> >> >> Any way, below are my configurations for your reference.
>>>>>>>> Thanks!
>>>>>>>> >> >> (A) core-site.xml
>>>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>>>> >> >>
>>>>>>>> >> >> (B) hdfs-site.xml
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>dfs.replication</name>
>>>>>>>> >> >>     <value>1</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>>>> >> >>
>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>>>> >> >>
>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>>>> >> >>     <value>64</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>>>> >> >>     <value>10</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >> (C) mapred-site.xml
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>>>> >> >>
>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>>>> >> >>     <description>No description</description>
>>>>>>>> >> >>     <final>true</final>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>>>> >> >>     <description>No description</description>
>>>>>>>> >> >>     <final>true</final>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >> <property>
>>>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>>>> >> >> </property>
>>>>>>>> >> >>
>>>>>>>> >> >> <property>
>>>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>>>> >> >>     <value>yarn</value>
>>>>>>>> >> >>    </property>
>>>>>>>> >> >>
>>>>>>>> >> >>  <property>
>>>>>>>> >> >>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>>>> >> >>     <value>8</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>>>> >> >>     <value>4</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>>>> >> >>     <value>true</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >> (D) yarn-site.xml
>>>>>>>> >> >>  <property>
>>>>>>>> >> >>     <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>>>> >> >>     <value>node1:18025</value>
>>>>>>>> >> >>     <description>host is the hostname of the resource manager
>>>>>>>> and
>>>>>>>> >> >>     port is the port on which the NodeManagers contact the
>>>>>>>> Resource
>>>>>>>> >> >> Manager.
>>>>>>>> >> >>     </description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <description>The address of the RM web
>>>>>>>> application.</description>
>>>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>>>> >> >>     <value>node1:18088</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>>>>>>>> >> >>     <value>node1:18030</value>
>>>>>>>> >> >>     <description>host is the hostname of the resourcemanager
>>>>>>>> and port
>>>>>>>> >> >> is
>>>>>>>> >> >> the port
>>>>>>>> >> >>     on which the Applications in the cluster talk to the
>>>>>>>> Resource
>>>>>>>> >> >> Manager.
>>>>>>>> >> >>     </description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>>>> >> >>     <value>node1:18040</value>
>>>>>>>> >> >>     <description>the host is the hostname of the
>>>>>>>> ResourceManager and
>>>>>>>> >> >> the
>>>>>>>> >> >> port is the port on
>>>>>>>> >> >>     which the clients can talk to the Resource Manager.
>>>>>>>> </description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>>>> >> >>     <description>the local directories used by the
>>>>>>>> >> >> nodemanager</description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>>>> port</description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>> >> >>     <value>10240</value>
>>>>>>>> >> >>     <description>the amount of memory on the NodeManager in
>>>>>>>> >> >> MB</description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>>> >> >>
>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>>> >> >>     <description>directory on hdfs where the application logs
>>>>>>>> are moved
>>>>>>>> >> >> to
>>>>>>>> >> >> </description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>    <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>>> >> >>
>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>>> >> >>     <description>the directories used by Nodemanagers as log
>>>>>>>> >> >> directories</description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>>> >> >>     <description>shuffle service that needs to be set for Map
>>>>>>>> Reduce to
>>>>>>>> >> >> run </description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>>> >> >>     <value>64</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>  <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>>> >> >>     <value>24</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >> <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>>> >> >>     <value>3</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>  <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>> >> >>     <value>22000</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>  <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>>> >> >>     <value>2.1</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>>>> >> >>>
>>>>>>>> >> >>> Not tuning configurations at all is wrong. YARN uses memory
>>>>>>>> resource
>>>>>>>> >> >>> based scheduling and hence MR2 would be requesting 1 GB
>>>>>>>> minimum by
>>>>>>>> >> >>> default, causing, on base configs, to max out at 8 (due to 8
>>>>>>>> GB NM
>>>>>>>> >> >>> memory resource config) total containers. Do share your
>>>>>>>> configs as at
>>>>>>>> >> >>> this point none of us can tell what it is.
>>>>>>>> >> >>>
>>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for users
>>>>>>>> and to not
>>>>>>>> >> >>> care about such things :)
>>>>>>>> >> >>>
>>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>>> samliuhadoop@gmail.com>
>>>>>>>> >> >>> wrote:
>>>>>>>> >> >>> > At the beginning, I just want to do a fast comparison of
>>>>>>>> MRv1 and
>>>>>>>> >> >>> > Yarn.
>>>>>>>> >> >>> > But
>>>>>>>> >> >>> > they have many differences, and to be fair for comparison
>>>>>>>> I did not
>>>>>>>> >> >>> > tune
>>>>>>>> >> >>> > their configurations at all.  So I got above test results.
>>>>>>>> After
>>>>>>>> >> >>> > analyzing
>>>>>>>> >> >>> > the test result, no doubt, I will configure them and do
>>>>>>>> comparison
>>>>>>>> >> >>> > again.
>>>>>>>> >> >>> >
>>>>>>>> >> >>> > Do you have any idea on current test result? I think, to
>>>>>>>> compare
>>>>>>>> >> >>> > with
>>>>>>>> >> >>> > MRv1,
>>>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but worse on
>>>>>>>> Reduce
>>>>>>>> >> >>> > phase(terasort test).
>>>>>>>> >> >>> > And any detailed suggestions/comments/materials on Yarn
>>>>>>>> performance
>>>>>>>> >> >>> > tuning?
>>>>>>>> >> >>> >
>>>>>>>> >> >>> > Thanks!
>>>>>>>> >> >>> >
>>>>>>>> >> >>> >
>>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>>> marcosluis2186@gmail.com>
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >> Why not tune the configurations?
>>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> Hi Experts,
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> We are thinking about whether to use Yarn or not in the
>>>>>>>> near
>>>>>>>> >> >>> >>> future,
>>>>>>>> >> >>> >>> and
>>>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for comparison.
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> My env is three nodes cluster, and each node has similar
>>>>>>>> hardware:
>>>>>>>> >> >>> >>> 2
>>>>>>>> >> >>> >>> cpu(4 core), 32 mem. Both Yarn and MRv1 cluster are set
>>>>>>>> on the
>>>>>>>> >> >>> >>> same
>>>>>>>> >> >>> >>> env. To
>>>>>>>> >> >>> >>> be fair, I did not make any performance tuning on their
>>>>>>>> >> >>> >>> configurations, but
>>>>>>>> >> >>> >>> use the default configuration values.
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> Before testing, I think Yarn will be much better than
>>>>>>>> MRv1, if
>>>>>>>> >> >>> >>> they
>>>>>>>> >> >>> >>> all
>>>>>>>> >> >>> >>> use default configuration, because Yarn is a better
>>>>>>>> framework than
>>>>>>>> >> >>> >>> MRv1.
>>>>>>>> >> >>> >>> However, the test result shows some differences:
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> After a fast analysis, I think the direct cause might be
>>>>>>>> that Yarn
>>>>>>>> >> >>> >>> is
>>>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much worse on
>>>>>>>> Reduce
>>>>>>>> >> >>> >>> phase.
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1 for
>>>>>>>> terasort?
>>>>>>>> >> >>> >>> - What's the strategy for tuning Yarn performance? Are there any
>>>>>>>> >> >>> >>> materials?
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> Thanks!
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >> --
>>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >
>>>>>>>> >> >>>
>>>>>>>> >> >>>
>>>>>>>> >> >>>
>>>>>>>> >> >>> --
>>>>>>>> >> >>> Harsh J
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >> --
>>>>>>>> >> Harsh J
>>>>>>>> >
>>>>>>>> >
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Harsh J
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

>>>>>>>> >> >>     <description>the amount of memory on the NodeManager in
>>>>>>>> >> >> MB</description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>>> >> >>
>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>>> >> >>     <description>directory on hdfs where the application logs
>>>>>>>> are moved
>>>>>>>> >> >> to
>>>>>>>> >> >> </description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>    <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>>> >> >>
>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>>> >> >>     <description>the directories used by Nodemanagers as log
>>>>>>>> >> >> directories</description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>>> >> >>     <description>shuffle service that needs to be set for Map
>>>>>>>> Reduce to
>>>>>>>> >> >> run </description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>>> >> >>     <value>64</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>  <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>>> >> >>     <value>24</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >> <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>>> >> >>     <value>3</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>  <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>> >> >>     <value>22000</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>  <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>>> >> >>     <value>2.1</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>>>> >> >>>
>>>>>>>> >> >>> Not tuning configurations at all is wrong. YARN uses memory
>>>>>>>> resource
>>>>>>>> >> >>> based scheduling and hence MR2 would be requesting 1 GB
>>>>>>>> minimum by
>>>>>>>> >> >>> default, causing, on base configs, to max out at 8 (due to 8
>>>>>>>> GB NM
>>>>>>>> >> >>> memory resource config) total containers. Do share your
>>>>>>>> configs as at
>>>>>>>> >> >>> this point none of us can tell what it is.
>>>>>>>> >> >>>
>>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for users
>>>>>>>> and to not
>>>>>>>> >> >>> care about such things :)
>>>>>>>> >> >>>
>>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>>> samliuhadoop@gmail.com>
>>>>>>>> >> >>> wrote:
>>>>>>>> >> >>> > At the beginning, I just want to do a fast comparison of
>>>>>>>> MRv1 and
>>>>>>>> >> >>> > Yarn.
>>>>>>>> >> >>> > But
>>>>>>>> >> >>> > they have many differences, and to be fair for comparison
>>>>>>>> I did not
>>>>>>>> >> >>> > tune
>>>>>>>> >> >>> > their configurations at all.  So I got above test results.
>>>>>>>> After
>>>>>>>> >> >>> > analyzing
>>>>>>>> >> >>> > the test result, no doubt, I will configure them and do
>>>>>>>> comparison
>>>>>>>> >> >>> > again.
>>>>>>>> >> >>> >
>>>>>>>> >> >>> > Do you have any idea on current test result? I think, to
>>>>>>>> compare
>>>>>>>> >> >>> > with
>>>>>>>> >> >>> > MRv1,
>>>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but worse on
>>>>>>>> Reduce
>>>>>>>> >> >>> > phase(terasort test).
>>>>>>>> >> >>> > And any detailed suggestions/comments/materials on Yarn
>>>>>>>> performance
>>>>>>>> >> >>> > tuning?
>>>>>>>> >> >>> >
>>>>>>>> >> >>> > Thanks!
>>>>>>>> >> >>> >
>>>>>>>> >> >>> >
>>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>>> marcosluis2186@gmail.com>
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >> Why not tune the configurations?
>>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> Hi Experts,
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> We are thinking about whether to use Yarn or not in the
>>>>>>>> near
>>>>>>>> >> >>> >>> future,
>>>>>>>> >> >>> >>> and
>>>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for comparison.
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> My env is three nodes cluster, and each node has similar
>>>>>>>> hardware:
>>>>>>>> >> >>> >>> 2
>>>>>>>> >> >>> >>> cpu(4 core), 32 mem. Both Yarn and MRv1 cluster are set
>>>>>>>> on the
>>>>>>>> >> >>> >>> same
>>>>>>>> >> >>> >>> env. To
>>>>>>>> >> >>> >>> be fair, I did not make any performance tuning on their
>>>>>>>> >> >>> >>> configurations, but
>>>>>>>> >> >>> >>> use the default configuration values.
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> Before testing, I think Yarn will be much better than
>>>>>>>> MRv1, if
>>>>>>>> >> >>> >>> they
>>>>>>>> >> >>> >>> all
>>>>>>>> >> >>> >>> use default configuration, because Yarn is a better
>>>>>>>> framework than
>>>>>>>> >> >>> >>> MRv1.
>>>>>>>> >> >>> >>> However, the test result shows some differences:
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> After a fast analysis, I think the direct cause might be
>>>>>>>> that Yarn
>>>>>>>> >> >>> >>> is
>>>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much worse on
>>>>>>>> Reduce
>>>>>>>> >> >>> >>> phase.
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1 for
>>>>>>>> terasort?
>>>>>>>> >> >>> >>> - What's the strategy for tuning Yarn performance? Are there any
>>>>>>>> >> >>> >>> materials?
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> Thanks!
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >> --
>>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >
>>>>>>>> >> >>>
>>>>>>>> >> >>>
>>>>>>>> >> >>>
>>>>>>>> >> >>> --
>>>>>>>> >> >>> Harsh J
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >> --
>>>>>>>> >> Harsh J
>>>>>>>> >
>>>>>>>> >
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Harsh J
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Jian Fang <ji...@gmail.com>.
Here is one run for MR1. Thanks.

2013-10-22 08:59:43,799 INFO org.apache.hadoop.mapred.JobClient (main):
Counters: 31
2013-10-22 08:59:43,800 INFO org.apache.hadoop.mapred.JobClient (main):
Job Counters
2013-10-22 08:59:43,800 INFO org.apache.hadoop.mapred.JobClient (main):
Launched reduce tasks=127
2013-10-22 08:59:43,800 INFO org.apache.hadoop.mapred.JobClient (main):
SLOTS_MILLIS_MAPS=173095295
2013-10-22 08:59:43,800 INFO org.apache.hadoop.mapred.JobClient (main):
Total time spent by all reduces waiting after reserving slots (ms)=0
2013-10-22 08:59:43,800 INFO org.apache.hadoop.mapred.JobClient (main):
Total time spent by all maps waiting after reserving slots (ms)=0
2013-10-22 08:59:43,801 INFO org.apache.hadoop.mapred.JobClient (main):
Rack-local map tasks=13
2013-10-22 08:59:43,801 INFO org.apache.hadoop.mapred.JobClient (main):
Launched map tasks=7630
2013-10-22 08:59:43,801 INFO org.apache.hadoop.mapred.JobClient (main):
Data-local map tasks=7617
2013-10-22 08:59:43,801 INFO org.apache.hadoop.mapred.JobClient (main):
SLOTS_MILLIS_REDUCES=125929437
2013-10-22 08:59:43,801 INFO org.apache.hadoop.mapred.JobClient (main):
File Input Format Counters
2013-10-22 08:59:43,801 INFO org.apache.hadoop.mapred.JobClient (main):
Bytes Read=1000478157952
2013-10-22 08:59:43,801 INFO org.apache.hadoop.mapred.JobClient (main):
File Output Format Counters
2013-10-22 08:59:43,802 INFO org.apache.hadoop.mapred.JobClient (main):
Bytes Written=1000000000000
2013-10-22 08:59:43,802 INFO org.apache.hadoop.mapred.JobClient (main):
FileSystemCounters
2013-10-22 08:59:43,802 INFO org.apache.hadoop.mapred.JobClient (main):
FILE_BYTES_READ=550120627113
2013-10-22 08:59:43,802 INFO org.apache.hadoop.mapred.JobClient (main):
HDFS_BYTES_READ=1000478910352
2013-10-22 08:59:43,802 INFO org.apache.hadoop.mapred.JobClient (main):
FILE_BYTES_WRITTEN=725563889045
2013-10-22 08:59:43,802 INFO org.apache.hadoop.mapred.JobClient (main):
HDFS_BYTES_WRITTEN=1000000000000
2013-10-22 08:59:43,802 INFO org.apache.hadoop.mapred.JobClient (main):
Map-Reduce Framework
2013-10-22 08:59:43,803 INFO org.apache.hadoop.mapred.JobClient (main):
Map output materialized bytes=240948272514
2013-10-22 08:59:43,803 INFO org.apache.hadoop.mapred.JobClient (main):
Map input records=10000000000
2013-10-22 08:59:43,803 INFO org.apache.hadoop.mapred.JobClient (main):
Reduce shuffle bytes=240948272514
2013-10-22 08:59:43,803 INFO org.apache.hadoop.mapred.JobClient (main):
Spilled Records=30000000000
2013-10-22 08:59:43,803 INFO org.apache.hadoop.mapred.JobClient (main):
Map output bytes=1000000000000
2013-10-22 08:59:43,804 INFO org.apache.hadoop.mapred.JobClient (main):
Total committed heap usage (bytes)=4841182068736
2013-10-22 08:59:43,804 INFO org.apache.hadoop.mapred.JobClient (main):
CPU time spent (ms)=118779050
2013-10-22 08:59:43,804 INFO org.apache.hadoop.mapred.JobClient (main):
Map input bytes=1000000000000
2013-10-22 08:59:43,804 INFO org.apache.hadoop.mapred.JobClient (main):
SPLIT_RAW_BYTES=752400
2013-10-22 08:59:43,804 INFO org.apache.hadoop.mapred.JobClient (main):
Combine input records=0
2013-10-22 08:59:43,804 INFO org.apache.hadoop.mapred.JobClient (main):
Reduce input records=10000000000
2013-10-22 08:59:43,804 INFO org.apache.hadoop.mapred.JobClient (main):
Reduce input groups=4294967296
2013-10-22 08:59:43,804 INFO org.apache.hadoop.mapred.JobClient (main):
Combine output records=0
2013-10-22 08:59:43,804 INFO org.apache.hadoop.mapred.JobClient (main):
Physical memory (bytes) snapshot=5092578041856
2013-10-22 08:59:43,804 INFO org.apache.hadoop.mapred.JobClient (main):
Reduce output records=10000000000
2013-10-22 08:59:43,804 INFO org.apache.hadoop.mapred.JobClient (main):
Virtual memory (bytes) snapshot=9391598915584
2013-10-22 08:59:43,804 INFO org.apache.hadoop.mapred.JobClient (main):
Map output records=10000000000
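
For a rough sanity check, here is a small Python sketch comparing a few
directly comparable counters from this MR1 run with the MR2 run quoted
further below. All values are copied from the two posted counter dumps; the
ratios only hint at where the extra MR2 time may be going, they do not prove
a root cause.

# Counter comparison sketch; values copied from the two counter dumps in
# this thread (MR1 above, MR2 in the quoted mail below).
mr1 = {
    "map_output_bytes":        1_000_000_000_000,
    "map_output_materialized":   240_948_272_514,
    "reduce_shuffle_bytes":      240_948_272_514,
    "spilled_records":          30_000_000_000,
    "cpu_time_ms":                   118_779_050,
}
mr2 = {
    "map_output_bytes":        1_020_000_000_000,
    "map_output_materialized":   440_486_356_802,
    "reduce_shuffle_bytes":      440_486_356_802,
    "spilled_records":          20_000_000_000,
    "cpu_time_ms":                   192_433_000,
}

for key in mr1:
    print(f"{key:24s} MR2/MR1 = {mr2[key] / mr1[key]:.2f}")

# How well the map output compressed: materialized bytes / raw map output bytes.
for name, counters in (("MR1", mr1), ("MR2", mr2)):
    frac = counters["map_output_materialized"] / counters["map_output_bytes"]
    print(f"{name}: materialized / raw map output = {frac:.2f}")

# MR1 comes out near 0.24 and MR2 near 0.43, i.e. the MR2 run shuffles roughly
# 1.8x more bytes and spends roughly 1.6x more CPU, so it is worth verifying
# that the Snappy map-output compression setting really took effect in MR2.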



On Wed, Oct 23, 2013 at 8:17 AM, Sandy Ryza <sa...@cloudera.com> wrote:

> How many map and reduce slots are you using per tasktracker in MR1?  How
> do the average map times compare? (MR2 reports this directly on the web UI,
> but you can also get a sense in MR1 by scrolling through the map tasks
> page).  Can you share the counters for MR1?
>
> -Sandy
>
>
> On Wed, Oct 23, 2013 at 12:23 AM, Jian Fang <jian.fang.subscribe@gmail.com
> > wrote:
>
>> Unfortunately, turning off JVM reuse still got the same result, i.e.,
>> about 90 minutes for MR2. I don't think the killed reduces could contribute
>> to a 2x slowdown. There should be something very wrong either in
>> configuration or code. Any hints?
>>
>>
>>
>> On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <jian.fang.subscribe@gmail.com
>> > wrote:
>>
>>> Thanks Sandy. I will try to turn JVM reuse off and see what happens.
>>>
>>> Yes, I saw quite some exceptions in the task attempts. For instance.
>>>
>>>
>>> 2013-10-20 03:13:58,751 ERROR [main]
>>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>>> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
>>> 2013-10-20 03:13:58,752 ERROR [Thread-6]
>>> org.apache.hadoop.hdfs.DFSClient: Failed to close file
>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>>> No lease on
>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
>>> File does not exist. Holder
>>> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
>>> any open files.
>>>         at
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>>>         at
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>>>         at
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>>>         at
>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>>>         at
>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>>>         at
>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>>>         at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>> --
>>>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>>>         at
>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>>>         at
>>> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>>>         at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>>>         at
>>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>>>         at
>>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>>>         at
>>> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>>>         at
>>> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>>> 2013-10-20 03:13:58,753 WARN [main] org.apache.hadoop.mapred.YarnChild:
>>> Exception running child : java.nio.channels.ClosedChannelException
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>>>         at
>>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>>>         at
>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>>>         at java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>         at
>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>>>         at
>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>>>         at
>>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>>>         at
>>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>>         at
>>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>>         at
>>> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>>>
>>>
>>>
>>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>
>>>> It looks like many of your reduce tasks were killed.  Do you know why?
>>>>  Also, MR2 doesn't have JVM reuse, so it might make sense to compare it to
>>>> MR1 with JVM reuse turned off.
>>>>
>>>> -Sandy
>>>>
>>>>
>>>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <
>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>
>>>>> The Terasort output for MR2 is as follows.
>>>>>
>>>>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job (main):
>>>>> Counters: 46
>>>>>         File System Counters
>>>>>                 FILE: Number of bytes read=456102049355
>>>>>                 FILE: Number of bytes written=897246250517
>>>>>                 FILE: Number of read operations=0
>>>>>                 FILE: Number of large read operations=0
>>>>>                 FILE: Number of write operations=0
>>>>>                 HDFS: Number of bytes read=1000000851200
>>>>>                 HDFS: Number of bytes written=1000000000000
>>>>>                 HDFS: Number of read operations=32131
>>>>>                 HDFS: Number of large read operations=0
>>>>>                 HDFS: Number of write operations=224
>>>>>         Job Counters
>>>>>                 Killed map tasks=1
>>>>>                 Killed reduce tasks=20
>>>>>                 Launched map tasks=7601
>>>>>                 Launched reduce tasks=132
>>>>>                 Data-local map tasks=7591
>>>>>                 Rack-local map tasks=10
>>>>>                 Total time spent by all maps in occupied slots
>>>>> (ms)=1696141311
>>>>>                 Total time spent by all reduces in occupied slots
>>>>> (ms)=2664045096
>>>>>         Map-Reduce Framework
>>>>>                 Map input records=10000000000
>>>>>                 Map output records=10000000000
>>>>>                 Map output bytes=1020000000000
>>>>>                 Map output materialized bytes=440486356802
>>>>>                 Input split bytes=851200
>>>>>                 Combine input records=0
>>>>>                 Combine output records=0
>>>>>                 Reduce input groups=10000000000
>>>>>                 Reduce shuffle bytes=440486356802
>>>>>                 Reduce input records=10000000000
>>>>>                 Reduce output records=10000000000
>>>>>                 Spilled Records=20000000000
>>>>>                 Shuffled Maps =851200
>>>>>                 Failed Shuffles=61
>>>>>                 Merged Map outputs=851200
>>>>>                 GC time elapsed (ms)=4215666
>>>>>                 CPU time spent (ms)=192433000
>>>>>                 Physical memory (bytes) snapshot=3349356380160
>>>>>                 Virtual memory (bytes) snapshot=9665208745984
>>>>>                 Total committed heap usage (bytes)=3636854259712
>>>>>         Shuffle Errors
>>>>>                 BAD_ID=0
>>>>>                 CONNECTION=0
>>>>>                 IO_ERROR=4
>>>>>                 WRONG_LENGTH=0
>>>>>                 WRONG_MAP=0
>>>>>                 WRONG_REDUCE=0
>>>>>         File Input Format Counters
>>>>>                 Bytes Read=1000000000000
>>>>>         File Output Format Counters
>>>>>                 Bytes Written=1000000000000
>>>>>
>>>>> Thanks,
>>>>>
>>>>> John
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop 1.0.3
>>>>>> and it turned out that the terasort for MR2 is 2 times slower than that in
>>>>>> MR1. I cannot really believe it.
>>>>>>
>>>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0 cluster
>>>>>> configurations are as follows.
>>>>>>
>>>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>>>         mapreduce.map.memory.mb = "768";
>>>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>>>
>>>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>>>
>>>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>>>         mapreduce.task.io.sort.factor = "48";
>>>>>>         mapreduce.task.io.sort.mb = "200";
>>>>>>         mapreduce.map.speculative = "true";
>>>>>>         mapreduce.reduce.speculative = "true";
>>>>>>         mapreduce.framework.name = "yarn";
>>>>>>         yarn.app.mapreduce.am.job.task.listener.thread-count = "60";
>>>>>>         mapreduce.map.cpu.vcores = "1";
>>>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>>>
>>>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>>>         mapreduce.map.output.compress = "true";
>>>>>>         mapreduce.map.output.compress.codec =
>>>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>>>
>>>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>>>         yarn.resourcemanager.scheduler.client.thread-count = "64";
>>>>>>         yarn.resourcemanager.resource-tracker.client.thread-count =
>>>>>> "64";
>>>>>>         yarn.resourcemanager.scheduler.class =
>>>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>>>         yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>>>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>>>         yarn.nodemanager.container-executor.class =
>>>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>>>         yarn.nodemanager.container-manager.thread-count = "64";
>>>>>>         yarn.nodemanager.localizer.client.thread-count = "20";
>>>>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>>>>>
>>>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost the same
>>>>>> other configurations. In MR1, the 1TB terasort took about 45 minutes, but
>>>>>> it took around 90 minutes in MR2.
>>>>>>
>>>>>> Does anyone know what is wrong here? Or do I need some special
>>>>>> configurations for terasort to work better in MR2?
>>>>>>
>>>>>> Thanks in advance,
>>>>>>
>>>>>> John
>>>>>>
>>>>>>
>>>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>>>> michael_segel@hotmail.com> wrote:
>>>>>>
>>>>>>> Sam,
>>>>>>> I think your cluster is too small for any meaningful conclusions to
>>>>>>> be made.
>>>>>>>
>>>>>>>
>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>
>>>>>>> Mike Segel
>>>>>>>
>>>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi Harsh,
>>>>>>>
>>>>>>> Thanks for your detailed response! Now, the efficiency of my Yarn
>>>>>>> cluster improved a lot after increasing the reducer
>>>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>>>
>>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and reduce task
>>>>>>> together, with a typical process(map > shuffle > reduce). In Yarn, as I
>>>>>>> know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1 job
>>>>>>> has special execution process(map > shuffle > reduce) in Hadoop 1.x, and
>>>>>>> how does Yarn execute an MRv1 job? Does it still include the special MR steps of Hadoop
>>>>>>> 1.x, like map, sort, merge, combine and shuffle?
>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>> - What's the general process for ApplicationMaster of Yarn to
>>>>>>> execute a job?
>>>>>>>
>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>> - For Yarn, the above two parameters do not work any more, as yarn uses
>>>>>>> containers instead, right?
>>>>>>> - For Yarn, we can set the whole physical mem for a NodeManager
>>>>>>> using 'yarn.nodemanager.resource.memory-mb'. But how to set the
>>>>>>> default size of physical mem of a container?
>>>>>>> - How to set the maximum size of physical mem of a container? By the
>>>>>>> parameter of 'mapred.child.java.opts'?
>>>>>>>
>>>>>>> Thanks as always!
>>>>>>>
>>>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>>>
>>>>>>>> Hi Sam,
>>>>>>>>
>>>>>>>> > - How to know the container number? Why you say it will be 22
>>>>>>>> containers due to a 22 GB memory?
>>>>>>>>
>>>>>>>> The MR2's default configuration requests 1 GB resource each for Map
>>>>>>>> and Reduce containers. It requests 1.5 GB for the AM container that
>>>>>>>> runs the job, additionally. This is tunable using the properties
>>>>>>>> Sandy's mentioned in his post.
>>>>>>>>
>>>>>>>> > - My machine has 32 GB memory, how much memory is appropriate to be
>>>>>>>> assigned to containers?
>>>>>>>>
>>>>>>>> This is a general question. You may use the same process you took to
>>>>>>>> decide optimal number of slots in MR1 to decide this here. Every
>>>>>>>> container is a new JVM, and you're limited by the CPUs you have
>>>>>>>> there
>>>>>>>> (if not the memory). Either increase memory requests from jobs, to
>>>>>>>> lower # of concurrent containers at a given time (runtime change),
>>>>>>>> or
>>>>>>>> lower NM's published memory resources to control the same (config
>>>>>>>> change).
>>>>>>>>
>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>>>>>>> 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>
>>>>>>>> Yes, all of these properties will still work. Old properties
>>>>>>>> specific
>>>>>>>> to JobTracker or TaskTracker (usually found as a keyword in the
>>>>>>>> config
>>>>>>>> name) will not apply anymore.
>>>>>>>>
>>>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <sa...@gmail.com>
>>>>>>>> wrote:
>>>>>>>> > Hi Harsh,
>>>>>>>> >
>>>>>>>> > According to above suggestions, I removed the duplication of
>>>>>>>> setting, and
>>>>>>>> > reduce the value of 'yarn.nodemanager.resource.cpu-cores',
>>>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000. And
>>>>>>>> then, the
>>>>>>>> > efficiency improved about 18%.  I have questions:
>>>>>>>> >
>>>>>>>> > - How to know the container number? Why you say it will be 22
>>>>>>>> containers due
>>>>>>>> > to a 22 GB memory?
>>>>>>>> > - My machine has 32 GB memory, how much memory is appropriate to be
>>>>>>>> assigned to
>>>>>>>> > containers?
>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>>>>>>> 'yarn', will
>>>>>>>> > other parameters for mapred-site.xml still work in yarn
>>>>>>>> framework? Like
>>>>>>>> > 'mapreduce.task.io.sort.mb' and 'mapreduce.map.sort.spill.percent'
>>>>>>>> >
>>>>>>>> > Thanks!
>>>>>>>> >
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>>>> >>
>>>>>>>> >> Hey Sam,
>>>>>>>> >>
>>>>>>>> >> Did you get a chance to retry with Sandy's suggestions? The
>>>>>>>> config
>>>>>>>> >> appears to be asking NMs to use roughly 22 total containers (as
>>>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB memory
>>>>>>>> >> resource. This could impact much, given the CPU is still the
>>>>>>>> same for
>>>>>>>> >> both test runs.
>>>>>>>> >>
>>>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>>>> sandy.ryza@cloudera.com>
>>>>>>>> >> wrote:
>>>>>>>> >> > Hey Sam,
>>>>>>>> >> >
>>>>>>>> >> > Thanks for sharing your results.  I'm definitely curious about
>>>>>>>> what's
>>>>>>>> >> > causing the difference.
>>>>>>>> >> >
>>>>>>>> >> > A couple observations:
>>>>>>>> >> > It looks like you've got yarn.nodemanager.resource.memory-mb
>>>>>>>> in there
>>>>>>>> >> > twice
>>>>>>>> >> > with two different values.
>>>>>>>> >> >
>>>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close to the
>>>>>>>> default
>>>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of your
>>>>>>>> tasks getting
>>>>>>>> >> > killed for running over resource limits?
>>>>>>>> >> >
>>>>>>>> >> > -Sandy
>>>>>>>> >> >
>>>>>>>> >> >
>>>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <
>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>> >> >>
>>>>>>>> >> >> The terasort execution log shows that reduce spent about 5.5
>>>>>>>> mins from
>>>>>>>> >> >> 33%
>>>>>>>> >> >> to 35% as below.
>>>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce 31%
>>>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce 32%
>>>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce 33%
>>>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce 35%
>>>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce 40%
>>>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce 43%
>>>>>>>> >> >>
>>>>>>>> >> >> Anyway, below are my configurations for your reference.
>>>>>>>> Thanks!
>>>>>>>> >> >> (A) core-site.xml
>>>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>>>> >> >>
>>>>>>>> >> >> (B) hdfs-site.xml
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>dfs.replication</name>
>>>>>>>> >> >>     <value>1</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>>>> >> >>
>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>>>> >> >>
>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>>>> >> >>     <value>64</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>>>> >> >>     <value>10</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >> (C) mapred-site.xml
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>>>> >> >>
>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>>>> >> >>     <description>No description</description>
>>>>>>>> >> >>     <final>true</final>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>>>> >> >>     <description>No description</description>
>>>>>>>> >> >>     <final>true</final>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >> <property>
>>>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>>>> >> >> </property>
>>>>>>>> >> >>
>>>>>>>> >> >> <property>
>>>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>>>> >> >>     <value>yarn</value>
>>>>>>>> >> >>    </property>
>>>>>>>> >> >>
>>>>>>>> >> >>  <property>
>>>>>>>> >> >>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>>>> >> >>     <value>8</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>>>> >> >>     <value>4</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>>>> >> >>     <value>true</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >> (D) yarn-site.xml
>>>>>>>> >> >>  <property>
>>>>>>>> >> >>     <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>>>> >> >>     <value>node1:18025</value>
>>>>>>>> >> >>     <description>host is the hostname of the resource manager
>>>>>>>> and
>>>>>>>> >> >>     port is the port on which the NodeManagers contact the
>>>>>>>> Resource
>>>>>>>> >> >> Manager.
>>>>>>>> >> >>     </description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <description>The address of the RM web
>>>>>>>> application.</description>
>>>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>>>> >> >>     <value>node1:18088</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>>>>>>>> >> >>     <value>node1:18030</value>
>>>>>>>> >> >>     <description>host is the hostname of the resourcemanager
>>>>>>>> and port
>>>>>>>> >> >> is
>>>>>>>> >> >> the port
>>>>>>>> >> >>     on which the Applications in the cluster talk to the
>>>>>>>> Resource
>>>>>>>> >> >> Manager.
>>>>>>>> >> >>     </description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>>>> >> >>     <value>node1:18040</value>
>>>>>>>> >> >>     <description>the host is the hostname of the
>>>>>>>> ResourceManager and
>>>>>>>> >> >> the
>>>>>>>> >> >> port is the port on
>>>>>>>> >> >>     which the clients can talk to the Resource Manager.
>>>>>>>> </description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>>>> >> >>     <description>the local directories used by the
>>>>>>>> >> >> nodemanager</description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>>>> port</description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>> >> >>     <value>10240</value>
>>>>>>>> >> >>     <description>the amount of memory on the NodeManager in
>>>>>>>> >> >> MB</description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>>> >> >>
>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>>> >> >>     <description>directory on hdfs where the application logs
>>>>>>>> are moved
>>>>>>>> >> >> to
>>>>>>>> >> >> </description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>    <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>>> >> >>
>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>>> >> >>     <description>the directories used by Nodemanagers as log
>>>>>>>> >> >> directories</description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>>> >> >>     <description>shuffle service that needs to be set for Map
>>>>>>>> Reduce to
>>>>>>>> >> >> run </description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>>> >> >>     <value>64</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>  <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>>> >> >>     <value>24</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >> <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>>> >> >>     <value>3</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>  <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>> >> >>     <value>22000</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>  <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>>> >> >>     <value>2.1</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>>>> >> >>>
>>>>>>>> >> >>> Not tuning configurations at all is wrong. YARN uses memory
>>>>>>>> resource
>>>>>>>> >> >>> based scheduling and hence MR2 would be requesting 1 GB
>>>>>>>> minimum by
>>>>>>>> >> >>> default, causing, on base configs, to max out at 8 (due to 8
>>>>>>>> GB NM
>>>>>>>> >> >>> memory resource config) total containers. Do share your
>>>>>>>> configs as at
>>>>>>>> >> >>> this point none of us can tell what it is.
>>>>>>>> >> >>>
>>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for users
>>>>>>>> and to not
>>>>>>>> >> >>> care about such things :)
>>>>>>>> >> >>>
>>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>>> samliuhadoop@gmail.com>
>>>>>>>> >> >>> wrote:
>>>>>>>> >> >>> > At the beginning, I just want to do a fast comparison of
>>>>>>>> MRv1 and
>>>>>>>> >> >>> > Yarn.
>>>>>>>> >> >>> > But
>>>>>>>> >> >>> > they have many differences, and to be fair for comparison
>>>>>>>> I did not
>>>>>>>> >> >>> > tune
>>>>>>>> >> >>> > their configurations at all.  So I got above test results.
>>>>>>>> After
>>>>>>>> >> >>> > analyzing
>>>>>>>> >> >>> > the test result, no doubt, I will configure them and do
>>>>>>>> comparison
>>>>>>>> >> >>> > again.
>>>>>>>> >> >>> >
>>>>>>>> >> >>> > Do you have any idea on current test result? I think, to
>>>>>>>> compare
>>>>>>>> >> >>> > with
>>>>>>>> >> >>> > MRv1,
>>>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but worse on
>>>>>>>> Reduce
>>>>>>>> >> >>> > phase(terasort test).
>>>>>>>> >> >>> > And any detailed suggestions/comments/materials on Yarn
>>>>>>>> performance
>>>>>>>> >> >>> > tuning?
>>>>>>>> >> >>> >
>>>>>>>> >> >>> > Thanks!
>>>>>>>> >> >>> >
>>>>>>>> >> >>> >
>>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>>> marcosluis2186@gmail.com>
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >> Why not tune the configurations?
>>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> Hi Experts,
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> We are thinking about whether to use Yarn or not in the
>>>>>>>> near
>>>>>>>> >> >>> >>> future,
>>>>>>>> >> >>> >>> and
>>>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for comparison.
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> My env is three nodes cluster, and each node has similar
>>>>>>>> hardware:
>>>>>>>> >> >>> >>> 2
>>>>>>>> >> >>> >>> cpu(4 core), 32 mem. Both Yarn and MRv1 cluster are set
>>>>>>>> on the
>>>>>>>> >> >>> >>> same
>>>>>>>> >> >>> >>> env. To
>>>>>>>> >> >>> >>> be fair, I did not make any performance tuning on their
>>>>>>>> >> >>> >>> configurations, but
>>>>>>>> >> >>> >>> use the default configuration values.
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> Before testing, I think Yarn will be much better than
>>>>>>>> MRv1, if
>>>>>>>> >> >>> >>> they
>>>>>>>> >> >>> >>> all
>>>>>>>> >> >>> >>> use default configuration, because Yarn is a better
>>>>>>>> framework than
>>>>>>>> >> >>> >>> MRv1.
>>>>>>>> >> >>> >>> However, the test result shows some differences:
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> After a fast analysis, I think the direct cause might be
>>>>>>>> that Yarn
>>>>>>>> >> >>> >>> is
>>>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much worse on
>>>>>>>> Reduce
>>>>>>>> >> >>> >>> phase.
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1 for
>>>>>>>> terasort?
>>>>>>>> >> >>> >>> - What's the strategy for tuning Yarn performance? Are there any
>>>>>>>> >> >>> >>> materials?
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> Thanks!
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >> --
>>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >
>>>>>>>> >> >>>
>>>>>>>> >> >>>
>>>>>>>> >> >>>
>>>>>>>> >> >>> --
>>>>>>>> >> >>> Harsh J
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >> --
>>>>>>>> >> Harsh J
>>>>>>>> >
>>>>>>>> >
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Harsh J
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Jian Fang <ji...@gmail.com>.
The number of map slots and reduce slots on each data node for MR1 are 8
and 3, respectively. Since MR2 could use all containers for either map or
reduce, I would expect that MR2 is faster.
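
As a back-of-envelope check, here is a small Python sketch of per-node task
concurrency under the two setups discussed in this thread: fixed slots per
TaskTracker in MR1 versus memory- and vcore-bounded containers in MR2. The
MR2 numbers come from the configuration quoted below; whether the vcore limit
is actually enforced depends on the scheduler's resource calculator, so treat
the results as upper bounds rather than exact counts.

# MR1 slots per node, as stated above.
mr1_map_slots, mr1_reduce_slots = 8, 3

# MR2 NodeManager resources and per-task container requests (posted config).
nm_mem_mb, nm_vcores = 12288, 16
map_mem_mb, map_vcores = 768, 1
reduce_mem_mb, reduce_vcores = 2048, 2

# A container fits only if both its memory and vcore requests fit, so the
# per-node count is bounded by whichever budget runs out first.
maps_per_node = min(nm_mem_mb // map_mem_mb, nm_vcores // map_vcores)            # 16
reduces_per_node = min(nm_mem_mb // reduce_mem_mb, nm_vcores // reduce_vcores)   # 6

print(f"MR1: {mr1_map_slots} map + {mr1_reduce_slots} reduce slots per node")
print(f"MR2: up to {maps_per_node} map containers or {reduces_per_node} reduce "
      f"containers per node (one shared pool, ignoring the AM container)")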


On Wed, Oct 23, 2013 at 8:17 AM, Sandy Ryza <sa...@cloudera.com> wrote:

> How many map and reduce slots are you using per tasktracker in MR1?  How
> do the average map times compare? (MR2 reports this directly on the web UI,
> but you can also get a sense in MR1 by scrolling through the map tasks
> page).  Can you share the counters for MR1?
>
> -Sandy
>
>
> On Wed, Oct 23, 2013 at 12:23 AM, Jian Fang <jian.fang.subscribe@gmail.com
> > wrote:
>
>> Unfortunately, turning off JVM reuse still got the same result, i.e.,
>> about 90 minutes for MR2. I don't think the killed reduces could contribute
>> to a 2x slowdown. There should be something very wrong either in
>> configuration or code. Any hints?
>>
>>
>>
>> On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <jian.fang.subscribe@gmail.com
>> > wrote:
>>
>>> Thanks Sandy. I will try to turn JVM reuse off and see what happens.
>>>
>>> Yes, I saw quite some exceptions in the task attempts. For instance.
>>>
>>>
>>> 2013-10-20 03:13:58,751 ERROR [main]
>>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>>> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
>>> 2013-10-20 03:13:58,752 ERROR [Thread-6]
>>> org.apache.hadoop.hdfs.DFSClient: Failed to close file
>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>>> No lease on
>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
>>> File does not exist. Holder
>>> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
>>> any open files.
>>>         at
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>>>         at
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>>>         at
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>>>         at
>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>>>         at
>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>>>         at
>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>>>         at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>> --
>>>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>>>         at
>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>>>         at
>>> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>>>         at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>>>         at
>>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>>>         at
>>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>>>         at
>>> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>>>         at
>>> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>>> 2013-10-20 03:13:58,753 WARN [main] org.apache.hadoop.mapred.YarnChild:
>>> Exception running child : java.nio.channels.ClosedChannelException
>>>         at
>>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>>>         at
>>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>>>         at
>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>>>         at java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>         at
>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>>>         at
>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>>>         at
>>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>>>         at
>>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>>         at
>>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>>         at
>>> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>>>
>>>
>>>
>>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <sa...@cloudera.com>wrote:
>>>
>>>> It looks like many of your reduce tasks were killed.  Do you know why?
>>>>  Also, MR2 doesn't have JVM reuse, so it might make sense to compare it to
>>>> MR1 with JVM reuse turned off.
>>>>
>>>> -Sandy
>>>>
>>>>
>>>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <
>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>
>>>>> The Terasort output for MR2 is as follows.
>>>>>
>>>>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job (main):
>>>>> Counters: 46
>>>>>         File System Counters
>>>>>                 FILE: Number of bytes read=456102049355
>>>>>                 FILE: Number of bytes written=897246250517
>>>>>                 FILE: Number of read operations=0
>>>>>                 FILE: Number of large read operations=0
>>>>>                 FILE: Number of write operations=0
>>>>>                 HDFS: Number of bytes read=1000000851200
>>>>>                 HDFS: Number of bytes written=1000000000000
>>>>>                 HDFS: Number of read operations=32131
>>>>>                 HDFS: Number of large read operations=0
>>>>>                 HDFS: Number of write operations=224
>>>>>         Job Counters
>>>>>                 Killed map tasks=1
>>>>>                 Killed reduce tasks=20
>>>>>                 Launched map tasks=7601
>>>>>                 Launched reduce tasks=132
>>>>>                 Data-local map tasks=7591
>>>>>                 Rack-local map tasks=10
>>>>>                 Total time spent by all maps in occupied slots
>>>>> (ms)=1696141311
>>>>>                 Total time spent by all reduces in occupied slots
>>>>> (ms)=2664045096
>>>>>         Map-Reduce Framework
>>>>>                 Map input records=10000000000
>>>>>                 Map output records=10000000000
>>>>>                 Map output bytes=1020000000000
>>>>>                 Map output materialized bytes=440486356802
>>>>>                 Input split bytes=851200
>>>>>                 Combine input records=0
>>>>>                 Combine output records=0
>>>>>                 Reduce input groups=10000000000
>>>>>                 Reduce shuffle bytes=440486356802
>>>>>                 Reduce input records=10000000000
>>>>>                 Reduce output records=10000000000
>>>>>                 Spilled Records=20000000000
>>>>>                 Shuffled Maps =851200
>>>>>                 Failed Shuffles=61
>>>>>                 Merged Map outputs=851200
>>>>>                 GC time elapsed (ms)=4215666
>>>>>                 CPU time spent (ms)=192433000
>>>>>                 Physical memory (bytes) snapshot=3349356380160
>>>>>                 Virtual memory (bytes) snapshot=9665208745984
>>>>>                 Total committed heap usage (bytes)=3636854259712
>>>>>         Shuffle Errors
>>>>>                 BAD_ID=0
>>>>>                 CONNECTION=0
>>>>>                 IO_ERROR=4
>>>>>                 WRONG_LENGTH=0
>>>>>                 WRONG_MAP=0
>>>>>                 WRONG_REDUCE=0
>>>>>         File Input Format Counters
>>>>>                 Bytes Read=1000000000000
>>>>>         File Output Format Counters
>>>>>                 Bytes Written=1000000000000
>>>>>
>>>>> Thanks,
>>>>>
>>>>> John
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop 1.0.3
>>>>>> and it turned out that the terasort for MR2 is 2 times slower than that in
>>>>>> MR1. I cannot really believe it.
>>>>>>
>>>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0 cluster
>>>>>> configurations are as follows.
>>>>>>
>>>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>>>         mapreduce.map.memory.mb = "768";
>>>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>>>
>>>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>>>
>>>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>>>         mapreduce.task.io.sort.factor = "48";
>>>>>>         mapreduce.task.io.sort.mb = "200";
>>>>>>         mapreduce.map.speculative = "true";
>>>>>>         mapreduce.reduce.speculative = "true";
>>>>>>         mapreduce.framework.name = "yarn";
>>>>>>         yarn.app.mapreduce.am.job.task.listener.thread-count = "60";
>>>>>>         mapreduce.map.cpu.vcores = "1";
>>>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>>>
>>>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>>>         mapreduce.map.output.compress = "true";
>>>>>>         mapreduce.map.output.compress.codec =
>>>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>>>
>>>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>>>         yarn.resourcemanager.scheduler.client.thread-count = "64";
>>>>>>         yarn.resourcemanager.resource-tracker.client.thread-count =
>>>>>> "64";
>>>>>>         yarn.resourcemanager.scheduler.class =
>>>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>>>         yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>>>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>>>         yarn.nodemanager.container-executor.class =
>>>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>>>         yarn.nodemanager.container-manager.thread-count = "64";
>>>>>>         yarn.nodemanager.localizer.client.thread-count = "20";
>>>>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>>>>>
>>>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost the same
>>>>>> other configurations. In MR1, the 1TB terasort took about 45 minutes, but
>>>>>> it took around 90 minutes in MR2.
>>>>>>
>>>>>> Does anyone know what is wrong here? Or do I need some special
>>>>>> configurations for terasort to work better in MR2?
>>>>>>
>>>>>> Thanks in advance,
>>>>>>
>>>>>> John
>>>>>>
>>>>>>
>>>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>>>> michael_segel@hotmail.com> wrote:
>>>>>>
>>>>>>> Sam,
>>>>>>> I think your cluster is too small for any meaningful conclusions to
>>>>>>> be made.
>>>>>>>
>>>>>>>
>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>
>>>>>>> Mike Segel
>>>>>>>
>>>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi Harsh,
>>>>>>>
>>>>>>> Thanks for your detailed response! Now, the efficiency of my Yarn
>>>>>>> cluster improved a lot after increasing the reducer
>>>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>>>
>>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and reduce task
>>>>>>> together, with a typical process(map > shuffle > reduce). In Yarn, as I
>>>>>>> know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1 job
>>>>>>> has special execution process(map > shuffle > reduce) in Hadoop 1.x, and
>>>>>>> how Yarn execute a MRv1 job? still include some special MR steps in Hadoop
>>>>>>> 1.x, like map, sort, merge, combine and shuffle?
>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>> - What's the general process for ApplicationMaster of Yarn to
>>>>>>> execute a job?
>>>>>>>
>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>> - For Yarn, the above two parameters do not work any more, as yarn uses
>>>>>>> containers instead, right?
>>>>>>> - For Yarn, we can set the whole physical mem for a NodeManager
>>>>>>> using 'yarn.nodemanager.resource.memory-mb'. But how to set the
>>>>>>> default size of physical mem of a container?
>>>>>>> - How to set the maximum size of physical mem of a container? By the
>>>>>>> parameter of 'mapred.child.java.opts'?
>>>>>>>
>>>>>>> Thanks as always!
>>>>>>>
>>>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>>>
>>>>>>>> Hi Sam,
>>>>>>>>
>>>>>>>> > - How to know the container number? Why you say it will be 22
>>>>>>>> containers due to a 22 GB memory?
>>>>>>>>
>>>>>>>> The MR2's default configuration requests 1 GB resource each for Map
>>>>>>>> and Reduce containers. It requests 1.5 GB for the AM container that
>>>>>>>> runs the job, additionally. This is tunable using the properties
>>>>>>>> Sandy's mentioned in his post.
>>>>>>>>
>>>>>>>> > - My machine has 32 GB memory, how much memory is proper to be
>>>>>>>> assigned to containers?
>>>>>>>>
>>>>>>>> This is a general question. You may use the same process you took to
>>>>>>>> decide optimal number of slots in MR1 to decide this here. Every
>>>>>>>> container is a new JVM, and you're limited by the CPUs you have
>>>>>>>> there
>>>>>>>> (if not the memory). Either increase memory requests from jobs, to
>>>>>>>> lower # of concurrent containers at a given time (runtime change),
>>>>>>>> or
>>>>>>>> lower NM's published memory resources to control the same (config
>>>>>>>> change).
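
(A rough sketch of that arithmetic, with assumed numbers: the yarn-site.xml
quoted further down publishes 22000 MB per NodeManager, and the stock request
is 1024 MB per map/reduce task, so 22000 / 1024 gives roughly 21-22 concurrent
containers per node, plus one 1536 MB AM container for the job. Either of the
illustrative settings below, values assumed rather than taken from this
thread, pulls per-node concurrency back toward the 12 tasks of the MR1 setup.)

    <!-- Option A (per job): request bigger task containers -->
    <property>
      <name>mapreduce.map.memory.mb</name>
      <value>1536</value>
    </property>
    <property>
      <name>mapreduce.reduce.memory.mb</name>
      <value>2048</value>
    </property>

    <!-- Option B (per node): publish less memory from each NodeManager -->
    <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>12288</value>
    </property>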
>>>>>>>>
>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>>>>>>> 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>
>>>>>>>> Yes, all of these properties will still work. Old properties
>>>>>>>> specific
>>>>>>>> to JobTracker or TaskTracker (usually found as a keyword in the
>>>>>>>> config
>>>>>>>> name) will not apply anymore.
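
(A minimal illustration, not from the original message: the first property
below is still honoured once mapreduce.framework.name is set to yarn, while
the second is TaskTracker-specific and is simply ignored by MR2; the values
are the ones quoted elsewhere in this thread.)

    <!-- Still effective under YARN: per-task sort buffer -->
    <property>
      <name>mapreduce.task.io.sort.mb</name>
      <value>200</value>
    </property>

    <!-- TaskTracker-only knob: no effect under YARN -->
    <property>
      <name>mapreduce.tasktracker.map.tasks.maximum</name>
      <value>8</value>
    </property>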
>>>>>>>>
>>>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <sa...@gmail.com>
>>>>>>>> wrote:
>>>>>>>> > Hi Harsh,
>>>>>>>> >
>>>>>>>> > According to above suggestions, I removed the duplication of
>>>>>>>> setting, and
>>>>>>>> > reduce the value of 'yarn.nodemanager.resource.cpu-cores',
>>>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000. And
>>>>>>>> then, the
>>>>>>>> > efficiency improved about 18%.  I have questions:
>>>>>>>> >
>>>>>>>> > - How to know the container number? Why you say it will be 22
>>>>>>>> containers due
>>>>>>>> > to a 22 GB memory?
>>>>>>>> > - My machine has 32 GB memory, how much memory is proper to be
>>>>>>>> assigned to
>>>>>>>> > containers?
>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>>>>>>> 'yarn', will
>>>>>>>> > other parameters for mapred-site.xml still work in yarn
>>>>>>>> framework? Like
>>>>>>>> > 'mapreduce.task.io.sort.mb' and 'mapreduce.map.sort.spill.percent'
>>>>>>>> >
>>>>>>>> > Thanks!
>>>>>>>> >
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>>>> >>
>>>>>>>> >> Hey Sam,
>>>>>>>> >>
>>>>>>>> >> Did you get a chance to retry with Sandy's suggestions? The
>>>>>>>> config
>>>>>>>> >> appears to be asking NMs to use roughly 22 total containers (as
>>>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB memory
>>>>>>>> >> resource. This could impact much, given the CPU is still the
>>>>>>>> same for
>>>>>>>> >> both test runs.
>>>>>>>> >>
>>>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>>>> sandy.ryza@cloudera.com>
>>>>>>>> >> wrote:
>>>>>>>> >> > Hey Sam,
>>>>>>>> >> >
>>>>>>>> >> > Thanks for sharing your results.  I'm definitely curious about
>>>>>>>> what's
>>>>>>>> >> > causing the difference.
>>>>>>>> >> >
>>>>>>>> >> > A couple observations:
>>>>>>>> >> > It looks like you've got yarn.nodemanager.resource.memory-mb
>>>>>>>> in there
>>>>>>>> >> > twice
>>>>>>>> >> > with two different values.
>>>>>>>> >> >
>>>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close to the
>>>>>>>> default
>>>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of your
>>>>>>>> tasks getting
>>>>>>>> >> > killed for running over resource limits?
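
(The usual pairing, shown with assumed numbers only: keep the child heap
noticeably below the container size, roughly 75-80% of it, so JVM native
overhead does not push the process over the limit and get the task killed.)

    <property>
      <name>mapreduce.map.memory.mb</name>
      <value>1024</value>
    </property>
    <property>
      <name>mapreduce.map.java.opts</name>
      <value>-Xmx800m</value>
    </property>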
>>>>>>>> >> >
>>>>>>>> >> > -Sandy
>>>>>>>> >> >
>>>>>>>> >> >
>>>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <
>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>> >> >>
>>>>>>>> >> >> The terasort execution log shows that reduce spent about 5.5
>>>>>>>> mins from
>>>>>>>> >> >> 33%
>>>>>>>> >> >> to 35% as below.
>>>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce 31%
>>>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce 32%
>>>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce 33%
>>>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce 35%
>>>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce 40%
>>>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce 43%
>>>>>>>> >> >>
>>>>>>>> >> >> Any way, below are my configurations for your reference.
>>>>>>>> Thanks!
>>>>>>>> >> >> (A) core-site.xml
>>>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>>>> >> >>
>>>>>>>> >> >> (B) hdfs-site.xml
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>dfs.replication</name>
>>>>>>>> >> >>     <value>1</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>>>> >> >>
>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>>>> >> >>
>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>>>> >> >>     <value>64</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>>>> >> >>     <value>10</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >> (C) mapred-site.xml
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>>>> >> >>
>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>>>> >> >>     <description>No description</description>
>>>>>>>> >> >>     <final>true</final>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>>>> >> >>     <description>No description</description>
>>>>>>>> >> >>     <final>true</final>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >> <property>
>>>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>>>> >> >> </property>
>>>>>>>> >> >>
>>>>>>>> >> >> <property>
>>>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>>>> >> >>     <value>yarn</value>
>>>>>>>> >> >>    </property>
>>>>>>>> >> >>
>>>>>>>> >> >>  <property>
>>>>>>>> >> >>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>>>> >> >>     <value>8</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>>>> >> >>     <value>4</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>>>> >> >>     <value>true</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >> (D) yarn-site.xml
>>>>>>>> >> >>  <property>
>>>>>>>> >> >>     <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>>>> >> >>     <value>node1:18025</value>
>>>>>>>> >> >>     <description>host is the hostname of the resource manager
>>>>>>>> and
>>>>>>>> >> >>     port is the port on which the NodeManagers contact the
>>>>>>>> Resource
>>>>>>>> >> >> Manager.
>>>>>>>> >> >>     </description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <description>The address of the RM web
>>>>>>>> application.</description>
>>>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>>>> >> >>     <value>node1:18088</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>>>>>>>> >> >>     <value>node1:18030</value>
>>>>>>>> >> >>     <description>host is the hostname of the resourcemanager
>>>>>>>> and port
>>>>>>>> >> >> is
>>>>>>>> >> >> the port
>>>>>>>> >> >>     on which the Applications in the cluster talk to the
>>>>>>>> Resource
>>>>>>>> >> >> Manager.
>>>>>>>> >> >>     </description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>>>> >> >>     <value>node1:18040</value>
>>>>>>>> >> >>     <description>the host is the hostname of the
>>>>>>>> ResourceManager and
>>>>>>>> >> >> the
>>>>>>>> >> >> port is the port on
>>>>>>>> >> >>     which the clients can talk to the Resource Manager.
>>>>>>>> </description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>>>> >> >>     <description>the local directories used by the
>>>>>>>> >> >> nodemanager</description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>>>> port</description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>> >> >>     <value>10240</value>
>>>>>>>> >> >>     <description>the amount of memory on the NodeManager in
>>>>>>>> >> >> MB</description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>>> >> >>
>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>>> >> >>     <description>directory on hdfs where the application logs
>>>>>>>> are moved
>>>>>>>> >> >> to
>>>>>>>> >> >> </description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>    <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>>> >> >>
>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>>> >> >>     <description>the directories used by Nodemanagers as log
>>>>>>>> >> >> directories</description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>>> >> >>     <description>shuffle service that needs to be set for Map
>>>>>>>> Reduce to
>>>>>>>> >> >> run </description>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>   <property>
>>>>>>>> >> >>     <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>>> >> >>     <value>64</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>  <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>>> >> >>     <value>24</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >> <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>>> >> >>     <value>3</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>  <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>> >> >>     <value>22000</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>  <property>
>>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>>> >> >>     <value>2.1</value>
>>>>>>>> >> >>   </property>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>>>> >> >>>
>>>>>>>> >> >>> Not tuning configurations at all is wrong. YARN uses memory
>>>>>>>> resource
>>>>>>>> >> >>> based scheduling and hence MR2 would be requesting 1 GB
>>>>>>>> minimum by
>>>>>>>> >> >>> default, causing, on base configs, to max out at 8 (due to 8
>>>>>>>> GB NM
>>>>>>>> >> >>> memory resource config) total containers. Do share your
>>>>>>>> configs as at
>>>>>>>> >> >>> this point none of us can tell what it is.
>>>>>>>> >> >>>
>>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for users
>>>>>>>> and to not
>>>>>>>> >> >>> care about such things :)
>>>>>>>> >> >>>
>>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>>> samliuhadoop@gmail.com>
>>>>>>>> >> >>> wrote:
>>>>>>>> >> >>> > At the beginning, I just wanted to do a fast comparison of
>>>>>>>> MRv1 and
>>>>>>>> >> >>> > Yarn.
>>>>>>>> >> >>> > But
>>>>>>>> >> >>> > they have many differences, and to be fair for comparison
>>>>>>>> I did not
>>>>>>>> >> >>> > tune
>>>>>>>> >> >>> > their configurations at all.  So I got above test results.
>>>>>>>> After
>>>>>>>> >> >>> > analyzing
>>>>>>>> >> >>> > the test result, no doubt, I will configure them and do
>>>>>>>> comparison
>>>>>>>> >> >>> > again.
>>>>>>>> >> >>> >
>>>>>>>> >> >>> > Do you have any idea on current test result? I think, to
>>>>>>>> compare
>>>>>>>> >> >>> > with
>>>>>>>> >> >>> > MRv1,
>>>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but worse on
>>>>>>>> Reduce
>>>>>>>> >> >>> > phase(terasort test).
>>>>>>>> >> >>> > And any detailed suggestions/comments/materials on Yarn
>>>>>>>> performance
>>>>>>>> >> >>> > tuning?
>>>>>>>> >> >>> >
>>>>>>>> >> >>> > Thanks!
>>>>>>>> >> >>> >
>>>>>>>> >> >>> >
>>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>>> marcosluis2186@gmail.com>
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >> Why not tune the configurations?
>>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> Hi Experts,
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> We are thinking about whether to use Yarn or not in the
>>>>>>>> near
>>>>>>>> >> >>> >>> future,
>>>>>>>> >> >>> >>> and
>>>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for comparison.
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> My env is a three-node cluster, and each node has similar
>>>>>>>> hardware:
>>>>>>>> >> >>> >>> 2
>>>>>>>> >> >>> >>> cpu(4 core), 32 GB mem. Both Yarn and MRv1 clusters are set
>>>>>>>> on the
>>>>>>>> >> >>> >>> same
>>>>>>>> >> >>> >>> env. To
>>>>>>>> >> >>> >>> be fair, I did not make any performance tuning on their
>>>>>>>> >> >>> >>> configurations, but
>>>>>>>> >> >>> >>> use the default configuration values.
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> Before testing, I think Yarn will be much better than
>>>>>>>> MRv1, if
>>>>>>>> >> >>> >>> they
>>>>>>>> >> >>> >>> all
>>>>>>>> >> >>> >>> use default configuration, because Yarn is a better
>>>>>>>> framework than
>>>>>>>> >> >>> >>> MRv1.
>>>>>>>> >> >>> >>> However, the test result shows some differences:
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> After a fast analysis, I think the direct cause might be
>>>>>>>> that Yarn
>>>>>>>> >> >>> >>> is
>>>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much worse on
>>>>>>>> Reduce
>>>>>>>> >> >>> >>> phase.
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1 for
>>>>>>>> terasort?
>>>>>>>> >> >>> >>> - What's the strategy for tuning Yarn performance? Are there
>>>>>>>> >> >>> >>> any materials?
>>>>>>>> >> >>> >>>
>>>>>>>> >> >>> >>> Thanks!
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >> --
>>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>>> >> >>> >>
>>>>>>>> >> >>> >
>>>>>>>> >> >>>
>>>>>>>> >> >>>
>>>>>>>> >> >>>
>>>>>>>> >> >>> --
>>>>>>>> >> >>> Harsh J
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >> --
>>>>>>>> >> Harsh J
>>>>>>>> >
>>>>>>>> >
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Harsh J
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Sandy Ryza <sa...@cloudera.com>.
How many map and reduce slots are you using per tasktracker in MR1?  How do
the average map times compare? (MR2 reports this directly on the web UI,
but you can also get a sense in MR1 by scrolling through the map tasks
page).  Can you share the counters for MR1?

-Sandy
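
For context: the per-TaskTracker slot counts Sandy is asking about live in the
MR1 mapred-site.xml. A minimal sketch with illustrative values, since John's
actual Hadoop 1.0.3 settings are not shown in this thread:

    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>8</value>
    </property>
    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>4</value>
    </property>

Map slots times the number of TaskTrackers gives the MR1 concurrency that the
MR2 container count can be compared against.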


On Wed, Oct 23, 2013 at 12:23 AM, Jian Fang
<ji...@gmail.com>wrote:

> Unfortunately, turning off JVM reuse still got the same result, i.e.,
> about 90 minutes for MR2. I don't think the killed reduces could contribute
> to 2 times slowness. There should be something very wrong either in
> configuration or code. Any hints?
>
>
>
> On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <ji...@gmail.com>wrote:
>
>> Thanks Sandy. I will try to turn JVM reuse off and see what happens.
>>
>> Yes, I saw quite some exceptions in the task attempts. For instance.
>>
>>
>> 2013-10-20 03:13:58,751 ERROR [main]
>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
>> 2013-10-20 03:13:58,752 ERROR [Thread-6]
>> org.apache.hadoop.hdfs.DFSClient: Failed to close file
>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>> No lease on
>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
>> File does not exist. Holder
>> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
>> any open files.
>>         at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>>         at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>>         at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>>         at
>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>>         at
>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>>         at
>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>>         at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>> --
>>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>>         at
>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>>         at
>> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>>         at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>>         at
>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>>         at
>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>>         at
>> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>>         at
>> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>> 2013-10-20 03:13:58,753 WARN [main] org.apache.hadoop.mapred.YarnChild:
>> Exception running child : java.nio.channels.ClosedChannelException
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>>         at
>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>>         at
>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>>         at java.io.DataOutputStream.write(DataOutputStream.java:107)
>>         at
>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>>         at
>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>>         at
>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>>         at
>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>         at
>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>         at
>> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>>
>>
>>
>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <sa...@cloudera.com>wrote:
>>
>>> It looks like many of your reduce tasks were killed.  Do you know why?
>>>  Also, MR2 doesn't have JVM reuse, so it might make sense to compare it to
>>> MR1 with JVM reuse turned off.
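
(In Hadoop 1.x JVM reuse is controlled by mapred.job.reuse.jvm.num.tasks: 1
means no reuse, -1 means unlimited reuse within a job. A minimal sketch for
the MR1 side of the comparison; the mapreduce.job.jvm.numtasks = "20" in
John's MR2 configuration has no effect under YARN, which is Sandy's point.)

    <property>
      <name>mapred.job.reuse.jvm.num.tasks</name>
      <value>1</value>
    </property>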
>>>
>>> -Sandy
>>>
>>>
>>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <
>>> jian.fang.subscribe@gmail.com> wrote:
>>>
>>>> The Terasort output for MR2 is as follows.
>>>>
>>>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job (main):
>>>> Counters: 46
>>>>         File System Counters
>>>>                 FILE: Number of bytes read=456102049355
>>>>                 FILE: Number of bytes written=897246250517
>>>>                 FILE: Number of read operations=0
>>>>                 FILE: Number of large read operations=0
>>>>                 FILE: Number of write operations=0
>>>>                 HDFS: Number of bytes read=1000000851200
>>>>                 HDFS: Number of bytes written=1000000000000
>>>>                 HDFS: Number of read operations=32131
>>>>                 HDFS: Number of large read operations=0
>>>>                 HDFS: Number of write operations=224
>>>>         Job Counters
>>>>                 Killed map tasks=1
>>>>                 Killed reduce tasks=20
>>>>                 Launched map tasks=7601
>>>>                 Launched reduce tasks=132
>>>>                 Data-local map tasks=7591
>>>>                 Rack-local map tasks=10
>>>>                 Total time spent by all maps in occupied slots
>>>> (ms)=1696141311
>>>>                 Total time spent by all reduces in occupied slots
>>>> (ms)=2664045096
>>>>         Map-Reduce Framework
>>>>                 Map input records=10000000000
>>>>                 Map output records=10000000000
>>>>                 Map output bytes=1020000000000
>>>>                 Map output materialized bytes=440486356802
>>>>                 Input split bytes=851200
>>>>                 Combine input records=0
>>>>                 Combine output records=0
>>>>                 Reduce input groups=10000000000
>>>>                 Reduce shuffle bytes=440486356802
>>>>                 Reduce input records=10000000000
>>>>                 Reduce output records=10000000000
>>>>                 Spilled Records=20000000000
>>>>                 Shuffled Maps =851200
>>>>                 Failed Shuffles=61
>>>>                 Merged Map outputs=851200
>>>>                 GC time elapsed (ms)=4215666
>>>>                 CPU time spent (ms)=192433000
>>>>                 Physical memory (bytes) snapshot=3349356380160
>>>>                 Virtual memory (bytes) snapshot=9665208745984
>>>>                 Total committed heap usage (bytes)=3636854259712
>>>>         Shuffle Errors
>>>>                 BAD_ID=0
>>>>                 CONNECTION=0
>>>>                 IO_ERROR=4
>>>>                 WRONG_LENGTH=0
>>>>                 WRONG_MAP=0
>>>>                 WRONG_REDUCE=0
>>>>         File Input Format Counters
>>>>                 Bytes Read=1000000000000
>>>>         File Output Format Counters
>>>>                 Bytes Written=1000000000000
>>>>
>>>> Thanks,
>>>>
>>>> John
>>>>
>>>>
>>>>
>>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop 1.0.3 and
>>>>> it turned out that the terasort for MR2 is 2 times slower than that in MR1.
>>>>> I cannot really believe it.
>>>>>
>>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0 cluster
>>>>> configurations are as follows.
>>>>>
>>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>>         mapreduce.map.memory.mb = "768";
>>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>>
>>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>>
>>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>>         mapreduce.task.io.sort.factor = "48";
>>>>>         mapreduce.task.io.sort.mb = "200";
>>>>>         mapreduce.map.speculative = "true";
>>>>>         mapreduce.reduce.speculative = "true";
>>>>>         mapreduce.framework.name = "yarn";
>>>>>         yarn.app.mapreduce.am.job.task.listener.thread-count = "60";
>>>>>         mapreduce.map.cpu.vcores = "1";
>>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>>
>>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>>         mapreduce.map.output.compress = "true";
>>>>>         mapreduce.map.output.compress.codec =
>>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>>
>>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>>         yarn.resourcemanager.scheduler.client.thread-count = "64";
>>>>>         yarn.resourcemanager.resource-tracker.client.thread-count =
>>>>> "64";
>>>>>         yarn.resourcemanager.scheduler.class =
>>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>>         yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>>         yarn.nodemanager.container-executor.class =
>>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>>         yarn.nodemanager.container-manager.thread-count = "64";
>>>>>         yarn.nodemanager.localizer.client.thread-count = "20";
>>>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>>>>
>>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost the same
>>>>> other configurations. In MR1, the 1TB terasort took about 45 minutes, but
>>>>> it took around 90 minutes in MR2.
>>>>>
>>>>> Does anyone know what is wrong here? Or do I need some special
>>>>> configurations for terasort to work better in MR2?
>>>>>
>>>>> Thanks in advance,
>>>>>
>>>>> John
>>>>>
>>>>>
>>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>>> michael_segel@hotmail.com> wrote:
>>>>>
>>>>>> Sam,
>>>>>> I think your cluster is too small for any meaningful conclusions to
>>>>>> be made.
>>>>>>
>>>>>>
>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>
>>>>>> Mike Segel
>>>>>>
>>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com> wrote:
>>>>>>
>>>>>> Hi Harsh,
>>>>>>
>>>>>> Thanks for your detailed response! Now, the efficiency of my Yarn
>>>>>> cluster improved a lot after increasing the reducer
>>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>>
>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and reduce task
>>>>>> together, with a typical process(map > shuffle > reduce). In Yarn, as I
>>>>>> know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1 job
>>>>>> has special execution process(map > shuffle > reduce) in Hadoop 1.x, and
>>>>>> how Yarn execute a MRv1 job? still include some special MR steps in Hadoop
>>>>>> 1.x, like map, sort, merge, combine and shuffle?
>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>> - What's the general process for ApplicationMaster of Yarn to execute
>>>>>> a job?
>>>>>>
>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>> - For Yarn, the above two parameters do not work any more, as yarn uses
>>>>>> containers instead, right?
>>>>>> - For Yarn, we can set the whole physical mem for a NodeManager using
>>>>>> 'yarn.nodemanager.resource.memory-mb'. But how to set the default
>>>>>> size of physical mem of a container?
>>>>>> - How to set the maximum size of physical mem of a container? By the
>>>>>> parameter of 'mapred.child.java.opts'?
>>>>>>
>>>>>> Thanks as always!
>>>>>>
>>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>>
>>>>>>> Hi Sam,
>>>>>>>
>>>>>>> > - How to know the container number? Why you say it will be 22
>>>>>>> containers due to a 22 GB memory?
>>>>>>>
>>>>>>> The MR2's default configuration requests 1 GB resource each for Map
>>>>>>> and Reduce containers. It requests 1.5 GB for the AM container that
>>>>>>> runs the job, additionally. This is tunable using the properties
>>>>>>> Sandy's mentioned in his post.
>>>>>>>
>>>>>>> > - My machine has 32 GB memory, how much memory is proper to be
>>>>>>> assigned to containers?
>>>>>>>
>>>>>>> This is a general question. You may use the same process you took to
>>>>>>> decide optimal number of slots in MR1 to decide this here. Every
>>>>>>> container is a new JVM, and you're limited by the CPUs you have there
>>>>>>> (if not the memory). Either increase memory requests from jobs, to
>>>>>>> lower # of concurrent containers at a given time (runtime change), or
>>>>>>> lower NM's published memory resources to control the same (config
>>>>>>> change).
>>>>>>>
>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>>>>>> 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>
>>>>>>> Yes, all of these properties will still work. Old properties specific
>>>>>>> to JobTracker or TaskTracker (usually found as a keyword in the
>>>>>>> config
>>>>>>> name) will not apply anymore.
>>>>>>>
>>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <sa...@gmail.com>
>>>>>>> wrote:
>>>>>>> > Hi Harsh,
>>>>>>> >
>>>>>>> > According to above suggestions, I removed the duplication of
>>>>>>> setting, and
>>>>>>> > reduce the value of 'yarn.nodemanager.resource.cpu-cores',
>>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000. And
>>>>>>> then, the
>>>>>>> > efficiency improved about 18%.  I have questions:
>>>>>>> >
>>>>>>> > - How to know the container number? Why you say it will be 22
>>>>>>> containers due
>>>>>>> > to a 22 GB memory?
>>>>>>> > - My machine has 32 GB memory, how much memory is proper to be
>>>>>>> assigned to
>>>>>>> > containers?
>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>>>>>> 'yarn', will
>>>>>>> > other parameters for mapred-site.xml still work in yarn framework?
>>>>>>> Like
>>>>>>> > 'mapreduce.task.io.sort.mb' and 'mapreduce.map.sort.spill.percent'
>>>>>>> >
>>>>>>> > Thanks!
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>>> >>
>>>>>>> >> Hey Sam,
>>>>>>> >>
>>>>>>> >> Did you get a chance to retry with Sandy's suggestions? The config
>>>>>>> >> appears to be asking NMs to use roughly 22 total containers (as
>>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB memory
>>>>>>> >> resource. This could impact much, given the CPU is still the same
>>>>>>> for
>>>>>>> >> both test runs.
>>>>>>> >>
>>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>>> sandy.ryza@cloudera.com>
>>>>>>> >> wrote:
>>>>>>> >> > Hey Sam,
>>>>>>> >> >
>>>>>>> >> > Thanks for sharing your results.  I'm definitely curious about
>>>>>>> what's
>>>>>>> >> > causing the difference.
>>>>>>> >> >
>>>>>>> >> > A couple observations:
>>>>>>> >> > It looks like you've got yarn.nodemanager.resource.memory-mb in
>>>>>>> there
>>>>>>> >> > twice
>>>>>>> >> > with two different values.
>>>>>>> >> >
>>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close to the
>>>>>>> default
>>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of your
>>>>>>> tasks getting
>>>>>>> >> > killed for running over resource limits?
>>>>>>> >> >
>>>>>>> >> > -Sandy
>>>>>>> >> >
>>>>>>> >> >
>>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <
>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>> >> >>
>>>>>>> >> >> The terasort execution log shows that reduce spent about 5.5
>>>>>>> mins from
>>>>>>> >> >> 33%
>>>>>>> >> >> to 35% as below.
>>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce 31%
>>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce 32%
>>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce 33%
>>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce 35%
>>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce 40%
>>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce 43%
>>>>>>> >> >>
>>>>>>> >> >> Any way, below are my configurations for your reference.
>>>>>>> Thanks!
>>>>>>> >> >> (A) core-site.xml
>>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>>> >> >>
>>>>>>> >> >> (B) hdfs-site.xml
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>dfs.replication</name>
>>>>>>> >> >>     <value>1</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>>> >> >>
>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>>> >> >>
>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>>> >> >>     <value>64</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>>> >> >>     <value>10</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >> (C) mapred-site.xml
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>>> >> >>
>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>>> >> >>     <description>No description</description>
>>>>>>> >> >>     <final>true</final>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>>> >> >>
>>>>>>> >> >>
>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>>> >> >>     <description>No description</description>
>>>>>>> >> >>     <final>true</final>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >> <property>
>>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>>> >> >> </property>
>>>>>>> >> >>
>>>>>>> >> >> <property>
>>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>>> >> >>     <value>yarn</value>
>>>>>>> >> >>    </property>
>>>>>>> >> >>
>>>>>>> >> >>  <property>
>>>>>>> >> >>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>>> >> >>     <value>8</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>>> >> >>     <value>4</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>>> >> >>     <value>true</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >> (D) yarn-site.xml
>>>>>>> >> >>  <property>
>>>>>>> >> >>     <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>>> >> >>     <value>node1:18025</value>
>>>>>>> >> >>     <description>host is the hostname of the resource manager
>>>>>>> and
>>>>>>> >> >>     port is the port on which the NodeManagers contact the
>>>>>>> Resource
>>>>>>> >> >> Manager.
>>>>>>> >> >>     </description>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <description>The address of the RM web
>>>>>>> application.</description>
>>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>>> >> >>     <value>node1:18088</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>>>>>>> >> >>     <value>node1:18030</value>
>>>>>>> >> >>     <description>host is the hostname of the resourcemanager
>>>>>>> and port
>>>>>>> >> >> is
>>>>>>> >> >> the port
>>>>>>> >> >>     on which the Applications in the cluster talk to the
>>>>>>> Resource
>>>>>>> >> >> Manager.
>>>>>>> >> >>     </description>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>>> >> >>     <value>node1:18040</value>
>>>>>>> >> >>     <description>the host is the hostname of the
>>>>>>> ResourceManager and
>>>>>>> >> >> the
>>>>>>> >> >> port is the port on
>>>>>>> >> >>     which the clients can talk to the Resource Manager.
>>>>>>> </description>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>> >> >>
>>>>>>> >> >>
>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>>> >> >>     <description>the local directories used by the
>>>>>>> >> >> nodemanager</description>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>>> port</description>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>> >> >>     <value>10240</value>
>>>>>>> >> >>     <description>the amount of memory on the NodeManager in
>>>>>>> >> >> MB</description>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>> >> >>
>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>> >> >>     <description>directory on hdfs where the application logs
>>>>>>> are moved
>>>>>>> >> >> to
>>>>>>> >> >> </description>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>    <property>
>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>> >> >>
>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>> >> >>     <description>the directories used by Nodemanagers as log
>>>>>>> >> >> directories</description>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>> >> >>     <description>shuffle service that needs to be set for Map
>>>>>>> Reduce to
>>>>>>> >> >> run </description>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>> >> >>     <value>64</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>  <property>
>>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>> >> >>     <value>24</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >> <property>
>>>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>> >> >>     <value>3</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>  <property>
>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>> >> >>     <value>22000</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>  <property>
>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>> >> >>     <value>2.1</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>
>>>>>>> >> >>
>>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>>> >> >>>
>>>>>>> >> >>> Not tuning configurations at all is wrong. YARN uses memory
>>>>>>> resource
>>>>>>> >> >>> based scheduling and hence MR2 would be requesting 1 GB
>>>>>>> minimum by
>>>>>>> >> >>> default, causing, on base configs, to max out at 8 (due to 8
>>>>>>> GB NM
>>>>>>> >> >>> memory resource config) total containers. Do share your
>>>>>>> configs as at
>>>>>>> >> >>> this point none of us can tell what it is.
>>>>>>> >> >>>
>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for users and
>>>>>>> to not
>>>>>>> >> >>> care about such things :)
>>>>>>> >> >>>
>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>> samliuhadoop@gmail.com>
>>>>>>> >> >>> wrote:
>>>>>>> >> >>> > At the beginning, I just wanted to do a fast comparison of
>>>>>>> MRv1 and
>>>>>>> >> >>> > Yarn.
>>>>>>> >> >>> > But
>>>>>>> >> >>> > they have many differences, and to be fair for comparison I
>>>>>>> did not
>>>>>>> >> >>> > tune
>>>>>>> >> >>> > their configurations at all.  So I got above test results.
>>>>>>> After
>>>>>>> >> >>> > analyzing
>>>>>>> >> >>> > the test result, no doubt, I will configure them and do
>>>>>>> comparison
>>>>>>> >> >>> > again.
>>>>>>> >> >>> >
>>>>>>> >> >>> > Do you have any idea on current test result? I think, to
>>>>>>> compare
>>>>>>> >> >>> > with
>>>>>>> >> >>> > MRv1,
>>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but worse on
>>>>>>> Reduce
>>>>>>> >> >>> > phase(terasort test).
>>>>>>> >> >>> > And any detailed suggestions/comments/materials on Yarn
>>>>>>> performance
>>>>>>> >> >>> > tuning?
>>>>>>> >> >>> >
>>>>>>> >> >>> > Thanks!
>>>>>>> >> >>> >
>>>>>>> >> >>> >
>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>> marcosluis2186@gmail.com>
>>>>>>> >> >>> >>
>>>>>>> >> >>> >> Why not tune the configurations?
>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>>>>>> >> >>> >>
>>>>>>> >> >>> >>
>>>>>>> >> >>> >>
>>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>>> >> >>> >>>
>>>>>>> >> >>> >>> Hi Experts,
>>>>>>> >> >>> >>>
>>>>>>> >> >>> >>> We are thinking about whether to use Yarn or not in the
>>>>>>> near
>>>>>>> >> >>> >>> future,
>>>>>>> >> >>> >>> and
>>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for comparison.
>>>>>>> >> >>> >>>
>>>>>>> >> >>> >>> My env is a three-node cluster, and each node has similar
>>>>>>> hardware:
>>>>>>> >> >>> >>> 2
>>>>>>> >> >>> >>> cpu(4 core), 32 GB mem. Both Yarn and MRv1 clusters are set
>>>>>>> on the
>>>>>>> >> >>> >>> same
>>>>>>> >> >>> >>> env. To
>>>>>>> >> >>> >>> be fair, I did not make any performance tuning on their
>>>>>>> >> >>> >>> configurations, but
>>>>>>> >> >>> >>> use the default configuration values.
>>>>>>> >> >>> >>>
>>>>>>> >> >>> >>> Before testing, I think Yarn will be much better than
>>>>>>> MRv1, if
>>>>>>> >> >>> >>> they
>>>>>>> >> >>> >>> all
>>>>>>> >> >>> >>> use default configuration, because Yarn is a better
>>>>>>> framework than
>>>>>>> >> >>> >>> MRv1.
>>>>>>> >> >>> >>> However, the test result shows some differences:
>>>>>>> >> >>> >>>
>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>> >> >>> >>>
>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>> >> >>> >>>
>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>> >> >>> >>>
>>>>>>> >> >>> >>> After a fast analysis, I think the direct cause might be
>>>>>>> that Yarn
>>>>>>> >> >>> >>> is
>>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much worse on
>>>>>>> Reduce
>>>>>>> >> >>> >>> phase.
>>>>>>> >> >>> >>>
>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1 for terasort?
>>>>>>> >> >>> >>> - What's the strategy for tuning Yarn performance? Are there
>>>>>>> >> >>> >>> any materials?
>>>>>>> >> >>> >>>
>>>>>>> >> >>> >>> Thanks!
>>>>>>> >> >>> >>
>>>>>>> >> >>> >>
>>>>>>> >> >>> >>
>>>>>>> >> >>> >>
>>>>>>> >> >>> >> --
>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>> >> >>> >>
>>>>>>> >> >>> >
>>>>>>> >> >>>
>>>>>>> >> >>>
>>>>>>> >> >>>
>>>>>>> >> >>> --
>>>>>>> >> >>> Harsh J
>>>>>>> >> >>
>>>>>>> >> >>
>>>>>>> >> >
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> --
>>>>>>> >> Harsh J
>>>>>>> >
>>>>>>> >
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Harsh J
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Sandy Ryza <sa...@cloudera.com>.
How many map and reduce slots are you using per tasktracker in MR1?  How do
the average map times compare? (MR2 reports this directly on the web UI,
but you can also get a sense in MR1 by scrolling through the map tasks
page).  Can you share the counters for MR1?

-Sandy


On Wed, Oct 23, 2013 at 12:23 AM, Jian Fang
<ji...@gmail.com>wrote:

> Unfortunately, turning off JVM reuse still got the same result, i.e.,
> about 90 minutes for MR2. I don't think the killed reduces could contribute
> to 2 times slowness. There should be something very wrong either in
> configuration or code. Any hints?
>
>
>
> On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <ji...@gmail.com>wrote:
>
>> Thanks Sandy. I will try to turn JVM reuse off and see what happens.
>>
>> Yes, I saw quite some exceptions in the task attempts. For instance.
>>
>>
>> 2013-10-20 03:13:58,751 ERROR [main]
>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
>> 2013-10-20 03:13:58,752 ERROR [Thread-6]
>> org.apache.hadoop.hdfs.DFSClient: Failed to close file
>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>> No lease on
>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
>> File does not exist. Holder
>> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
>> any open files.
>>         at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>>         at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>>         at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>>         at
>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>>         at
>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>>         at
>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>>         at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>> --
>>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>>         at
>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>>         at
>> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>>         at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>>         at
>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>>         at
>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>>         at
>> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>>         at
>> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>> 2013-10-20 03:13:58,753 WARN [main] org.apache.hadoop.mapred.YarnChild:
>> Exception running child : java.nio.channels.ClosedChannelException
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>>         at
>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>>         at
>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>>         at java.io.DataOutputStream.write(DataOutputStream.java:107)
>>         at
>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>>         at
>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>>         at
>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>>         at
>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>         at
>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>         at
>> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>>
>>
>>
>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <sa...@cloudera.com>wrote:
>>
>>> It looks like many of your reduce tasks were killed.  Do you know why?
>>>  Also, MR2 doesn't have JVM reuse, so it might make sense to compare it to
>>> MR1 with JVM reuse turned off.
>>>
>>> -Sandy
>>>
>>>
>>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <
>>> jian.fang.subscribe@gmail.com> wrote:
>>>
>>>> The Terasort output for MR2 is as follows.
>>>>
>>>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job (main):
>>>> Counters: 46
>>>>         File System Counters
>>>>                 FILE: Number of bytes read=456102049355
>>>>                 FILE: Number of bytes written=897246250517
>>>>                 FILE: Number of read operations=0
>>>>                 FILE: Number of large read operations=0
>>>>                 FILE: Number of write operations=0
>>>>                 HDFS: Number of bytes read=1000000851200
>>>>                 HDFS: Number of bytes written=1000000000000
>>>>                 HDFS: Number of read operations=32131
>>>>                 HDFS: Number of large read operations=0
>>>>                 HDFS: Number of write operations=224
>>>>         Job Counters
>>>>                 Killed map tasks=1
>>>>                 Killed reduce tasks=20
>>>>                 Launched map tasks=7601
>>>>                 Launched reduce tasks=132
>>>>                 Data-local map tasks=7591
>>>>                 Rack-local map tasks=10
>>>>                 Total time spent by all maps in occupied slots
>>>> (ms)=1696141311
>>>>                 Total time spent by all reduces in occupied slots
>>>> (ms)=2664045096
>>>>         Map-Reduce Framework
>>>>                 Map input records=10000000000
>>>>                 Map output records=10000000000
>>>>                 Map output bytes=1020000000000
>>>>                 Map output materialized bytes=440486356802
>>>>                 Input split bytes=851200
>>>>                 Combine input records=0
>>>>                 Combine output records=0
>>>>                 Reduce input groups=10000000000
>>>>                 Reduce shuffle bytes=440486356802
>>>>                 Reduce input records=10000000000
>>>>                 Reduce output records=10000000000
>>>>                 Spilled Records=20000000000
>>>>                 Shuffled Maps =851200
>>>>                 Failed Shuffles=61
>>>>                 Merged Map outputs=851200
>>>>                 GC time elapsed (ms)=4215666
>>>>                 CPU time spent (ms)=192433000
>>>>                 Physical memory (bytes) snapshot=3349356380160
>>>>                 Virtual memory (bytes) snapshot=9665208745984
>>>>                 Total committed heap usage (bytes)=3636854259712
>>>>         Shuffle Errors
>>>>                 BAD_ID=0
>>>>                 CONNECTION=0
>>>>                 IO_ERROR=4
>>>>                 WRONG_LENGTH=0
>>>>                 WRONG_MAP=0
>>>>                 WRONG_REDUCE=0
>>>>         File Input Format Counters
>>>>                 Bytes Read=1000000000000
>>>>         File Output Format Counters
>>>>                 Bytes Written=1000000000000
>>>>
>>>> Thanks,
>>>>
>>>> John
>>>>
>>>>
>>>>
>>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop 1.0.3 and
>>>>> it turned out that the terasort for MR2 is 2 times slower than that in MR1.
>>>>> I cannot really believe it.
>>>>>
>>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0 cluster
>>>>> configurations are as follows.
>>>>>
>>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>>         mapreduce.map.memory.mb = "768";
>>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>>
>>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>>
>>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>>         mapreduce.task.io.sort.factor = "48";
>>>>>         mapreduce.task.io.sort.mb = "200";
>>>>>         mapreduce.map.speculative = "true";
>>>>>         mapreduce.reduce.speculative = "true";
>>>>>         mapreduce.framework.name = "yarn";
>>>>>         yarn.app.mapreduce.am.job.task.listener.thread-count = "60";
>>>>>         mapreduce.map.cpu.vcores = "1";
>>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>>
>>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>>         mapreduce.map.output.compress = "true";
>>>>>         mapreduce.map.output.compress.codec =
>>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>>
>>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>>         yarn.resourcemanager.scheduler.client.thread-count = "64";
>>>>>         yarn.resourcemanager.resource-tracker.client.thread-count =
>>>>> "64";
>>>>>         yarn.resourcemanager.scheduler.class =
>>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>>         yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>>         yarn.nodemanager.container-executor.class =
>>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>>         yarn.nodemanager.container-manager.thread-count = "64";
>>>>>         yarn.nodemanager.localizer.client.thread-count = "20";
>>>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>>>>
>>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost the same
>>>>> other configurations. In MR1, the 1TB terasort took about 45 minutes, but
>>>>> it took around 90 minutes in MR2.
>>>>>
>>>>> Does anyone know what is wrong here? Or do I need some special
>>>>> configurations for terasort to work better in MR2?
>>>>>
>>>>> Thanks in advance,
>>>>>
>>>>> John
>>>>>
>>>>>
>>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>>> michael_segel@hotmail.com> wrote:
>>>>>
>>>>>> Sam,
>>>>>> I think your cluster is too small for any meaningful conclusions to
>>>>>> be made.
>>>>>>
>>>>>>
>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>
>>>>>> Mike Segel
>>>>>>
>>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com> wrote:
>>>>>>
>>>>>> Hi Harsh,
>>>>>>
>>>>>> Thanks for your detailed response! Now, the efficiency of my Yarn
>>>>>> cluster improved a lot after increasing the reducer number
>>>>>> (mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>>> questions about how Yarn executes an MRv1 job:
>>>>>>
>>>>>> 1. In Hadoop 1.x, a job is executed by map tasks and reduce tasks
>>>>>> together, with a typical process (map > shuffle > reduce). In Yarn, as I
>>>>>> understand it, an MRv1 job is executed only by the ApplicationMaster.
>>>>>> - Yarn can run multiple kinds of jobs (MR, MPI, ...), but an MRv1 job
>>>>>> has a special execution process (map > shuffle > reduce) in Hadoop 1.x, so
>>>>>> how does Yarn execute an MRv1 job? Does it still include the special MR
>>>>>> steps from Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>>> - Do the MRv1 parameters still work for Yarn? For example,
>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>> - What's the general process by which Yarn's ApplicationMaster executes
>>>>>> a job?
>>>>>>
>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>> - For Yarn, the above two parameters no longer work, as Yarn uses
>>>>>> containers instead, right?
>>>>>> - For Yarn, we can set the total physical memory for a NodeManager using
>>>>>> 'yarn.nodemanager.resource.memory-mb'. But how do we set the default
>>>>>> physical memory size of a container?
>>>>>> - How do we set the maximum physical memory size of a container? Via the
>>>>>> 'mapred.child.java.opts' parameter?
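
On the container-sizing questions just above: in MR2 the container size and the
task heap are set separately, which is also how the Hadoop 2.2.0 settings quoted
earlier in this thread are laid out. A minimal mapred-site.xml sketch, with the
values only illustrative (borrowed from that earlier post):

  <property>
    <name>mapreduce.map.memory.mb</name>
    <!-- size of the container requested from YARN, in MB -->
    <value>768</value>
  </property>
  <property>
    <name>mapreduce.map.java.opts</name>
    <!-- JVM heap for the map task, kept below the container size -->
    <value>-Xmx512m</value>
  </property>

The cluster-wide cap on any single container comes from
'yarn.scheduler.maximum-allocation-mb', not from 'mapred.child.java.opts'.
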
>>>>>>
>>>>>> Thanks as always!
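
As for the reducer-count change mentioned at the top of this message,
mapreduce.job.reduces is a per-job setting; a minimal sketch, with a purely
illustrative value:

  <property>
    <name>mapreduce.job.reduces</name>
    <!-- illustrative only; size this to the cluster's reduce capacity -->
    <value>8</value>
  </property>
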
>>>>>>
>>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>>
>>>>>>> Hi Sam,
>>>>>>>
>>>>>>> > - How to know the container number? Why you say it will be 22
>>>>>>> containers due to a 22 GB memory?
>>>>>>>
>>>>>>> The MR2's default configuration requests 1 GB resource each for Map
>>>>>>> and Reduce containers. It requests 1.5 GB for the AM container that
>>>>>>> runs the job, additionally. This is tunable using the properties
>>>>>>> Sandy's mentioned in his post.
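
A sketch of the knobs being referred to here, using what should be the stock MR2
defaults (treat the exact numbers as assumptions to verify against your version):

  <property>
    <name>mapreduce.map.memory.mb</name>
    <!-- default map container request, 1 GB -->
    <value>1024</value>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <!-- default reduce container request, 1 GB -->
    <value>1024</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.resource.mb</name>
    <!-- default MR ApplicationMaster container request, 1.5 GB -->
    <value>1536</value>
  </property>
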
>>>>>>>
>>>>>>> > - My machine has 32 GB memory, how much memory is proper to be
>>>>>>> assigned to containers?
>>>>>>>
>>>>>>> This is a general question. You may use the same process you took to
>>>>>>> decide optimal number of slots in MR1 to decide this here. Every
>>>>>>> container is a new JVM, and you're limited by the CPUs you have there
>>>>>>> (if not the memory). Either increase memory requests from jobs, to
>>>>>>> lower # of concurrent containers at a given time (runtime change), or
>>>>>>> lower NM's published memory resources to control the same (config
>>>>>>> change).
>>>>>>>
>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>>>>>> 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>
>>>>>>> Yes, all of these properties will still work. Old properties specific
>>>>>>> to JobTracker or TaskTracker (usually found as a keyword in the
>>>>>>> config
>>>>>>> name) will not apply anymore.
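
To make that concrete with a small sketch: a task-level property such as the one
below is still honored when mapreduce.framework.name is set to yarn, whereas the
TaskTracker slot settings that appear in the mapred-site.xml quoted further down
(e.g. mapreduce.tasktracker.map.tasks.maximum) are simply ignored by MR2.

  <property>
    <name>mapreduce.task.io.sort.mb</name>
    <!-- per-task sort buffer; still effective under YARN, value illustrative -->
    <value>200</value>
  </property>
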
>>>>>>>
>>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <sa...@gmail.com>
>>>>>>> wrote:
>>>>>>> > Hi Harsh,
>>>>>>> >
>>>>>>> > According to the above suggestions, I removed the duplicated
>>>>>>> setting, and
>>>>>>> > reduced the values of 'yarn.nodemanager.resource.cpu-cores',
>>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000. And
>>>>>>> then, the
>>>>>>> > efficiency improved by about 18%. I have some questions:
>>>>>>> >
>>>>>>> > - How to know the container number? Why you say it will be 22
>>>>>>> containers due
>>>>>>> > to a 22 GB memory?
>>>>>>> > - My machine has 32 GB memory, how much memory is proper to be
>>>>>>> assigned to
>>>>>>> > containers?
>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>>>>>> 'yarn', will
>>>>>>> > other parameters for mapred-site.xml still work in yarn framework?
>>>>>>> Like
>>>>>>> > 'mapreduce.task.io.sort.mb' and 'mapreduce.map.sort.spill.percent'
>>>>>>> >
>>>>>>> > Thanks!
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>>> >>
>>>>>>> >> Hey Sam,
>>>>>>> >>
>>>>>>> >> Did you get a chance to retry with Sandy's suggestions? The config
>>>>>>> >> appears to be asking NMs to use roughly 22 total containers (as
>>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB memory
>>>>>>> >> resource. This could impact much, given the CPU is still the same
>>>>>>> for
>>>>>>> >> both test runs.
>>>>>>> >>
>>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>>> sandy.ryza@cloudera.com>
>>>>>>> >> wrote:
>>>>>>> >> > Hey Sam,
>>>>>>> >> >
>>>>>>> >> > Thanks for sharing your results.  I'm definitely curious about
>>>>>>> what's
>>>>>>> >> > causing the difference.
>>>>>>> >> >
>>>>>>> >> > A couple observations:
>>>>>>> >> > It looks like you've got yarn.nodemanager.resource.memory-mb in
>>>>>>> there
>>>>>>> >> > twice
>>>>>>> >> > with two different values.
>>>>>>> >> >
>>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close to the
>>>>>>> default
>>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of your
>>>>>>> tasks getting
>>>>>>> >> > killed for running over resource limits?
>>>>>>> >> >
>>>>>>> >> > -Sandy
>>>>>>> >> >
>>>>>>> >> >
>>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <
>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>> >> >>
>>>>>>> >> >> The terasort execution log shows that reduce spent about 5.5
>>>>>>> mins from
>>>>>>> >> >> 33%
>>>>>>> >> >> to 35% as below.
>>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce 31%
>>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce 32%
>>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce 33%
>>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce 35%
>>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce 40%
>>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce 43%
>>>>>>> >> >>
>>>>>>> >> >> Anyway, below are my configurations for your reference.
>>>>>>> Thanks!
>>>>>>> >> >> (A) core-site.xml
>>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>>> >> >>
>>>>>>> >> >> (B) hdfs-site.xml
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>dfs.replication</name>
>>>>>>> >> >>     <value>1</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>>> >> >>
>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>>> >> >>
>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>>> >> >>     <value>64</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>>> >> >>     <value>10</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >> (C) mapred-site.xml
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>>> >> >>
>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>>> >> >>     <description>No description</description>
>>>>>>> >> >>     <final>true</final>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>>> >> >>
>>>>>>> >> >>
>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>>> >> >>     <description>No description</description>
>>>>>>> >> >>     <final>true</final>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >> <property>
>>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>>> >> >> </property>
>>>>>>> >> >>
>>>>>>> >> >> <property>
>>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>>> >> >>     <value>yarn</value>
>>>>>>> >> >>    </property>
>>>>>>> >> >>
>>>>>>> >> >>  <property>
>>>>>>> >> >>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>>> >> >>     <value>8</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>>> >> >>     <value>4</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>>> >> >>     <value>true</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >> (D) yarn-site.xml
>>>>>>> >> >>  <property>
>>>>>>> >> >>     <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>>> >> >>     <value>node1:18025</value>
>>>>>>> >> >>     <description>host is the hostname of the resource manager
>>>>>>> and
>>>>>>> >> >>     port is the port on which the NodeManagers contact the
>>>>>>> Resource
>>>>>>> >> >> Manager.
>>>>>>> >> >>     </description>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <description>The address of the RM web
>>>>>>> application.</description>
>>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>>> >> >>     <value>node1:18088</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>>>>>>> >> >>     <value>node1:18030</value>
>>>>>>> >> >>     <description>host is the hostname of the resourcemanager
>>>>>>> and port
>>>>>>> >> >> is
>>>>>>> >> >> the port
>>>>>>> >> >>     on which the Applications in the cluster talk to the
>>>>>>> Resource
>>>>>>> >> >> Manager.
>>>>>>> >> >>     </description>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>>> >> >>     <value>node1:18040</value>
>>>>>>> >> >>     <description>the host is the hostname of the
>>>>>>> ResourceManager and
>>>>>>> >> >> the
>>>>>>> >> >> port is the port on
>>>>>>> >> >>     which the clients can talk to the Resource Manager.
>>>>>>> </description>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>> >> >>
>>>>>>> >> >>
>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>>> >> >>     <description>the local directories used by the
>>>>>>> >> >> nodemanager</description>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>>> port</description>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>> >> >>     <value>10240</value>
>>>>>>> >> >>     <description>the amount of memory on the NodeManager in
>>>>>>> >> >> MB</description>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>> >> >>
>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>> >> >>     <description>directory on hdfs where the application logs
>>>>>>> are moved
>>>>>>> >> >> to
>>>>>>> >> >> </description>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>    <property>
>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>> >> >>
>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>> >> >>     <description>the directories used by Nodemanagers as log
>>>>>>> >> >> directories</description>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>> >> >>     <description>shuffle service that needs to be set for Map
>>>>>>> Reduce to
>>>>>>> >> >> run </description>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>> >> >>     <value>64</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>  <property>
>>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>> >> >>     <value>24</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >> <property>
>>>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>> >> >>     <value>3</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>  <property>
>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>> >> >>     <value>22000</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>  <property>
>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>> >> >>     <value>2.1</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>
>>>>>>> >> >>
>>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>>> >> >>>
>>>>>>> >> >>> Not tuning configurations at all is wrong. YARN uses memory
>>>>>>> resource
>>>>>>> >> >>> based scheduling and hence MR2 would be requesting 1 GB
>>>>>>> minimum by
>>>>>>> >> >>> default, which on base configs maxes out at 8 (due to the 8
>>>>>>> GB NM
>>>>>>> >> >>> memory resource config) total containers. Do share your
>>>>>>> configs as at
>>>>>>> >> >>> this point none of us can tell what it is.
>>>>>>> >> >>>
>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for users and
>>>>>>> to not
>>>>>>> >> >>> care about such things :)
>>>>>>> >> >>>
>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>> samliuhadoop@gmail.com>
>>>>>>> >> >>> wrote:
>>>>>>> >> >>> > At the beginning, I just wanted to do a fast comparison of
>>>>>>> MRv1 and
>>>>>>> >> >>> > Yarn.
>>>>>>> >> >>> > But
>>>>>>> >> >>> > they have many differences, and to be fair for comparison I
>>>>>>> did not
>>>>>>> >> >>> > tune
>>>>>>> >> >>> > their configurations at all.  So I got above test results.
>>>>>>> After
>>>>>>> >> >>> > analyzing
>>>>>>> >> >>> > the test result, no doubt, I will configure them and do
>>>>>>> comparison
>>>>>>> >> >>> > again.
>>>>>>> >> >>> >
>>>>>>> >> >>> > Do you have any idea on current test result? I think, to
>>>>>>> compare
>>>>>>> >> >>> > with
>>>>>>> >> >>> > MRv1,
>>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but worse on
>>>>>>> Reduce
>>>>>>> >> >>> > phase(terasort test).
>>>>>>> >> >>> > And any detailed suggestions/comments/materials on Yarn
>>>>>>> performance
>>>>>>> >> >>> > tuning?
>>>>>>> >> >>> >
>>>>>>> >> >>> > Thanks!
>>>>>>> >> >>> >
>>>>>>> >> >>> >
>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>> marcosluis2186@gmail.com>
>>>>>>> >> >>> >>
>>>>>>> >> >>> >> Why not tune the configurations?
>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>>>>>> >> >>> >>
>>>>>>> >> >>> >>
>>>>>>> >> >>> >>
>>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>>> >> >>> >>>
>>>>>>> >> >>> >>> Hi Experts,
>>>>>>> >> >>> >>>
>>>>>>> >> >>> >>> We are thinking about whether to use Yarn or not in the
>>>>>>> near
>>>>>>> >> >>> >>> future,
>>>>>>> >> >>> >>> and
>>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for comparison.
>>>>>>> >> >>> >>>
>>>>>>> >> >>> >>> My env is a three-node cluster, and each node has similar
>>>>>>> hardware:
>>>>>>> >> >>> >>> 2
>>>>>>> >> >>> >>> cpu(4 core), 32 GB mem. Both Yarn and MRv1 clusters are set
>>>>>>> on the
>>>>>>> >> >>> >>> same
>>>>>>> >> >>> >>> env. To
>>>>>>> >> >>> >>> be fair, I did not make any performance tuning on their
>>>>>>> >> >>> >>> configurations, but
>>>>>>> >> >>> >>> use the default configuration values.
>>>>>>> >> >>> >>>
>>>>>>> >> >>> >>> Before testing, I thought Yarn would be much better than
>>>>>>> MRv1, if
>>>>>>> >> >>> >>> they
>>>>>>> >> >>> >>> all
>>>>>>> >> >>> >>> use default configuration, because Yarn is a better
>>>>>>> framework than
>>>>>>> >> >>> >>> MRv1.
>>>>>>> >> >>> >>> However, the test result shows some differences:
>>>>>>> >> >>> >>>
>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>> >> >>> >>>
>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>> >> >>> >>>
>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>> >> >>> >>>
>>>>>>> >> >>> >>> After a fast analysis, I think the direct cause might be
>>>>>>> that Yarn
>>>>>>> >> >>> >>> is
>>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much worse on
>>>>>>> Reduce
>>>>>>> >> >>> >>> phase.
>>>>>>> >> >>> >>>
>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1 for terasort?
>>>>>>> >> >>> >>> - What's the strategy for tuning Yarn performance? Are there any
>>>>>>> >> >>> >>> materials?
>>>>>>> >> >>> >>>
>>>>>>> >> >>> >>> Thanks!
>>>>>>> >> >>> >>
>>>>>>> >> >>> >>
>>>>>>> >> >>> >>
>>>>>>> >> >>> >>
>>>>>>> >> >>> >> --
>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>> >> >>> >>
>>>>>>> >> >>> >
>>>>>>> >> >>>
>>>>>>> >> >>>
>>>>>>> >> >>>
>>>>>>> >> >>> --
>>>>>>> >> >>> Harsh J
>>>>>>> >> >>
>>>>>>> >> >>
>>>>>>> >> >
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> --
>>>>>>> >> Harsh J
>>>>>>> >
>>>>>>> >
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Harsh J
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Sandy Ryza <sa...@cloudera.com>.
How many map and reduce slots are you using per tasktracker in MR1?  How do
the average map times compare? (MR2 reports this directly on the web UI,
but you can also get a sense in MR1 by scrolling through the map tasks
page).  Can you share the counters for MR1?
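
For reference, the MR1 slot counts being asked about come from the tasktracker
settings in mapred-site.xml; the configuration quoted earlier in this thread used
8 map and 4 reduce slots per tasktracker. A sketch, values illustrative:

  <property>
    <name>mapreduce.tasktracker.map.tasks.maximum</name>
    <value>8</value>
  </property>
  <property>
    <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
    <value>4</value>
  </property>
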

-Sandy


On Wed, Oct 23, 2013 at 12:23 AM, Jian Fang
<ji...@gmail.com>wrote:

> Unfortunately, turning off JVM reuse still gave the same result, i.e.,
> about 90 minutes for MR2. I don't think the killed reduces could account
> for a 2x slowdown. There must be something very wrong either in the
> configuration or the code. Any hints?
>
>
>
> On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <ji...@gmail.com>wrote:
>
>> Thanks Sandy. I will try to turn JVM reuse off and see what happens.
>>
>> Yes, I saw quite a few exceptions in the task attempts. For instance:
>>
>>
>> 2013-10-20 03:13:58,751 ERROR [main]
>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
>> 2013-10-20 03:13:58,752 ERROR [Thread-6]
>> org.apache.hadoop.hdfs.DFSClient: Failed to close file
>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>> No lease on
>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
>> File does not exist. Holder
>> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
>> any open files.
>>         at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>>         at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>>         at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>>         at
>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>>         at
>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>>         at
>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>>         at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>> --
>>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>>         at
>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>>         at
>> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>>         at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>>         at
>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>>         at
>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>>         at
>> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>>         at
>> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>> 2013-10-20 03:13:58,753 WARN [main] org.apache.hadoop.mapred.YarnChild:
>> Exception running child : java.nio.channels.ClosedChannelException
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>>         at
>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>>         at
>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>>         at java.io.DataOutputStream.write(DataOutputStream.java:107)
>>         at
>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>>         at
>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>>         at
>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>>         at
>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>         at
>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>         at
>> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>>
>>
>>
>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <sa...@cloudera.com>wrote:
>>
>>> It looks like many of your reduce tasks were killed.  Do you know why?
>>>  Also, MR2 doesn't have JVM reuse, so it might make sense to compare it to
>>> MR1 with JVM reuse turned off.
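
A sketch of how JVM reuse would be switched off on the MR1 side for that
comparison, assuming the classic Hadoop 1.x property name:

  <property>
    <name>mapred.job.reuse.jvm.num.tasks</name>
    <!-- 1 = one task per JVM, i.e. no reuse; -1 would mean unlimited reuse -->
    <value>1</value>
  </property>

The 'mapreduce.job.jvm.numtasks = 20' line in the MR2 settings quoted above has
no effect there, since MR2 dropped JVM reuse altogether.
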
>>>
>>> -Sandy
>>>
>>>
>>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <
>>> jian.fang.subscribe@gmail.com> wrote:
>>>
>>>> The Terasort output for MR2 is as follows.
>>>>
>>>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job (main):
>>>> Counters: 46
>>>>         File System Counters
>>>>                 FILE: Number of bytes read=456102049355
>>>>                 FILE: Number of bytes written=897246250517
>>>>                 FILE: Number of read operations=0
>>>>                 FILE: Number of large read operations=0
>>>>                 FILE: Number of write operations=0
>>>>                 HDFS: Number of bytes read=1000000851200
>>>>                 HDFS: Number of bytes written=1000000000000
>>>>                 HDFS: Number of read operations=32131
>>>>                 HDFS: Number of large read operations=0
>>>>                 HDFS: Number of write operations=224
>>>>         Job Counters
>>>>                 Killed map tasks=1
>>>>                 Killed reduce tasks=20
>>>>                 Launched map tasks=7601
>>>>                 Launched reduce tasks=132
>>>>                 Data-local map tasks=7591
>>>>                 Rack-local map tasks=10
>>>>                 Total time spent by all maps in occupied slots
>>>> (ms)=1696141311
>>>>                 Total time spent by all reduces in occupied slots
>>>> (ms)=2664045096
>>>>         Map-Reduce Framework
>>>>                 Map input records=10000000000
>>>>                 Map output records=10000000000
>>>>                 Map output bytes=1020000000000
>>>>                 Map output materialized bytes=440486356802
>>>>                 Input split bytes=851200
>>>>                 Combine input records=0
>>>>                 Combine output records=0
>>>>                 Reduce input groups=10000000000
>>>>                 Reduce shuffle bytes=440486356802
>>>>                 Reduce input records=10000000000
>>>>                 Reduce output records=10000000000
>>>>                 Spilled Records=20000000000
>>>>                 Shuffled Maps =851200
>>>>                 Failed Shuffles=61
>>>>                 Merged Map outputs=851200
>>>>                 GC time elapsed (ms)=4215666
>>>>                 CPU time spent (ms)=192433000
>>>>                 Physical memory (bytes) snapshot=3349356380160
>>>>                 Virtual memory (bytes) snapshot=9665208745984
>>>>                 Total committed heap usage (bytes)=3636854259712
>>>>         Shuffle Errors
>>>>                 BAD_ID=0
>>>>                 CONNECTION=0
>>>>                 IO_ERROR=4
>>>>                 WRONG_LENGTH=0
>>>>                 WRONG_MAP=0
>>>>                 WRONG_REDUCE=0
>>>>         File Input Format Counters
>>>>                 Bytes Read=1000000000000
>>>>         File Output Format Counters
>>>>                 Bytes Written=1000000000000
>>>>
>>>> Thanks,
>>>>
>>>> John
>>>>
>>>>
>>>>
>>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop 1.0.3 and
>>>>> it turned out that the terasort for MR2 is 2 times slower than that in MR1.
>>>>> I cannot really believe it.
>>>>>
>>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0 cluster
>>>>> configurations are as follows.
>>>>>
>>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>>         mapreduce.map.memory.mb = "768";
>>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>>
>>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>>
>>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>>         mapreduce.task.io.sort.factor = "48";
>>>>>         mapreduce.task.io.sort.mb = "200";
>>>>>         mapreduce.map.speculative = "true";
>>>>>         mapreduce.reduce.speculative = "true";
>>>>>         mapreduce.framework.name = "yarn";
>>>>>         yarn.app.mapreduce.am.job.task.listener.thread-count = "60";
>>>>>         mapreduce.map.cpu.vcores = "1";
>>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>>
>>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>>         mapreduce.map.output.compress = "true";
>>>>>         mapreduce.map.output.compress.codec =
>>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>>
>>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>>         yarn.resourcemanager.scheduler.client.thread-count = "64";
>>>>>         yarn.resourcemanager.resource-tracker.client.thread-count =
>>>>> "64";
>>>>>         yarn.resourcemanager.scheduler.class =
>>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>>         yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>>         yarn.nodemanager.container-executor.class =
>>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>>         yarn.nodemanager.container-manager.thread-count = "64";
>>>>>         yarn.nodemanager.localizer.client.thread-count = "20";
>>>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>>>>
>>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost the same
>>>>> other configurations. In MR1, the 1TB terasort took about 45 minutes, but
>>>>> it took around 90 minutes in MR2.
>>>>>
>>>>> Does anyone know what is wrong here? Or do I need some special
>>>>> configurations for terasort to work better in MR2?
>>>>>
>>>>> Thanks in advance,
>>>>>
>>>>> John
>>>>>
>>>>>
>>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>>> michael_segel@hotmail.com> wrote:
>>>>>
>>>>>> Sam,
>>>>>> I think your cluster is too small for any meaningful conclusions to
>>>>>> be made.
>>>>>>
>>>>>>
>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>
>>>>>> Mike Segel
>>>>>>
>>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com> wrote:
>>>>>>
>>>>>> Hi Harsh,
>>>>>>
>>>>>> Thanks for your detailed response! Now, the efficiency of my Yarn
>>>>>> cluster improved a lot after increasing the reducer number
>>>>>> (mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>>> questions about how Yarn executes an MRv1 job:
>>>>>>
>>>>>> 1. In Hadoop 1.x, a job is executed by map tasks and reduce tasks
>>>>>> together, with a typical process (map > shuffle > reduce). In Yarn, as I
>>>>>> understand it, an MRv1 job is executed only by the ApplicationMaster.
>>>>>> - Yarn can run multiple kinds of jobs (MR, MPI, ...), but an MRv1 job
>>>>>> has a special execution process (map > shuffle > reduce) in Hadoop 1.x, so
>>>>>> how does Yarn execute an MRv1 job? Does it still include the special MR
>>>>>> steps from Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>>> - Do the MRv1 parameters still work for Yarn? For example,
>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>> - What's the general process by which Yarn's ApplicationMaster executes
>>>>>> a job?
>>>>>>
>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>> - For Yarn, the above two parameters no longer work, as Yarn uses
>>>>>> containers instead, right?
>>>>>> - For Yarn, we can set the total physical memory for a NodeManager using
>>>>>> 'yarn.nodemanager.resource.memory-mb'. But how do we set the default
>>>>>> physical memory size of a container?
>>>>>> - How do we set the maximum physical memory size of a container? Via the
>>>>>> 'mapred.child.java.opts' parameter?
>>>>>>
>>>>>> Thanks as always!
>>>>>>
>>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>>
>>>>>>> Hi Sam,
>>>>>>>
>>>>>>> > - How to know the container number? Why you say it will be 22
>>>>>>> containers due to a 22 GB memory?
>>>>>>>
>>>>>>> The MR2's default configuration requests 1 GB resource each for Map
>>>>>>> and Reduce containers. It requests 1.5 GB for the AM container that
>>>>>>> runs the job, additionally. This is tunable using the properties
>>>>>>> Sandy's mentioned in his post.
>>>>>>>
>>>>>>> > - My machine has 32 GB memory, how much memory is proper to be
>>>>>>> assigned to containers?
>>>>>>>
>>>>>>> This is a general question. You may use the same process you took to
>>>>>>> decide optimal number of slots in MR1 to decide this here. Every
>>>>>>> container is a new JVM, and you're limited by the CPUs you have there
>>>>>>> (if not the memory). Either increase memory requests from jobs, to
>>>>>>> lower # of concurrent containers at a given time (runtime change), or
>>>>>>> lower NM's published memory resources to control the same (config
>>>>>>> change).
>>>>>>>
>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>>>>>> 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>
>>>>>>> Yes, all of these properties will still work. Old properties specific
>>>>>>> to JobTracker or TaskTracker (usually found as a keyword in the
>>>>>>> config
>>>>>>> name) will not apply anymore.
>>>>>>>
>>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <sa...@gmail.com>
>>>>>>> wrote:
>>>>>>> > Hi Harsh,
>>>>>>> >
>>>>>>> > According to the above suggestions, I removed the duplicated
>>>>>>> setting, and
>>>>>>> > reduced the values of 'yarn.nodemanager.resource.cpu-cores',
>>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000. And
>>>>>>> then, the
>>>>>>> > efficiency improved by about 18%. I have some questions:
>>>>>>> >
>>>>>>> > - How to know the container number? Why you say it will be 22
>>>>>>> containers due
>>>>>>> > to a 22 GB memory?
>>>>>>> > - My machine has 32 GB memory, how much memory is proper to be
>>>>>>> assigned to
>>>>>>> > containers?
>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>>>>>> 'yarn', will
>>>>>>> > other parameters for mapred-site.xml still work in yarn framework?
>>>>>>> Like
>>>>>>> > 'mapreduce.task.io.sort.mb' and 'mapreduce.map.sort.spill.percent'
>>>>>>> >
>>>>>>> > Thanks!
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>>> >>
>>>>>>> >> Hey Sam,
>>>>>>> >>
>>>>>>> >> Did you get a chance to retry with Sandy's suggestions? The config
>>>>>>> >> appears to be asking NMs to use roughly 22 total containers (as
>>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB memory
>>>>>>> >> resource. This could impact much, given the CPU is still the same
>>>>>>> for
>>>>>>> >> both test runs.
>>>>>>> >>
>>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>>> sandy.ryza@cloudera.com>
>>>>>>> >> wrote:
>>>>>>> >> > Hey Sam,
>>>>>>> >> >
>>>>>>> >> > Thanks for sharing your results.  I'm definitely curious about
>>>>>>> what's
>>>>>>> >> > causing the difference.
>>>>>>> >> >
>>>>>>> >> > A couple observations:
>>>>>>> >> > It looks like you've got yarn.nodemanager.resource.memory-mb in
>>>>>>> there
>>>>>>> >> > twice
>>>>>>> >> > with two different values.
>>>>>>> >> >
>>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close to the
>>>>>>> default
>>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of your
>>>>>>> tasks getting
>>>>>>> >> > killed for running over resource limits?
>>>>>>> >> >
>>>>>>> >> > -Sandy
>>>>>>> >> >
>>>>>>> >> >
>>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <
>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>> >> >>
>>>>>>> >> >> The terasort execution log shows that reduce spent about 5.5
>>>>>>> mins from
>>>>>>> >> >> 33%
>>>>>>> >> >> to 35% as below.
>>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce 31%
>>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce 32%
>>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce 33%
>>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce 35%
>>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce 40%
>>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce 43%
>>>>>>> >> >>
>>>>>>> >> >> Anyway, below are my configurations for your reference.
>>>>>>> Thanks!
>>>>>>> >> >> (A) core-site.xml
>>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>>> >> >>
>>>>>>> >> >> (B) hdfs-site.xml
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>dfs.replication</name>
>>>>>>> >> >>     <value>1</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>>> >> >>
>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>>> >> >>
>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>>> >> >>     <value>64</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>>> >> >>     <value>10</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >> (C) mapred-site.xml
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>>> >> >>
>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>>> >> >>     <description>No description</description>
>>>>>>> >> >>     <final>true</final>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>>> >> >>
>>>>>>> >> >>
>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>>> >> >>     <description>No description</description>
>>>>>>> >> >>     <final>true</final>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >> <property>
>>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>>> >> >> </property>
>>>>>>> >> >>
>>>>>>> >> >> <property>
>>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>>> >> >>     <value>yarn</value>
>>>>>>> >> >>    </property>
>>>>>>> >> >>
>>>>>>> >> >>  <property>
>>>>>>> >> >>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>>> >> >>     <value>8</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>>> >> >>     <value>4</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>>> >> >>     <value>true</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >> (D) yarn-site.xml
>>>>>>> >> >>  <property>
>>>>>>> >> >>     <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>>> >> >>     <value>node1:18025</value>
>>>>>>> >> >>     <description>host is the hostname of the resource manager
>>>>>>> and
>>>>>>> >> >>     port is the port on which the NodeManagers contact the
>>>>>>> Resource
>>>>>>> >> >> Manager.
>>>>>>> >> >>     </description>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <description>The address of the RM web
>>>>>>> application.</description>
>>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>>> >> >>     <value>node1:18088</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>>>>>>> >> >>     <value>node1:18030</value>
>>>>>>> >> >>     <description>host is the hostname of the resourcemanager
>>>>>>> and port
>>>>>>> >> >> is
>>>>>>> >> >> the port
>>>>>>> >> >>     on which the Applications in the cluster talk to the
>>>>>>> Resource
>>>>>>> >> >> Manager.
>>>>>>> >> >>     </description>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>>> >> >>     <value>node1:18040</value>
>>>>>>> >> >>     <description>the host is the hostname of the
>>>>>>> ResourceManager and
>>>>>>> >> >> the
>>>>>>> >> >> port is the port on
>>>>>>> >> >>     which the clients can talk to the Resource Manager.
>>>>>>> </description>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>> >> >>
>>>>>>> >> >>
>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>>> >> >>     <description>the local directories used by the
>>>>>>> >> >> nodemanager</description>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>>> port</description>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>> >> >>     <value>10240</value>
>>>>>>> >> >>     <description>the amount of memory on the NodeManager in
>>>>>>> >> >> MB</description>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>> >> >>
>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>> >> >>     <description>directory on hdfs where the application logs
>>>>>>> are moved
>>>>>>> >> >> to
>>>>>>> >> >> </description>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>    <property>
>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>> >> >>
>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>> >> >>     <description>the directories used by Nodemanagers as log
>>>>>>> >> >> directories</description>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>> >> >>     <description>shuffle service that needs to be set for Map
>>>>>>> Reduce to
>>>>>>> >> >> run </description>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>> >> >>     <value>64</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>  <property>
>>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>> >> >>     <value>24</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >> <property>
>>>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>> >> >>     <value>3</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>  <property>
>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>> >> >>     <value>22000</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>  <property>
>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>> >> >>     <value>2.1</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>
>>>>>>> >> >>
>>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>>> >> >>>
>>>>>>> >> >>> Not tuning configurations at all is wrong. YARN uses memory
>>>>>>> resource
>>>>>>> >> >>> based scheduling and hence MR2 would be requesting 1 GB
>>>>>>> minimum by
>>>>>>> >> >>> default, which on base configs maxes out at 8 (due to the 8
>>>>>>> GB NM
>>>>>>> >> >>> memory resource config) total containers. Do share your
>>>>>>> configs as at
>>>>>>> >> >>> this point none of us can tell what it is.
>>>>>>> >> >>>
>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for users and
>>>>>>> to not
>>>>>>> >> >>> care about such things :)
>>>>>>> >> >>>
>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>> samliuhadoop@gmail.com>
>>>>>>> >> >>> wrote:
>>>>>>> >> >>> > At the beginning, I just wanted to do a fast comparison of
>>>>>>> MRv1 and
>>>>>>> >> >>> > Yarn.
>>>>>>> >> >>> > But
>>>>>>> >> >>> > they have many differences, and to be fair for comparison I
>>>>>>> did not
>>>>>>> >> >>> > tune
>>>>>>> >> >>> > their configurations at all.  So I got above test results.
>>>>>>> After
>>>>>>> >> >>> > analyzing
>>>>>>> >> >>> > the test result, no doubt, I will configure them and do
>>>>>>> comparison
>>>>>>> >> >>> > again.
>>>>>>> >> >>> >
>>>>>>> >> >>> > Do you have any idea on current test result? I think, to
>>>>>>> compare
>>>>>>> >> >>> > with
>>>>>>> >> >>> > MRv1,
>>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but worse on
>>>>>>> Reduce
>>>>>>> >> >>> > phase(terasort test).
>>>>>>> >> >>> > And any detailed suggestions/comments/materials on Yarn
>>>>>>> performance
>>>>>>> >> >>> > tuning?
>>>>>>> >> >>> >
>>>>>>> >> >>> > Thanks!
>>>>>>> >> >>> >
>>>>>>> >> >>> >
>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>> marcosluis2186@gmail.com>
>>>>>>> >> >>> >>
>>>>>>> >> >>> >> Why not tune the configurations?
>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>>>>>> >> >>> >>
>>>>>>> >> >>> >>
>>>>>>> >> >>> >>
>>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>>> >> >>> >>>
>>>>>>> >> >>> >>> Hi Experts,
>>>>>>> >> >>> >>>
>>>>>>> >> >>> >>> We are thinking about whether to use Yarn or not in the
>>>>>>> near
>>>>>>> >> >>> >>> future,
>>>>>>> >> >>> >>> and
>>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for comparison.
>>>>>>> >> >>> >>>
>>>>>>> >> >>> >>> My env is a three-node cluster, and each node has similar
>>>>>>> hardware:
>>>>>>> >> >>> >>> 2
>>>>>>> >> >>> >>> cpu(4 core), 32 GB mem. Both Yarn and MRv1 clusters are set
>>>>>>> on the
>>>>>>> >> >>> >>> same
>>>>>>> >> >>> >>> env. To
>>>>>>> >> >>> >>> be fair, I did not make any performance tuning on their
>>>>>>> >> >>> >>> configurations, but
>>>>>>> >> >>> >>> use the default configuration values.
>>>>>>> >> >>> >>>
>>>>>>> >> >>> >>> Before testing, I thought Yarn would be much better than
>>>>>>> MRv1, if
>>>>>>> >> >>> >>> they
>>>>>>> >> >>> >>> all
>>>>>>> >> >>> >>> use default configuration, because Yarn is a better
>>>>>>> framework than
>>>>>>> >> >>> >>> MRv1.
>>>>>>> >> >>> >>> However, the test result shows some differences:
>>>>>>> >> >>> >>>
>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>> >> >>> >>>
>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>> >> >>> >>>
>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>> >> >>> >>>
>>>>>>> >> >>> >>> After a fast analysis, I think the direct cause might be
>>>>>>> that Yarn
>>>>>>> >> >>> >>> is
>>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much worse on
>>>>>>> Reduce
>>>>>>> >> >>> >>> phase.
>>>>>>> >> >>> >>>
>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1 for terasort?
>>>>>>> >> >>> >>> - What's the strategy for tuning Yarn performance? Are there any
>>>>>>> >> >>> >>> materials?
>>>>>>> >> >>> >>>
>>>>>>> >> >>> >>> Thanks!
>>>>>>> >> >>> >>
>>>>>>> >> >>> >>
>>>>>>> >> >>> >>
>>>>>>> >> >>> >>
>>>>>>> >> >>> >> --
>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>> >> >>> >>
>>>>>>> >> >>> >
>>>>>>> >> >>>
>>>>>>> >> >>>
>>>>>>> >> >>>
>>>>>>> >> >>> --
>>>>>>> >> >>> Harsh J
>>>>>>> >> >>
>>>>>>> >> >>
>>>>>>> >> >
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> --
>>>>>>> >> Harsh J
>>>>>>> >
>>>>>>> >
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Harsh J
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Sandy Ryza <sa...@cloudera.com>.
How many map and reduce slots are you using per tasktracker in MR1?  How do
the average map times compare? (MR2 reports this directly on the web UI,
but you can also get a sense in MR1 by scrolling through the map tasks
page).  Can you share the counters for MR1?

-Sandy


On Wed, Oct 23, 2013 at 12:23 AM, Jian Fang
<ji...@gmail.com>wrote:

> Unfortunately, turning off JVM reuse still gave the same result, i.e.,
> about 90 minutes for MR2. I don't think the killed reduces could account
> for a 2x slowdown. There must be something very wrong either in the
> configuration or the code. Any hints?
>
>
>
> On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <ji...@gmail.com>wrote:
>
>> Thanks Sandy. I will try to turn JVM reuse off and see what happens.
>>
>> Yes, I saw quite a few exceptions in the task attempts. For instance:
>>
>>
>> 2013-10-20 03:13:58,751 ERROR [main]
>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
>> 2013-10-20 03:13:58,752 ERROR [Thread-6]
>> org.apache.hadoop.hdfs.DFSClient: Failed to close file
>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>> No lease on
>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
>> File does not exist. Holder
>> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
>> any open files.
>>         at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>>         at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>>         at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>>         at
>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>>         at
>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>>         at
>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>>         at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>> --
>>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>>         at
>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>>         at
>> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>>         at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>>         at
>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>>         at
>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>>         at
>> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>>         at
>> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>> 2013-10-20 03:13:58,753 WARN [main] org.apache.hadoop.mapred.YarnChild:
>> Exception running child : java.nio.channels.ClosedChannelException
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>>         at
>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>>         at
>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>>         at java.io.DataOutputStream.write(DataOutputStream.java:107)
>>         at
>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>>         at
>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>>         at
>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>>         at
>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>         at
>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>         at
>> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>>
>>
>>
>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <sa...@cloudera.com>wrote:
>>
>>> It looks like many of your reduce tasks were killed.  Do you know why?
>>>  Also, MR2 doesn't have JVM reuse, so it might make sense to compare it to
>>> MR1 with JVM reuse turned off.
>>>
>>> -Sandy
>>>
>>>
>>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <
>>> jian.fang.subscribe@gmail.com> wrote:
>>>
>>>> The Terasort output for MR2 is as follows.
>>>>
>>>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job (main):
>>>> Counters: 46
>>>>         File System Counters
>>>>                 FILE: Number of bytes read=456102049355
>>>>                 FILE: Number of bytes written=897246250517
>>>>                 FILE: Number of read operations=0
>>>>                 FILE: Number of large read operations=0
>>>>                 FILE: Number of write operations=0
>>>>                 HDFS: Number of bytes read=1000000851200
>>>>                 HDFS: Number of bytes written=1000000000000
>>>>                 HDFS: Number of read operations=32131
>>>>                 HDFS: Number of large read operations=0
>>>>                 HDFS: Number of write operations=224
>>>>         Job Counters
>>>>                 Killed map tasks=1
>>>>                 Killed reduce tasks=20
>>>>                 Launched map tasks=7601
>>>>                 Launched reduce tasks=132
>>>>                 Data-local map tasks=7591
>>>>                 Rack-local map tasks=10
>>>>                 Total time spent by all maps in occupied slots
>>>> (ms)=1696141311
>>>>                 Total time spent by all reduces in occupied slots
>>>> (ms)=2664045096
>>>>         Map-Reduce Framework
>>>>                 Map input records=10000000000
>>>>                 Map output records=10000000000
>>>>                 Map output bytes=1020000000000
>>>>                 Map output materialized bytes=440486356802
>>>>                 Input split bytes=851200
>>>>                 Combine input records=0
>>>>                 Combine output records=0
>>>>                 Reduce input groups=10000000000
>>>>                 Reduce shuffle bytes=440486356802
>>>>                 Reduce input records=10000000000
>>>>                 Reduce output records=10000000000
>>>>                 Spilled Records=20000000000
>>>>                 Shuffled Maps =851200
>>>>                 Failed Shuffles=61
>>>>                 Merged Map outputs=851200
>>>>                 GC time elapsed (ms)=4215666
>>>>                 CPU time spent (ms)=192433000
>>>>                 Physical memory (bytes) snapshot=3349356380160
>>>>                 Virtual memory (bytes) snapshot=9665208745984
>>>>                 Total committed heap usage (bytes)=3636854259712
>>>>         Shuffle Errors
>>>>                 BAD_ID=0
>>>>                 CONNECTION=0
>>>>                 IO_ERROR=4
>>>>                 WRONG_LENGTH=0
>>>>                 WRONG_MAP=0
>>>>                 WRONG_REDUCE=0
>>>>         File Input Format Counters
>>>>                 Bytes Read=1000000000000
>>>>         File Output Format Counters
>>>>                 Bytes Written=1000000000000
>>>>
>>>> Thanks,
>>>>
>>>> John
>>>>
>>>>
>>>>
>>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop 1.0.3 and
>>>>> it turned out that the terasort for MR2 is 2 times slower than that in MR1.
>>>>> I cannot really believe it.
>>>>>
>>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0 cluster
>>>>> configurations are as follows.
>>>>>
>>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>>         mapreduce.map.memory.mb = "768";
>>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>>
>>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>>
>>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>>         mapreduce.task.io.sort.factor = "48";
>>>>>         mapreduce.task.io.sort.mb = "200";
>>>>>         mapreduce.map.speculative = "true";
>>>>>         mapreduce.reduce.speculative = "true";
>>>>>         mapreduce.framework.name = "yarn";
>>>>>         yarn.app.mapreduce.am.job.task.listener.thread-count = "60";
>>>>>         mapreduce.map.cpu.vcores = "1";
>>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>>
>>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>>         mapreduce.map.output.compress = "true";
>>>>>         mapreduce.map.output.compress.codec =
>>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>>
>>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>>         yarn.resourcemanager.scheduler.client.thread-count = "64";
>>>>>         yarn.resourcemanager.resource-tracker.client.thread-count =
>>>>> "64";
>>>>>         yarn.resourcemanager.scheduler.class =
>>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>>         yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>>         yarn.nodemanager.container-executor.class =
>>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>>         yarn.nodemanager.container-manager.thread-count = "64";
>>>>>         yarn.nodemanager.localizer.client.thread-count = "20";
>>>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>>>>
>>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost the same
>>>>> other configurations. In MR1, the 1TB terasort took about 45 minutes, but
>>>>> it took around 90 minutes in MR2.
>>>>>
>>>>> Does anyone know what is wrong here? Or do I need some special
>>>>> configurations for terasort to work better in MR2?
>>>>>
>>>>> Thanks in advance,
>>>>>
>>>>> John
>>>>>
>>>>>
>>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>>> michael_segel@hotmail.com> wrote:
>>>>>
>>>>>> Sam,
>>>>>> I think your cluster is too small for any meaningful conclusions to
>>>>>> be made.
>>>>>>
>>>>>>
>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>
>>>>>> Mike Segel
>>>>>>
>>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com> wrote:
>>>>>>
>>>>>> Hi Harsh,
>>>>>>
>>>>>> Thanks for your detailed response! Now, the efficiency of my Yarn
>>>>>> cluster improved a lot after increasing the reducer
>>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>>
>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and reduce task
>>>>>> together, with a typical process(map > shuffle > reduce). In Yarn, as I
>>>>>> know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1 job
>>>>>> has special execution process(map > shuffle > reduce) in Hadoop 1.x, and
>>>>>> how Yarn execute a MRv1 job? still include some special MR steps in Hadoop
>>>>>> 1.x, like map, sort, merge, combine and shuffle?
>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>> - What's the general process for ApplicationMaster of Yarn to execute
>>>>>> a job?
>>>>>>
>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>> - For Yarn, the above two parameters do not work any more, as yarn uses
>>>>>> containers instead, right?
>>>>>> - For Yarn, we can set the whole physical mem for a NodeManager using
>>>>>> 'yarn.nodemanager.resource.memory-mb'. But how to set the default
>>>>>> size of physical mem of a container?
>>>>>> - How to set the maximum size of physical mem of a container? By the
>>>>>> parameter of 'mapred.child.java.opts'?
>>>>>>
>>>>>> Thanks as always!
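
A sketch of the knobs those last two questions touch, as I understand the
stock MR2 properties (the values below are only the usual defaults, not
something taken from this thread): the per-container request comes from
mapreduce.map.memory.mb / mapreduce.reduce.memory.mb in mapred-site.xml, the
scheduler clamps every request into the range set in yarn-site.xml, and
mapred.child.java.opts only sizes the JVM heap inside the container, not the
container's physical-memory limit.

  <property>
    <!-- smallest container YARN will hand out; with the Capacity Scheduler,
         requests are also rounded up to a multiple of this value -->
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
  </property>
  <property>
    <!-- largest single container a job may request -->
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>8192</value>
  </property>
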
>>>>>>
>>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>>
>>>>>>> Hi Sam,
>>>>>>>
>>>>>>> > - How to know the container number? Why you say it will be 22
>>>>>>> containers due to a 22 GB memory?
>>>>>>>
>>>>>>> The MR2's default configuration requests 1 GB resource each for Map
>>>>>>> and Reduce containers. It requests 1.5 GB for the AM container that
>>>>>>> runs the job, additionally. This is tunable using the properties
>>>>>>> Sandy's mentioned in his post.
>>>>>>>
>>>>>>> > - My machine has 32 GB memory, how many memory is proper to be
>>>>>>> assigned to containers?
>>>>>>>
>>>>>>> This is a general question. You may use the same process you took to
>>>>>>> decide optimal number of slots in MR1 to decide this here. Every
>>>>>>> container is a new JVM, and you're limited by the CPUs you have there
>>>>>>> (if not the memory). Either increase memory requests from jobs, to
>>>>>>> lower # of concurrent containers at a given time (runtime change), or
>>>>>>> lower NM's published memory resources to control the same (config
>>>>>>> change).
>>>>>>>
>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>>>>>> 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>
>>>>>>> Yes, all of these properties will still work. Old properties specific
>>>>>>> to JobTracker or TaskTracker (usually found as a keyword in the
>>>>>>> config
>>>>>>> name) will not apply anymore.
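
To make the container-sizing arithmetic above concrete, here is a sketch of
the properties behind those default requests (names and values are my reading
of stock Hadoop 2.x defaults, so double-check them against the docs for your
release):

  <!-- mapred-site.xml: per-task container requests -->
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>1024</value>
  </property>
  <!-- container that runs the MapReduce ApplicationMaster itself -->
  <property>
    <name>yarn.app.mapreduce.am.resource.mb</name>
    <value>1536</value>
  </property>

With a NodeManager advertising yarn.nodemanager.resource.memory-mb = 22000,
that works out to roughly 22000 / 1024, i.e. about 21-22 concurrent 1 GB
containers per node, which is where the "22 containers" figure comes from.
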
>>>>>>>
>>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <sa...@gmail.com>
>>>>>>> wrote:
>>>>>>> > Hi Harsh,
>>>>>>> >
>>>>>>> > According to above suggestions, I removed the duplication of
>>>>>>> setting, and
>>>>>>> > reduce the value of 'yarn.nodemanager.resource.cpu-cores',
>>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000. And
>>>>>>> then, the
>>>>>>> > efficiency improved about 18%.  I have questions:
>>>>>>> >
>>>>>>> > - How to know the container number? Why you say it will be 22
>>>>>>> containers due
>>>>>>> > to a 22 GB memory?
>>>>>>> > - My machine has 32 GB memory, how many memory is proper to be
>>>>>>> assigned to
>>>>>>> > containers?
>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>>>>>> 'yarn', will
>>>>>>> > other parameters for mapred-site.xml still work in yarn framework?
>>>>>>> Like
>>>>>>> > 'mapreduce.task.io.sort.mb' and 'mapreduce.map.sort.spill.percent'
>>>>>>> >
>>>>>>> > Thanks!
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>>> >>
>>>>>>> >> Hey Sam,
>>>>>>> >>
>>>>>>> >> Did you get a chance to retry with Sandy's suggestions? The config
>>>>>>> >> appears to be asking NMs to use roughly 22 total containers (as
>>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB memory
>>>>>>> >> resource. This could impact much, given the CPU is still the same
>>>>>>> for
>>>>>>> >> both test runs.
>>>>>>> >>
>>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>>> sandy.ryza@cloudera.com>
>>>>>>> >> wrote:
>>>>>>> >> > Hey Sam,
>>>>>>> >> >
>>>>>>> >> > Thanks for sharing your results.  I'm definitely curious about
>>>>>>> what's
>>>>>>> >> > causing the difference.
>>>>>>> >> >
>>>>>>> >> > A couple observations:
>>>>>>> >> > It looks like you've got yarn.nodemanager.resource.memory-mb in
>>>>>>> there
>>>>>>> >> > twice
>>>>>>> >> > with two different values.
>>>>>>> >> >
>>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close to the
>>>>>>> default
>>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of your
>>>>>>> tasks getting
>>>>>>> >> > killed for running over resource limits?
>>>>>>> >> >
>>>>>>> >> > -Sandy
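
One way to act on both observations, sketched with illustrative values rather
than anything taken from this thread: keep a single
yarn.nodemanager.resource.memory-mb entry in yarn-site.xml, and leave some
headroom between the task JVM heap and the container size so tasks are not
killed for exceeding their allocation.

  <!-- mapred-site.xml: heap roughly 20% below the 1024 MB container -->
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx800m</value>
  </property>
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx800m</value>
  </property>
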
>>>>>>> >> >
>>>>>>> >> >
>>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <
>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>> >> >>
>>>>>>> >> >> The terasort execution log shows that reduce spent about 5.5
>>>>>>> mins from
>>>>>>> >> >> 33%
>>>>>>> >> >> to 35% as below.
>>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce 31%
>>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce 32%
>>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce 33%
>>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce 35%
>>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce 40%
>>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce 43%
>>>>>>> >> >>
>>>>>>> >> >> Any way, below are my configurations for your reference.
>>>>>>> Thanks!
>>>>>>> >> >> (A) core-site.xml
>>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>>> >> >>
>>>>>>> >> >> (B) hdfs-site.xml
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>dfs.replication</name>
>>>>>>> >> >>     <value>1</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>>> >> >>
>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>>> >> >>
>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>>> >> >>     <value>64</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>>> >> >>     <value>10</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >> (C) mapred-site.xml
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>>> >> >>
>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>>> >> >>     <description>No description</description>
>>>>>>> >> >>     <final>true</final>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>>> >> >>
>>>>>>> >> >>
>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>>> >> >>     <description>No description</description>
>>>>>>> >> >>     <final>true</final>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >> <property>
>>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>>> >> >> </property>
>>>>>>> >> >>
>>>>>>> >> >> <property>
>>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>>> >> >>     <value>yarn</value>
>>>>>>> >> >>    </property>
>>>>>>> >> >>
>>>>>>> >> >>  <property>
>>>>>>> >> >>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>>> >> >>     <value>8</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>>> >> >>     <value>4</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>>> >> >>     <value>true</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >> (D) yarn-site.xml
>>>>>>> >> >>  <property>
>>>>>>> >> >>     <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>>> >> >>     <value>node1:18025</value>
>>>>>>> >> >>     <description>host is the hostname of the resource manager
>>>>>>> and
>>>>>>> >> >>     port is the port on which the NodeManagers contact the
>>>>>>> Resource
>>>>>>> >> >> Manager.
>>>>>>> >> >>     </description>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <description>The address of the RM web
>>>>>>> application.</description>
>>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>>> >> >>     <value>node1:18088</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>>>>>>> >> >>     <value>node1:18030</value>
>>>>>>> >> >>     <description>host is the hostname of the resourcemanager
>>>>>>> and port
>>>>>>> >> >> is
>>>>>>> >> >> the port
>>>>>>> >> >>     on which the Applications in the cluster talk to the
>>>>>>> Resource
>>>>>>> >> >> Manager.
>>>>>>> >> >>     </description>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>>> >> >>     <value>node1:18040</value>
>>>>>>> >> >>     <description>the host is the hostname of the
>>>>>>> ResourceManager and
>>>>>>> >> >> the
>>>>>>> >> >> port is the port on
>>>>>>> >> >>     which the clients can talk to the Resource Manager.
>>>>>>> </description>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>> >> >>
>>>>>>> >> >>
>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>>> >> >>     <description>the local directories used by the
>>>>>>> >> >> nodemanager</description>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>>> port</description>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>> >> >>     <value>10240</value>
>>>>>>> >> >>     <description>the amount of memory on the NodeManager in
>>>>>>> >> >> MB</description>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>> >> >>
>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>> >> >>     <description>directory on hdfs where the application logs
>>>>>>> are moved
>>>>>>> >> >> to
>>>>>>> >> >> </description>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>    <property>
>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>> >> >>
>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>> >> >>     <description>the directories used by Nodemanagers as log
>>>>>>> >> >> directories</description>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>> >> >>     <description>shuffle service that needs to be set for Map
>>>>>>> Reduce to
>>>>>>> >> >> run </description>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>   <property>
>>>>>>> >> >>     <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>> >> >>     <value>64</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>  <property>
>>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>> >> >>     <value>24</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >> <property>
>>>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>> >> >>     <value>3</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>  <property>
>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>> >> >>     <value>22000</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>  <property>
>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>> >> >>     <value>2.1</value>
>>>>>>> >> >>   </property>
>>>>>>> >> >>
>>>>>>> >> >>
>>>>>>> >> >>
>>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>>> >> >>>
>>>>>>> >> >>> Not tuning configurations at all is wrong. YARN uses memory
>>>>>>> resource
>>>>>>> >> >>> based scheduling and hence MR2 would be requesting 1 GB
>>>>>>> minimum by
>>>>>>> >> >>> default, causing, on base configs, to max out at 8 (due to 8
>>>>>>> GB NM
>>>>>>> >> >>> memory resource config) total containers. Do share your
>>>>>>> configs as at
>>>>>>> >> >>> this point none of us can tell what it is.
>>>>>>> >> >>>
>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for users and
>>>>>>> to not
>>>>>>> >> >>> care about such things :)
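
For reference, the arithmetic behind "max out at 8": an untuned NodeManager
advertises 8 GB, so at the 1 GB default request size only 8192 / 1024 = 8
task containers fit on a node. The value below is the stock Hadoop 2.x
default as I understand it, not something taken from Sam's configs.

  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
  </property>
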
>>>>>>> >> >>>
>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>> samliuhadoop@gmail.com>
>>>>>>> >> >>> wrote:
>>>>>>> >> >>> > At the beginning, I just want to do a fast comparison of
>>>>>>> MRv1 and
>>>>>>> >> >>> > Yarn.
>>>>>>> >> >>> > But
>>>>>>> >> >>> > they have many differences, and to be fair for comparison I
>>>>>>> did not
>>>>>>> >> >>> > tune
>>>>>>> >> >>> > their configurations at all.  So I got above test results.
>>>>>>> After
>>>>>>> >> >>> > analyzing
>>>>>>> >> >>> > the test result, no doubt, I will configure them and do
>>>>>>> comparison
>>>>>>> >> >>> > again.
>>>>>>> >> >>> >
>>>>>>> >> >>> > Do you have any idea on current test result? I think, to
>>>>>>> compare
>>>>>>> >> >>> > with
>>>>>>> >> >>> > MRv1,
>>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but worse on
>>>>>>> Reduce
>>>>>>> >> >>> > phase(terasort test).
>>>>>>> >> >>> > And any detailed suggestions/comments/materials on Yarn
>>>>>>> performance
>>>>>>> >> >>> > tuning?
>>>>>>> >> >>> >
>>>>>>> >> >>> > Thanks!
>>>>>>> >> >>> >
>>>>>>> >> >>> >
>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>> marcosluis2186@gmail.com>
>>>>>>> >> >>> >>
>>>>>>> >> >>> >> Why not tune the configurations?
>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>>>>>> >> >>> >>
>>>>>>> >> >>> >>
>>>>>>> >> >>> >>
>>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>>> >> >>> >>>
>>>>>>> >> >>> >>> Hi Experts,
>>>>>>> >> >>> >>>
>>>>>>> >> >>> >>> We are thinking about whether to use Yarn or not in the
>>>>>>> near
>>>>>>> >> >>> >>> future,
>>>>>>> >> >>> >>> and
>>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for comparison.
>>>>>>> >> >>> >>>
>>>>>>> >> >>> >>> My env is three nodes cluster, and each node has similar
>>>>>>> hardware:
>>>>>>> >> >>> >>> 2
>>>>>>> >> >>> >>> cpu(4 core), 32 mem. Both Yarn and MRv1 cluster are set
>>>>>>> on the
>>>>>>> >> >>> >>> same
>>>>>>> >> >>> >>> env. To
>>>>>>> >> >>> >>> be fair, I did not make any performance tuning on their
>>>>>>> >> >>> >>> configurations, but
>>>>>>> >> >>> >>> use the default configuration values.
>>>>>>> >> >>> >>>
>>>>>>> >> >>> >>> Before testing, I think Yarn will be much better than
>>>>>>> MRv1, if
>>>>>>> >> >>> >>> they
>>>>>>> >> >>> >>> all
>>>>>>> >> >>> >>> use default configuration, because Yarn is a better
>>>>>>> framework than
>>>>>>> >> >>> >>> MRv1.
>>>>>>> >> >>> >>> However, the test result shows some differences:
>>>>>>> >> >>> >>>
>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>> >> >>> >>>
>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>> >> >>> >>>
>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>> >> >>> >>>
>>>>>>> >> >>> >>> After a fast analysis, I think the direct cause might be
>>>>>>> that Yarn
>>>>>>> >> >>> >>> is
>>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much worse on
>>>>>>> Reduce
>>>>>>> >> >>> >>> phase.
>>>>>>> >> >>> >>>
>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1 for terasort?
>>>>>>> >> >>> >>> - What's the strategy for tuning Yarn performance? Are there any
>>>>>>> >> >>> >>> materials?
>>>>>>> >> >>> >>>
>>>>>>> >> >>> >>> Thanks!
>>>>>>> >> >>> >>
>>>>>>> >> >>> >>
>>>>>>> >> >>> >>
>>>>>>> >> >>> >>
>>>>>>> >> >>> >> --
>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>> >> >>> >>
>>>>>>> >> >>> >
>>>>>>> >> >>>
>>>>>>> >> >>>
>>>>>>> >> >>>
>>>>>>> >> >>> --
>>>>>>> >> >>> Harsh J
>>>>>>> >> >>
>>>>>>> >> >>
>>>>>>> >> >
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> --
>>>>>>> >> Harsh J
>>>>>>> >
>>>>>>> >
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Harsh J
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Jian Fang <ji...@gmail.com>.
Unfortunately, turning off JVM reuse still got the same result, i.e., about
90 minutes for MR2. I don't think the killed reduce tasks alone could account
for a 2x slowdown. Something must be very wrong in either the configuration
or the code. Any hints?
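
As a point of reference, JVM reuse only exists on the MR1 side: as far as I
know it is controlled by mapred.job.reuse.jvm.num.tasks in mapred-site.xml,
while the mapreduce.job.jvm.numtasks = "20" entry in the MR2 configuration
quoted below is simply ignored by YARN. A minimal sketch of the MR1 setting
for a like-for-like run (assuming Hadoop 1.0.3 defaults):

  <property>
    <!-- 1 = a fresh JVM per task, matching MR2, which has no JVM reuse;
         -1 would mean unlimited reuse within a job -->
    <name>mapred.job.reuse.jvm.num.tasks</name>
    <value>1</value>
  </property>
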



On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <ji...@gmail.com>wrote:

> Thanks Sandy. I will try to turn JVM reuse off and see what happens.
>
> Yes, I saw quite some exceptions in the task attempts. For instance.
>
>
> 2013-10-20 03:13:58,751 ERROR [main]
> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
> 2013-10-20 03:13:58,752 ERROR [Thread-6] org.apache.hadoop.hdfs.DFSClient:
> Failed to close file
> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
> No lease on
> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
> File does not exist. Holder
> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
> any open files.
>         at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>         at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>         at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>         at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>         at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>         at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>         at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
> --
>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>         at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>         at
> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>         at
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>         at
> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>         at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>         at
> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>         at
> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>         at
> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>         at
> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
> 2013-10-20 03:13:58,753 WARN [main] org.apache.hadoop.mapred.YarnChild:
> Exception running child : java.nio.channels.ClosedChannelException
>         at
> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>         at
> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>         at
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>         at java.io.DataOutputStream.write(DataOutputStream.java:107)
>         at
> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>         at
> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>         at
> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>         at
> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>         at
> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>         at
> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>
>
>
> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <sa...@cloudera.com>wrote:
>
>> It looks like many of your reduce tasks were killed.  Do you know why?
>>  Also, MR2 doesn't have JVM reuse, so it might make sense to compare it to
>> MR1 with JVM reuse turned off.
>>
>> -Sandy
>>
>>
>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <jian.fang.subscribe@gmail.com
>> > wrote:
>>
>>> The Terasort output for MR2 is as follows.
>>>
>>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job (main):
>>> Counters: 46
>>>         File System Counters
>>>                 FILE: Number of bytes read=456102049355
>>>                 FILE: Number of bytes written=897246250517
>>>                 FILE: Number of read operations=0
>>>                 FILE: Number of large read operations=0
>>>                 FILE: Number of write operations=0
>>>                 HDFS: Number of bytes read=1000000851200
>>>                 HDFS: Number of bytes written=1000000000000
>>>                 HDFS: Number of read operations=32131
>>>                 HDFS: Number of large read operations=0
>>>                 HDFS: Number of write operations=224
>>>         Job Counters
>>>                 Killed map tasks=1
>>>                 Killed reduce tasks=20
>>>                 Launched map tasks=7601
>>>                 Launched reduce tasks=132
>>>                 Data-local map tasks=7591
>>>                 Rack-local map tasks=10
>>>                 Total time spent by all maps in occupied slots
>>> (ms)=1696141311
>>>                 Total time spent by all reduces in occupied slots
>>> (ms)=2664045096
>>>         Map-Reduce Framework
>>>                 Map input records=10000000000
>>>                 Map output records=10000000000
>>>                 Map output bytes=1020000000000
>>>                 Map output materialized bytes=440486356802
>>>                 Input split bytes=851200
>>>                 Combine input records=0
>>>                 Combine output records=0
>>>                 Reduce input groups=10000000000
>>>                 Reduce shuffle bytes=440486356802
>>>                 Reduce input records=10000000000
>>>                 Reduce output records=10000000000
>>>                 Spilled Records=20000000000
>>>                 Shuffled Maps =851200
>>>                 Failed Shuffles=61
>>>                 Merged Map outputs=851200
>>>                 GC time elapsed (ms)=4215666
>>>                 CPU time spent (ms)=192433000
>>>                 Physical memory (bytes) snapshot=3349356380160
>>>                 Virtual memory (bytes) snapshot=9665208745984
>>>                 Total committed heap usage (bytes)=3636854259712
>>>         Shuffle Errors
>>>                 BAD_ID=0
>>>                 CONNECTION=0
>>>                 IO_ERROR=4
>>>                 WRONG_LENGTH=0
>>>                 WRONG_MAP=0
>>>                 WRONG_REDUCE=0
>>>         File Input Format Counters
>>>                 Bytes Read=1000000000000
>>>         File Output Format Counters
>>>                 Bytes Written=1000000000000
>>>
>>> Thanks,
>>>
>>> John
>>>
>>>
>>>
>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>> jian.fang.subscribe@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop 1.0.3 and
>>>> it turned out that the terasort for MR2 is 2 times slower than that in MR1.
>>>> I cannot really believe it.
>>>>
>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0 cluster
>>>> configurations are as follows.
>>>>
>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>         mapreduce.map.memory.mb = "768";
>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>
>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>
>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>         mapreduce.task.io.sort.factor = "48";
>>>>         mapreduce.task.io.sort.mb = "200";
>>>>         mapreduce.map.speculative = "true";
>>>>         mapreduce.reduce.speculative = "true";
>>>>         mapreduce.framework.name = "yarn";
>>>>         yarn.app.mapreduce.am.job.task.listener.thread-count = "60";
>>>>         mapreduce.map.cpu.vcores = "1";
>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>
>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>         mapreduce.map.output.compress = "true";
>>>>         mapreduce.map.output.compress.codec =
>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>
>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>         yarn.resourcemanager.scheduler.client.thread-count = "64";
>>>>         yarn.resourcemanager.resource-tracker.client.thread-count =
>>>> "64";
>>>>         yarn.resourcemanager.scheduler.class =
>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>         yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>         yarn.nodemanager.container-executor.class =
>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>         yarn.nodemanager.container-manager.thread-count = "64";
>>>>         yarn.nodemanager.localizer.client.thread-count = "20";
>>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>>>
>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost the same
>>>> other configurations. In MR1, the 1TB terasort took about 45 minutes, but
>>>> it took around 90 minutes in MR2.
>>>>
>>>> Does anyone know what is wrong here? Or do I need some special
>>>> configurations for terasort to work better in MR2?
>>>>
>>>> Thanks in advance,
>>>>
>>>> John
>>>>
>>>>
>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>> michael_segel@hotmail.com> wrote:
>>>>
>>>>> Sam,
>>>>> I think your cluster is too small for any meaningful conclusions to be
>>>>> made.
>>>>>
>>>>>
>>>>> Sent from a remote device. Please excuse any typos...
>>>>>
>>>>> Mike Segel
>>>>>
>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com> wrote:
>>>>>
>>>>> Hi Harsh,
>>>>>
>>>>> Thanks for your detailed response! Now, the efficiency of my Yarn
>>>>> cluster improved a lot after increasing the reducer
>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>
>>>>> 1.In Hadoop 1.x, a job will be executed by map task and reduce task
>>>>> together, with a typical process(map > shuffle > reduce). In Yarn, as I
>>>>> know, a MRv1 job will be executed only by ApplicationMaster.
>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1 job
>>>>> has special execution process(map > shuffle > reduce) in Hadoop 1.x, and
>>>>> how Yarn execute a MRv1 job? still include some special MR steps in Hadoop
>>>>> 1.x, like map, sort, merge, combine and shuffle?
>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>> - What's the general process for ApplicationMaster of Yarn to execute
>>>>> a job?
>>>>>
>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>> - For Yarn, the above two parameters do not work any more, as yarn uses
>>>>> containers instead, right?
>>>>> - For Yarn, we can set the whole physical mem for a NodeManager using '
>>>>> yarn.nodemanager.resource.memory-mb'. But how to set the default size
>>>>> of physical mem of a container?
>>>>> - How to set the maximum size of physical mem of a container? By the
>>>>> parameter of 'mapred.child.java.opts'?
>>>>>
>>>>> Thanks as always!
>>>>>
>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>
>>>>>> Hi Sam,
>>>>>>
>>>>>> > - How to know the container number? Why you say it will be 22
>>>>>> containers due to a 22 GB memory?
>>>>>>
>>>>>> The MR2's default configuration requests 1 GB resource each for Map
>>>>>> and Reduce containers. It requests 1.5 GB for the AM container that
>>>>>> runs the job, additionally. This is tunable using the properties
>>>>>> Sandy's mentioned in his post.
>>>>>>
>>>>>> > - My machine has 32 GB memory, how many memory is proper to be
>>>>>> assigned to containers?
>>>>>>
>>>>>> This is a general question. You may use the same process you took to
>>>>>> decide optimal number of slots in MR1 to decide this here. Every
>>>>>> container is a new JVM, and you're limited by the CPUs you have there
>>>>>> (if not the memory). Either increase memory requests from jobs, to
>>>>>> lower # of concurrent containers at a given time (runtime change), or
>>>>>> lower NM's published memory resources to control the same (config
>>>>>> change).
>>>>>>
>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>>>>> 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>
>>>>>> Yes, all of these properties will still work. Old properties specific
>>>>>> to JobTracker or TaskTracker (usually found as a keyword in the config
>>>>>> name) will not apply anymore.
>>>>>>
>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <sa...@gmail.com>
>>>>>> wrote:
>>>>>> > Hi Harsh,
>>>>>> >
>>>>>> > According to above suggestions, I removed the duplication of
>>>>>> setting, and
>>>>>> > reduce the value of 'yarn.nodemanager.resource.cpu-cores',
>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000. And then,
>>>>>> the
>>>>>> > efficiency improved about 18%.  I have questions:
>>>>>> >
>>>>>> > - How to know the container number? Why you say it will be 22
>>>>>> containers due
>>>>>> > to a 22 GB memory?
>>>>>> > - My machine has 32 GB memory, how many memory is proper to be
>>>>>> assigned to
>>>>>> > containers?
>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>>>>> 'yarn', will
>>>>>> > other parameters for mapred-site.xml still work in yarn framework?
>>>>>> Like
>>>>>> > 'mapreduce.task.io.sort.mb' and 'mapreduce.map.sort.spill.percent'
>>>>>> >
>>>>>> > Thanks!
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>> >>
>>>>>> >> Hey Sam,
>>>>>> >>
>>>>>> >> Did you get a chance to retry with Sandy's suggestions? The config
>>>>>> >> appears to be asking NMs to use roughly 22 total containers (as
>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB memory
>>>>>> >> resource. This could impact much, given the CPU is still the same
>>>>>> for
>>>>>> >> both test runs.
>>>>>> >>
>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>> sandy.ryza@cloudera.com>
>>>>>> >> wrote:
>>>>>> >> > Hey Sam,
>>>>>> >> >
>>>>>> >> > Thanks for sharing your results.  I'm definitely curious about
>>>>>> what's
>>>>>> >> > causing the difference.
>>>>>> >> >
>>>>>> >> > A couple observations:
>>>>>> >> > It looks like you've got yarn.nodemanager.resource.memory-mb in
>>>>>> there
>>>>>> >> > twice
>>>>>> >> > with two different values.
>>>>>> >> >
>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close to the
>>>>>> default
>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of your tasks
>>>>>> getting
>>>>>> >> > killed for running over resource limits?
>>>>>> >> >
>>>>>> >> > -Sandy
>>>>>> >> >
>>>>>> >> >
>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <sa...@gmail.com>
>>>>>> wrote:
>>>>>> >> >>
>>>>>> >> >> The terasort execution log shows that reduce spent about 5.5
>>>>>> mins from
>>>>>> >> >> 33%
>>>>>> >> >> to 35% as below.
>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce 31%
>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce 32%
>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce 33%
>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce 35%
>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce 40%
>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce 43%
>>>>>> >> >>
>>>>>> >> >> Any way, below are my configurations for your reference. Thanks!
>>>>>> >> >> (A) core-site.xml
>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>> >> >>
>>>>>> >> >> (B) hdfs-site.xml
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>dfs.replication</name>
>>>>>> >> >>     <value>1</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>> >> >>
>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>> >> >>
>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>> >> >>     <value>64</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>> >> >>     <value>10</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >> (C) mapred-site.xml
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>> >> >>
>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>> >> >>     <description>No description</description>
>>>>>> >> >>     <final>true</final>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>> >> >>
>>>>>> >> >>
>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>> >> >>     <description>No description</description>
>>>>>> >> >>     <final>true</final>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >> <property>
>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>> >> >> </property>
>>>>>> >> >>
>>>>>> >> >> <property>
>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>> >> >>     <value>yarn</value>
>>>>>> >> >>    </property>
>>>>>> >> >>
>>>>>> >> >>  <property>
>>>>>> >> >>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>> >> >>     <value>8</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>> >> >>     <value>4</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>> >> >>     <value>true</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >> (D) yarn-site.xml
>>>>>> >> >>  <property>
>>>>>> >> >>     <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>> >> >>     <value>node1:18025</value>
>>>>>> >> >>     <description>host is the hostname of the resource manager
>>>>>> and
>>>>>> >> >>     port is the port on which the NodeManagers contact the
>>>>>> Resource
>>>>>> >> >> Manager.
>>>>>> >> >>     </description>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <description>The address of the RM web
>>>>>> application.</description>
>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>> >> >>     <value>node1:18088</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>>>>>> >> >>     <value>node1:18030</value>
>>>>>> >> >>     <description>host is the hostname of the resourcemanager
>>>>>> and port
>>>>>> >> >> is
>>>>>> >> >> the port
>>>>>> >> >>     on which the Applications in the cluster talk to the
>>>>>> Resource
>>>>>> >> >> Manager.
>>>>>> >> >>     </description>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>> >> >>     <value>node1:18040</value>
>>>>>> >> >>     <description>the host is the hostname of the
>>>>>> ResourceManager and
>>>>>> >> >> the
>>>>>> >> >> port is the port on
>>>>>> >> >>     which the clients can talk to the Resource Manager.
>>>>>> </description>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>> >> >>
>>>>>> >> >>
>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>> >> >>     <description>the local directories used by the
>>>>>> >> >> nodemanager</description>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>> port</description>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>> >> >>     <value>10240</value>
>>>>>> >> >>     <description>the amount of memory on the NodeManager in
>>>>>> >> >> MB</description>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>> >> >>
>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>> >> >>     <description>directory on hdfs where the application logs
>>>>>> are moved
>>>>>> >> >> to
>>>>>> >> >> </description>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>    <property>
>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>> >> >>
>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>> >> >>     <description>the directories used by Nodemanagers as log
>>>>>> >> >> directories</description>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>> >> >>     <description>shuffle service that needs to be set for Map
>>>>>> Reduce to
>>>>>> >> >> run </description>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>yarn.resourcemanager.client.thread-count</name>
>>>>>> >> >>     <value>64</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>  <property>
>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>> >> >>     <value>24</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >> <property>
>>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>> >> >>     <value>3</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>  <property>
>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>> >> >>     <value>22000</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>  <property>
>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>> >> >>     <value>2.1</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>> >> >>>
>>>>>> >> >>> Not tuning configurations at all is wrong. YARN uses memory
>>>>>> resource
>>>>>> >> >>> based scheduling and hence MR2 would be requesting 1 GB
>>>>>> minimum by
>>>>>> >> >>> default, causing, on base configs, to max out at 8 (due to 8
>>>>>> GB NM
>>>>>> >> >>> memory resource config) total containers. Do share your
>>>>>> configs as at
>>>>>> >> >>> this point none of us can tell what it is.
>>>>>> >> >>>
>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for users and
>>>>>> to not
>>>>>> >> >>> care about such things :)
>>>>>> >> >>>
>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>> samliuhadoop@gmail.com>
>>>>>> >> >>> wrote:
>>>>>> >> >>> > At the beginning, I just want to do a fast comparison of
>>>>>> MRv1 and
>>>>>> >> >>> > Yarn.
>>>>>> >> >>> > But
>>>>>> >> >>> > they have many differences, and to be fair for comparison I
>>>>>> did not
>>>>>> >> >>> > tune
>>>>>> >> >>> > their configurations at all.  So I got above test results.
>>>>>> After
>>>>>> >> >>> > analyzing
>>>>>> >> >>> > the test result, no doubt, I will configure them and do
>>>>>> comparison
>>>>>> >> >>> > again.
>>>>>> >> >>> >
>>>>>> >> >>> > Do you have any idea on current test result? I think, to
>>>>>> compare
>>>>>> >> >>> > with
>>>>>> >> >>> > MRv1,
>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but worse on
>>>>>> Reduce
>>>>>> >> >>> > phase(terasort test).
>>>>>> >> >>> > And any detailed suggestions/comments/materials on Yarn
>>>>>> performance
>>>>>> >> >>> > tuning?
>>>>>> >> >>> >
>>>>>> >> >>> > Thanks!
>>>>>> >> >>> >
>>>>>> >> >>> >
>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>> marcosluis2186@gmail.com>
>>>>>> >> >>> >>
>>>>>> >> >>> >> Why not tune the configurations?
>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>>>>> >> >>> >>
>>>>>> >> >>> >>
>>>>>> >> >>> >>
>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>> >> >>> >>>
>>>>>> >> >>> >>> Hi Experts,
>>>>>> >> >>> >>>
>>>>>> >> >>> >>> We are thinking about whether to use Yarn or not in the
>>>>>> near
>>>>>> >> >>> >>> future,
>>>>>> >> >>> >>> and
>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for comparison.
>>>>>> >> >>> >>>
>>>>>> >> >>> >>> My env is three nodes cluster, and each node has similar
>>>>>> hardware:
>>>>>> >> >>> >>> 2
>>>>>> >> >>> >>> cpu(4 core), 32 mem. Both Yarn and MRv1 cluster are set on
>>>>>> the
>>>>>> >> >>> >>> same
>>>>>> >> >>> >>> env. To
>>>>>> >> >>> >>> be fair, I did not make any performance tuning on their
>>>>>> >> >>> >>> configurations, but
>>>>>> >> >>> >>> use the default configuration values.
>>>>>> >> >>> >>>
>>>>>> >> >>> >>> Before testing, I think Yarn will be much better than
>>>>>> MRv1, if
>>>>>> >> >>> >>> they
>>>>>> >> >>> >>> all
>>>>>> >> >>> >>> use default configuration, because Yarn is a better
>>>>>> framework than
>>>>>> >> >>> >>> MRv1.
>>>>>> >> >>> >>> However, the test result shows some differences:
>>>>>> >> >>> >>>
>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>> >> >>> >>>
>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>> >> >>> >>>
>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>> >> >>> >>>
>>>>>> >> >>> >>> After a fast analysis, I think the direct cause might be
>>>>>> that Yarn
>>>>>> >> >>> >>> is
>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much worse on
>>>>>> Reduce
>>>>>> >> >>> >>> phase.
>>>>>> >> >>> >>>
>>>>>> >> >>> >>> Here I have two questions:
>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1 for terasort?
>>>>>> >> >>> >>> - What's the strategy for tuning Yarn performance? Are there any
>>>>>> >> >>> >>> materials?
>>>>>> >> >>> >>>
>>>>>> >> >>> >>> Thanks!
>>>>>> >> >>> >>
>>>>>> >> >>> >>
>>>>>> >> >>> >>
>>>>>> >> >>> >>
>>>>>> >> >>> >> --
>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>> >> >>> >>
>>>>>> >> >>> >
>>>>>> >> >>>
>>>>>> >> >>>
>>>>>> >> >>>
>>>>>> >> >>> --
>>>>>> >> >>> Harsh J
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> --
>>>>>> >> Harsh J
>>>>>> >
>>>>>> >
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Harsh J
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

>>>>
>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>         yarn.resourcemanager.scheduler.client.thread-count = "64";
>>>>         yarn.resourcemanager.resource-tracker.client.thread-count =
>>>> "64";
>>>>         yarn.resourcemanager.scheduler.class =
>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>         yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>         yarn.nodemanager.container-executor.class =
>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>         yarn.nodemanager.container-manager.thread-count = "64";
>>>>         yarn.nodemanager.localizer.client.thread-count = "20";
>>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>>>
>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost the same
>>>> other configurations. In MR1, the 1TB terasort took about 45 minutes, but
>>>> it took around 90 minutes in MR2.
>>>>
>>>> Does anyone know what is wrong here? Or do I need some special
>>>> configurations for terasort to work better in MR2?
>>>>
>>>> Thanks in advance,
>>>>
>>>> John
>>>>
>>>>
>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>> michael_segel@hotmail.com> wrote:
>>>>
>>>>> Sam,
>>>>> I think your cluster is too small for any meaningful conclusions to be
>>>>> made.
>>>>>
>>>>>
>>>>> Sent from a remote device. Please excuse any typos...
>>>>>
>>>>> Mike Segel
>>>>>
>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com> wrote:
>>>>>
>>>>> Hi Harsh,
>>>>>
>>>>> Thanks for your detailed response! Now, the efficiency of my Yarn
>>>>> cluster improved a lot after increasing the reducer
>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>
>>>>> 1.In Hadoop 1.x, a job will be executed by map task and reduce task
>>>>> together, with a typical process(map > shuffle > reduce). In Yarn, as I
>>>>> know, a MRv1 job will be executed only by ApplicationMaster.
>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1 job
>>>>> has special execution process(map > shuffle > reduce) in Hadoop 1.x, and
>>>>> how does Yarn execute an MRv1 job? Does it still include the special MR steps of Hadoop
>>>>> 1.x, like map, sort, merge, combine and shuffle?
>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>> - What's the general process for ApplicationMaster of Yarn to execute
>>>>> a job?
>>>>>
>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>> - For Yarn, the above two parameters do not work any more, as YARN uses
>>>>> containers instead, right?
>>>>> - For Yarn, we can set the whole physical mem for a NodeManager using '
>>>>> yarn.nodemanager.resource.memory-mb'. But how to set the default size
>>>>> of physical mem of a container?
>>>>> - How to set the maximum size of physical mem of a container? By the
>>>>> parameter of 'mapred.child.java.opts'?
>>>>>
>>>>> Thanks as always!
>>>>>
>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>
>>>>>> Hi Sam,
>>>>>>
>>>>>> > - How to know the container number? Why you say it will be 22
>>>>>> containers due to a 22 GB memory?
>>>>>>
>>>>>> The MR2's default configuration requests 1 GB resource each for Map
>>>>>> and Reduce containers. It requests 1.5 GB for the AM container that
>>>>>> runs the job, additionally. This is tunable using the properties
>>>>>> Sandy's mentioned in his post.
>>>>>>
>>>>>> > - My machine has 32 GB memory, how much memory is proper to be
>>>>>> assigned to containers?
>>>>>>
>>>>>> This is a general question. You may use the same process you took to
>>>>>> decide optimal number of slots in MR1 to decide this here. Every
>>>>>> container is a new JVM, and you're limited by the CPUs you have there
>>>>>> (if not the memory). Either increase memory requests from jobs, to
>>>>>> lower # of concurrent containers at a given time (runtime change), or
>>>>>> lower NM's published memory resources to control the same (config
>>>>>> change).
>>>>>>
>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>>>>> 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>
>>>>>> Yes, all of these properties will still work. Old properties specific
>>>>>> to JobTracker or TaskTracker (usually found as a keyword in the config
>>>>>> name) will not apply anymore.
>>>>>>
>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <sa...@gmail.com>
>>>>>> wrote:
>>>>>> > Hi Harsh,
>>>>>> >
>>>>>> > According to the above suggestions, I removed the duplicated
>>>>>> setting, and
>>>>>> > reduced the values of 'yarn.nodemanager.resource.cpu-cores',
>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000. And then,
>>>>>> the
>>>>>> > efficiency improved about 18%.  I have questions:
>>>>>> >
>>>>>> > - How to know the container number? Why you say it will be 22
>>>>>> containers due
>>>>>> > to a 22 GB memory?
>>>>>> > - My machine has 32 GB memory, how much memory is proper to be
>>>>>> assigned to
>>>>>> > containers?
>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>>>>> 'yarn', will
>>>>>> > other parameters for mapred-site.xml still work in yarn framework?
>>>>>> Like
>>>>>> > 'mapreduce.task.io.sort.mb' and 'mapreduce.map.sort.spill.percent'
>>>>>> >
>>>>>> > Thanks!
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>> >>
>>>>>> >> Hey Sam,
>>>>>> >>
>>>>>> >> Did you get a chance to retry with Sandy's suggestions? The config
>>>>>> >> appears to be asking NMs to use roughly 22 total containers (as
>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB memory
>>>>>> >> resource. This could impact much, given the CPU is still the same
>>>>>> for
>>>>>> >> both test runs.
>>>>>> >>
>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>> sandy.ryza@cloudera.com>
>>>>>> >> wrote:
>>>>>> >> > Hey Sam,
>>>>>> >> >
>>>>>> >> > Thanks for sharing your results.  I'm definitely curious about
>>>>>> what's
>>>>>> >> > causing the difference.
>>>>>> >> >
>>>>>> >> > A couple observations:
>>>>>> >> > It looks like you've got yarn.nodemanager.resource.memory-mb in
>>>>>> there
>>>>>> >> > twice
>>>>>> >> > with two different values.
>>>>>> >> >
>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close to the
>>>>>> default
>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of your tasks
>>>>>> getting
>>>>>> >> > killed for running over resource limits?
>>>>>> >> >
>>>>>> >> > -Sandy
>>>>>> >> >
>>>>>> >> >
>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <sa...@gmail.com>
>>>>>> wrote:
>>>>>> >> >>
>>>>>> >> >> The terasort execution log shows that reduce spent about 5.5
>>>>>> mins from
>>>>>> >> >> 33%
>>>>>> >> >> to 35% as below.
>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce 31%
>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce 32%
>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce 33%
>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce 35%
>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce 40%
>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce 43%
>>>>>> >> >>
>>>>>> >> >> Any way, below are my configurations for your reference. Thanks!
>>>>>> >> >> (A) core-site.xml
>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>> >> >>
>>>>>> >> >> (B) hdfs-site.xml
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>dfs.replication</name>
>>>>>> >> >>     <value>1</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>> >> >>
>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>> >> >>
>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>> >> >>     <value>64</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>> >> >>     <value>10</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >> (C) mapred-site.xml
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>> >> >>
>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>> >> >>     <description>No description</description>
>>>>>> >> >>     <final>true</final>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>> >> >>
>>>>>> >> >>
>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>> >> >>     <description>No description</description>
>>>>>> >> >>     <final>true</final>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >> <property>
>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>> >> >> </property>
>>>>>> >> >>
>>>>>> >> >> <property>
>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>> >> >>     <value>yarn</value>
>>>>>> >> >>    </property>
>>>>>> >> >>
>>>>>> >> >>  <property>
>>>>>> >> >>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>> >> >>     <value>8</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>> >> >>     <value>4</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>> >> >>     <value>true</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >> (D) yarn-site.xml
>>>>>> >> >>  <property>
>>>>>> >> >>     <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>> >> >>     <value>node1:18025</value>
>>>>>> >> >>     <description>host is the hostname of the resource manager
>>>>>> and
>>>>>> >> >>     port is the port on which the NodeManagers contact the
>>>>>> Resource
>>>>>> >> >> Manager.
>>>>>> >> >>     </description>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <description>The address of the RM web
>>>>>> application.</description>
>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>> >> >>     <value>node1:18088</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>>>>>> >> >>     <value>node1:18030</value>
>>>>>> >> >>     <description>host is the hostname of the resourcemanager
>>>>>> and port
>>>>>> >> >> is
>>>>>> >> >> the port
>>>>>> >> >>     on which the Applications in the cluster talk to the
>>>>>> Resource
>>>>>> >> >> Manager.
>>>>>> >> >>     </description>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>> >> >>     <value>node1:18040</value>
>>>>>> >> >>     <description>the host is the hostname of the
>>>>>> ResourceManager and
>>>>>> >> >> the
>>>>>> >> >> port is the port on
>>>>>> >> >>     which the clients can talk to the Resource Manager.
>>>>>> </description>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>> >> >>
>>>>>> >> >>
>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>> >> >>     <description>the local directories used by the
>>>>>> >> >> nodemanager</description>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>> port</description>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>> >> >>     <value>10240</value>
>>>>>> >> >>     <description>the amount of memory on the NodeManager in
>>>>>> >> >> MB</description>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>> >> >>
>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>> >> >>     <description>directory on hdfs where the application logs
>>>>>> are moved
>>>>>> >> >> to
>>>>>> >> >> </description>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>    <property>
>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>> >> >>
>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>> >> >>     <description>the directories used by Nodemanagers as log
>>>>>> >> >> directories</description>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>> >> >>     <description>shuffle service that needs to be set for Map
>>>>>> Reduce to
>>>>>> >> >> run </description>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>yarn.resourcemanager.client.thread-count</name>
>>>>>> >> >>     <value>64</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>  <property>
>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>> >> >>     <value>24</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >> <property>
>>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>> >> >>     <value>3</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>  <property>
>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>> >> >>     <value>22000</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>  <property>
>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>> >> >>     <value>2.1</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>> >> >>>
>>>>>> >> >>> Not tuning configurations at all is wrong. YARN uses memory
>>>>>> resource
>>>>>> >> >>> based scheduling and hence MR2 would be requesting 1 GB
>>>>>> minimum by
>>>>>> >> >>> default, causing, on base configs, to max out at 8 (due to 8
>>>>>> GB NM
>>>>>> >> >>> memory resource config) total containers. Do share your
>>>>>> configs as at
>>>>>> >> >>> this point none of us can tell what it is.
>>>>>> >> >>>
>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for users and
>>>>>> to not
>>>>>> >> >>> care about such things :)
>>>>>> >> >>>
>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>> samliuhadoop@gmail.com>
>>>>>> >> >>> wrote:
>>>>>> >> >>> > At the beginning, I just wanted to do a fast comparison of
>>>>>> MRv1 and
>>>>>> >> >>> > Yarn.
>>>>>> >> >>> > But
>>>>>> >> >>> > they have many differences, and to be fair for comparison I
>>>>>> did not
>>>>>> >> >>> > tune
>>>>>> >> >>> > their configurations at all.  So I got the above test results.
>>>>>> After
>>>>>> >> >>> > analyzing
>>>>>> >> >>> > the test result, no doubt, I will configure them and do
>>>>>> comparison
>>>>>> >> >>> > again.
>>>>>> >> >>> >
>>>>>> >> >>> > Do you have any idea on current test result? I think, to
>>>>>> compare
>>>>>> >> >>> > with
>>>>>> >> >>> > MRv1,
>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but worse on
>>>>>> Reduce
>>>>>> >> >>> > phase(terasort test).
>>>>>> >> >>> > And any detailed suggestions/comments/materials on Yarn
>>>>>> performance
>>>>>> >> >>> > tuning?
>>>>>> >> >>> >
>>>>>> >> >>> > Thanks!
>>>>>> >> >>> >
>>>>>> >> >>> >
>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>> marcosluis2186@gmail.com>
>>>>>> >> >>> >>
>>>>>> >> >>> >> Why not tune the configurations?
>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>>>>> >> >>> >>
>>>>>> >> >>> >>
>>>>>> >> >>> >>
>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>> >> >>> >>>
>>>>>> >> >>> >>> Hi Experts,
>>>>>> >> >>> >>>
>>>>>> >> >>> >>> We are thinking about whether to use Yarn or not in the
>>>>>> near
>>>>>> >> >>> >>> future,
>>>>>> >> >>> >>> and
>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for comparison.
>>>>>> >> >>> >>>
>>>>>> >> >>> >>> My env is a three-node cluster, and each node has similar
>>>>>> hardware:
>>>>>> >> >>> >>> 2
>>>>>> >> >>> >>> cpu (4 cores), 32 GB mem. Both Yarn and MRv1 clusters are set up on
>>>>>> the
>>>>>> >> >>> >>> same
>>>>>> >> >>> >>> env. To
>>>>>> >> >>> >>> be fair, I did not make any performance tuning on their
>>>>>> >> >>> >>> configurations, but
>>>>>> >> >>> >>> use the default configuration values.
>>>>>> >> >>> >>>
>>>>>> >> >>> >>> Before testing, I think Yarn will be much better than
>>>>>> MRv1, if
>>>>>> >> >>> >>> they
>>>>>> >> >>> >>> all
>>>>>> >> >>> >>> use default configuration, because Yarn is a better
>>>>>> framework than
>>>>>> >> >>> >>> MRv1.
>>>>>> >> >>> >>> However, the test result shows some differences:
>>>>>> >> >>> >>>
>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>> >> >>> >>>
>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>> >> >>> >>>
>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>> >> >>> >>>
>>>>>> >> >>> >>> After a fast analysis, I think the direct cause might be
>>>>>> that Yarn
>>>>>> >> >>> >>> is
>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much worse on
>>>>>> Reduce
>>>>>> >> >>> >>> phase.
>>>>>> >> >>> >>>
>>>>>> >> >>> >>> Here I have two questions:
>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1 for terasort?
>>>>>> >> >>> >>> - What's the strategy for tuning Yarn performance? Are there any
>>>>>> >> >>> >>> materials?
>>>>>> >> >>> >>>
>>>>>> >> >>> >>> Thanks!
>>>>>> >> >>> >>
>>>>>> >> >>> >>
>>>>>> >> >>> >>
>>>>>> >> >>> >>
>>>>>> >> >>> >> --
>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>> >> >>> >>
>>>>>> >> >>> >
>>>>>> >> >>>
>>>>>> >> >>>
>>>>>> >> >>>
>>>>>> >> >>> --
>>>>>> >> >>> Harsh J
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> --
>>>>>> >> Harsh J
>>>>>> >
>>>>>> >
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Harsh J
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Jian Fang <ji...@gmail.com>.
Unfortunately, turning off JVM reuse still gave the same result, i.e., about
90 minutes for MR2. I don't think the killed reduces alone could account for
a 2x slowdown. There must be something very wrong in either the
configuration or the code. Any hints?
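
For anyone reproducing the comparison: JVM reuse only exists on the MR1
side, so the switch goes into the MR1 cluster's mapred-site.xml. A minimal
sketch, assuming the stock Hadoop 1.x property name, looks like:

<property>
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <!-- 1 runs every task in its own JVM (reuse off); -1 would allow
       unlimited reuse -->
  <value>1</value>
</property>

The mapreduce.job.jvm.numtasks = "20" entry in the MR2 configuration quoted
further down in this thread has no effect, since MR2 has no JVM reuse to
turn on.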

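On the killed reduces Sandy asked about: the diagnostics string for each
killed attempt shows up in the AM web UI / job history, and the aggregated
container logs can be grepped directly. A rough sketch, assuming log
aggregation is enabled and using the application id taken from the attempt
ids in the trace quoted below:

yarn logs -applicationId application_1382237301855_0001 \
  | grep -i -E 'killed|LeaseExpiredException|ClosedChannelException' \
  | sort | uniq -c

Note that killed (as opposed to failed) attempts can simply be the losers
of speculative execution, which is enabled for both maps and reduces in
that configuration.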


On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <ji...@gmail.com>wrote:

> Thanks Sandy. I will try to turn JVM reuse off and see what happens.
>
> Yes, I saw quite some exceptions in the task attempts. For instance.
>
>
> 2013-10-20 03:13:58,751 ERROR [main]
> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
> 2013-10-20 03:13:58,752 ERROR [Thread-6] org.apache.hadoop.hdfs.DFSClient:
> Failed to close file
> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
> No lease on
> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
> File does not exist. Holder
> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
> any open files.
>         at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>         at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>         at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>         at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>         at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>         at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>         at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
> --
>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>         at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>         at
> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>         at
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>         at
> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>         at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>         at
> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>         at
> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>         at
> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>         at
> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
> 2013-10-20 03:13:58,753 WARN [main] org.apache.hadoop.mapred.YarnChild:
> Exception running child : java.nio.channels.ClosedChannelException
>         at
> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>         at
> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>         at
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>         at java.io.DataOutputStream.write(DataOutputStream.java:107)
>         at
> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>         at
> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>         at
> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>         at
> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>         at
> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>         at
> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>
>
>
> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <sa...@cloudera.com>wrote:
>
>> It looks like many of your reduce tasks were killed.  Do you know why?
>>  Also, MR2 doesn't have JVM reuse, so it might make sense to compare it to
>> MR1 with JVM reuse turned off.
>>
>> -Sandy
>>
>>
>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <jian.fang.subscribe@gmail.com
>> > wrote:
>>
>>> The Terasort output for MR2 is as follows.
>>>
>>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job (main):
>>> Counters: 46
>>>         File System Counters
>>>                 FILE: Number of bytes read=456102049355
>>>                 FILE: Number of bytes written=897246250517
>>>                 FILE: Number of read operations=0
>>>                 FILE: Number of large read operations=0
>>>                 FILE: Number of write operations=0
>>>                 HDFS: Number of bytes read=1000000851200
>>>                 HDFS: Number of bytes written=1000000000000
>>>                 HDFS: Number of read operations=32131
>>>                 HDFS: Number of large read operations=0
>>>                 HDFS: Number of write operations=224
>>>         Job Counters
>>>                 Killed map tasks=1
>>>                 Killed reduce tasks=20
>>>                 Launched map tasks=7601
>>>                 Launched reduce tasks=132
>>>                 Data-local map tasks=7591
>>>                 Rack-local map tasks=10
>>>                 Total time spent by all maps in occupied slots
>>> (ms)=1696141311
>>>                 Total time spent by all reduces in occupied slots
>>> (ms)=2664045096
>>>         Map-Reduce Framework
>>>                 Map input records=10000000000
>>>                 Map output records=10000000000
>>>                 Map output bytes=1020000000000
>>>                 Map output materialized bytes=440486356802
>>>                 Input split bytes=851200
>>>                 Combine input records=0
>>>                 Combine output records=0
>>>                 Reduce input groups=10000000000
>>>                 Reduce shuffle bytes=440486356802
>>>                 Reduce input records=10000000000
>>>                 Reduce output records=10000000000
>>>                 Spilled Records=20000000000
>>>                 Shuffled Maps =851200
>>>                 Failed Shuffles=61
>>>                 Merged Map outputs=851200
>>>                 GC time elapsed (ms)=4215666
>>>                 CPU time spent (ms)=192433000
>>>                 Physical memory (bytes) snapshot=3349356380160
>>>                 Virtual memory (bytes) snapshot=9665208745984
>>>                 Total committed heap usage (bytes)=3636854259712
>>>         Shuffle Errors
>>>                 BAD_ID=0
>>>                 CONNECTION=0
>>>                 IO_ERROR=4
>>>                 WRONG_LENGTH=0
>>>                 WRONG_MAP=0
>>>                 WRONG_REDUCE=0
>>>         File Input Format Counters
>>>                 Bytes Read=1000000000000
>>>         File Output Format Counters
>>>                 Bytes Written=1000000000000
>>>
>>> Thanks,
>>>
>>> John
>>>
>>>
>>>
>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>> jian.fang.subscribe@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop 1.0.3 and
>>>> it turned out that the terasort for MR2 is 2 times slower than that in MR1.
>>>> I cannot really believe it.
>>>>
>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0 cluster
>>>> configurations are as follows.
>>>>
>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>         mapreduce.map.memory.mb = "768";
>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>
>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>
>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>         mapreduce.task.io.sort.factor = "48";
>>>>         mapreduce.task.io.sort.mb = "200";
>>>>         mapreduce.map.speculative = "true";
>>>>         mapreduce.reduce.speculative = "true";
>>>>         mapreduce.framework.name = "yarn";
>>>>         yarn.app.mapreduce.am.job.task.listener.thread-count = "60";
>>>>         mapreduce.map.cpu.vcores = "1";
>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>
>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>         mapreduce.map.output.compress = "true";
>>>>         mapreduce.map.output.compress.codec =
>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>
>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>         yarn.resourcemanager.scheduler.client.thread-count = "64";
>>>>         yarn.resourcemanager.resource-tracker.client.thread-count =
>>>> "64";
>>>>         yarn.resourcemanager.scheduler.class =
>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>         yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>         yarn.nodemanager.container-executor.class =
>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>         yarn.nodemanager.container-manager.thread-count = "64";
>>>>         yarn.nodemanager.localizer.client.thread-count = "20";
>>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>>>
>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost the same
>>>> other configurations. In MR1, the 1TB terasort took about 45 minutes, but
>>>> it took around 90 minutes in MR2.
>>>>
>>>> Does anyone know what is wrong here? Or do I need some special
>>>> configurations for terasort to work better in MR2?
>>>>
>>>> Thanks in advance,
>>>>
>>>> John
>>>>
>>>>
>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>> michael_segel@hotmail.com> wrote:
>>>>
>>>>> Sam,
>>>>> I think your cluster is too small for any meaningful conclusions to be
>>>>> made.
>>>>>
>>>>>
>>>>> Sent from a remote device. Please excuse any typos...
>>>>>
>>>>> Mike Segel
>>>>>
>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com> wrote:
>>>>>
>>>>> Hi Harsh,
>>>>>
>>>>> Thanks for your detailed response! Now, the efficiency of my Yarn
>>>>> cluster improved a lot after increasing the reducer
>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>
>>>>> 1.In Hadoop 1.x, a job will be executed by map task and reduce task
>>>>> together, with a typical process(map > shuffle > reduce). In Yarn, as I
>>>>> know, a MRv1 job will be executed only by ApplicationMaster.
>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1 job
>>>>> has special execution process(map > shuffle > reduce) in Hadoop 1.x, and
>>>>> how does Yarn execute an MRv1 job? Does it still include the special MR steps of Hadoop
>>>>> 1.x, like map, sort, merge, combine and shuffle?
>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>> - What's the general process for ApplicationMaster of Yarn to execute
>>>>> a job?
>>>>>
>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>> - For Yarn, the above two parameters do not work any more, as YARN uses
>>>>> containers instead, right?
>>>>> - For Yarn, we can set the whole physical mem for a NodeManager using '
>>>>> yarn.nodemanager.resource.memory-mb'. But how to set the default size
>>>>> of physical mem of a container?
>>>>> - How to set the maximum size of physical mem of a container? By the
>>>>> parameter of 'mapred.child.java.opts'?
>>>>>
>>>>> Thanks as always!
>>>>>
>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>
>>>>>> Hi Sam,
>>>>>>
>>>>>> > - How to know the container number? Why you say it will be 22
>>>>>> containers due to a 22 GB memory?
>>>>>>
>>>>>> The MR2's default configuration requests 1 GB resource each for Map
>>>>>> and Reduce containers. It requests 1.5 GB for the AM container that
>>>>>> runs the job, additionally. This is tunable using the properties
>>>>>> Sandy's mentioned in his post.
>>>>>>
>>>>>> > - My machine has 32 GB memory, how much memory is proper to be
>>>>>> assigned to containers?
>>>>>>
>>>>>> This is a general question. You may use the same process you took to
>>>>>> decide optimal number of slots in MR1 to decide this here. Every
>>>>>> container is a new JVM, and you're limited by the CPUs you have there
>>>>>> (if not the memory). Either increase memory requests from jobs, to
>>>>>> lower # of concurrent containers at a given time (runtime change), or
>>>>>> lower NM's published memory resources to control the same (config
>>>>>> change).
>>>>>>
>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>>>>> 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>
>>>>>> Yes, all of these properties will still work. Old properties specific
>>>>>> to JobTracker or TaskTracker (usually found as a keyword in the config
>>>>>> name) will not apply anymore.
>>>>>>
>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <sa...@gmail.com>
>>>>>> wrote:
>>>>>> > Hi Harsh,
>>>>>> >
>>>>>> > According to the above suggestions, I removed the duplicated
>>>>>> setting, and
>>>>>> > reduced the values of 'yarn.nodemanager.resource.cpu-cores',
>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000. And then,
>>>>>> the
>>>>>> > efficiency improved about 18%.  I have questions:
>>>>>> >
>>>>>> > - How to know the container number? Why you say it will be 22
>>>>>> containers due
>>>>>> > to a 22 GB memory?
>>>>>> > - My machine has 32 GB memory, how much memory is proper to be
>>>>>> assigned to
>>>>>> > containers?
>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>>>>> 'yarn', will
>>>>>> > other parameters for mapred-site.xml still work in yarn framework?
>>>>>> Like
>>>>>> > 'mapreduce.task.io.sort.mb' and 'mapreduce.map.sort.spill.percent'
>>>>>> >
>>>>>> > Thanks!
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>> >>
>>>>>> >> Hey Sam,
>>>>>> >>
>>>>>> >> Did you get a chance to retry with Sandy's suggestions? The config
>>>>>> >> appears to be asking NMs to use roughly 22 total containers (as
>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB memory
>>>>>> >> resource. This could impact much, given the CPU is still the same
>>>>>> for
>>>>>> >> both test runs.
>>>>>> >>
>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>> sandy.ryza@cloudera.com>
>>>>>> >> wrote:
>>>>>> >> > Hey Sam,
>>>>>> >> >
>>>>>> >> > Thanks for sharing your results.  I'm definitely curious about
>>>>>> what's
>>>>>> >> > causing the difference.
>>>>>> >> >
>>>>>> >> > A couple observations:
>>>>>> >> > It looks like you've got yarn.nodemanager.resource.memory-mb in
>>>>>> there
>>>>>> >> > twice
>>>>>> >> > with two different values.
>>>>>> >> >
>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close to the
>>>>>> default
>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of your tasks
>>>>>> getting
>>>>>> >> > killed for running over resource limits?
>>>>>> >> >
>>>>>> >> > -Sandy
>>>>>> >> >
>>>>>> >> >
>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <sa...@gmail.com>
>>>>>> wrote:
>>>>>> >> >>
>>>>>> >> >> The terasort execution log shows that reduce spent about 5.5
>>>>>> mins from
>>>>>> >> >> 33%
>>>>>> >> >> to 35% as below.
>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce 31%
>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce 32%
>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce 33%
>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce 35%
>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce 40%
>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce 43%
>>>>>> >> >>
>>>>>> >> >> Any way, below are my configurations for your reference. Thanks!
>>>>>> >> >> (A) core-site.xml
>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>> >> >>
>>>>>> >> >> (B) hdfs-site.xml
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>dfs.replication</name>
>>>>>> >> >>     <value>1</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>> >> >>
>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>> >> >>
>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>> >> >>     <value>64</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>> >> >>     <value>10</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >> (C) mapred-site.xml
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>> >> >>
>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>> >> >>     <description>No description</description>
>>>>>> >> >>     <final>true</final>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>> >> >>
>>>>>> >> >>
>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>> >> >>     <description>No description</description>
>>>>>> >> >>     <final>true</final>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >> <property>
>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>> >> >> </property>
>>>>>> >> >>
>>>>>> >> >> <property>
>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>> >> >>     <value>yarn</value>
>>>>>> >> >>    </property>
>>>>>> >> >>
>>>>>> >> >>  <property>
>>>>>> >> >>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>> >> >>     <value>8</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>> >> >>     <value>4</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>> >> >>     <value>true</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >> (D) yarn-site.xml
>>>>>> >> >>  <property>
>>>>>> >> >>     <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>> >> >>     <value>node1:18025</value>
>>>>>> >> >>     <description>host is the hostname of the resource manager
>>>>>> and
>>>>>> >> >>     port is the port on which the NodeManagers contact the
>>>>>> Resource
>>>>>> >> >> Manager.
>>>>>> >> >>     </description>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <description>The address of the RM web
>>>>>> application.</description>
>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>> >> >>     <value>node1:18088</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>>>>>> >> >>     <value>node1:18030</value>
>>>>>> >> >>     <description>host is the hostname of the resourcemanager
>>>>>> and port
>>>>>> >> >> is
>>>>>> >> >> the port
>>>>>> >> >>     on which the Applications in the cluster talk to the
>>>>>> Resource
>>>>>> >> >> Manager.
>>>>>> >> >>     </description>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>> >> >>     <value>node1:18040</value>
>>>>>> >> >>     <description>the host is the hostname of the
>>>>>> ResourceManager and
>>>>>> >> >> the
>>>>>> >> >> port is the port on
>>>>>> >> >>     which the clients can talk to the Resource Manager.
>>>>>> </description>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>> >> >>
>>>>>> >> >>
>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>> >> >>     <description>the local directories used by the
>>>>>> >> >> nodemanager</description>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>> port</description>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>> >> >>     <value>10240</value>
>>>>>> >> >>     <description>the amount of memory on the NodeManager in
>>>>>> >> >> MB</description>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>> >> >>
>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>> >> >>     <description>directory on hdfs where the application logs
>>>>>> are moved
>>>>>> >> >> to
>>>>>> >> >> </description>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>    <property>
>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>> >> >>
>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>> >> >>     <description>the directories used by Nodemanagers as log
>>>>>> >> >> directories</description>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>> >> >>     <description>shuffle service that needs to be set for Map
>>>>>> Reduce to
>>>>>> >> >> run </description>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>yarn.resourcemanager.client.thread-count</name>
>>>>>> >> >>     <value>64</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>  <property>
>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>> >> >>     <value>24</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >> <property>
>>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>> >> >>     <value>3</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>  <property>
>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>> >> >>     <value>22000</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>  <property>
>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>> >> >>     <value>2.1</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>> >> >>>
>>>>>> >> >>> Not tuning configurations at all is wrong. YARN uses memory
>>>>>> resource
>>>>>> >> >>> based scheduling and hence MR2 would be requesting 1 GB
>>>>>> minimum by
>>>>>> >> >>> default, causing, on base configs, to max out at 8 (due to 8
>>>>>> GB NM
>>>>>> >> >>> memory resource config) total containers. Do share your
>>>>>> configs as at
>>>>>> >> >>> this point none of us can tell what it is.
>>>>>> >> >>>
>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for users and
>>>>>> to not
>>>>>> >> >>> care about such things :)
>>>>>> >> >>>
>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>> samliuhadoop@gmail.com>
>>>>>> >> >>> wrote:
>>>>>> >> >>> > At the beginning, I just wanted to do a fast comparison of
>>>>>> MRv1 and
>>>>>> >> >>> > Yarn.
>>>>>> >> >>> > But
>>>>>> >> >>> > they have many differences, and to be fair for comparison I
>>>>>> did not
>>>>>> >> >>> > tune
>>>>>> >> >>> > their configurations at all.  So I got the above test results.
>>>>>> After
>>>>>> >> >>> > analyzing
>>>>>> >> >>> > the test result, no doubt, I will configure them and do
>>>>>> comparison
>>>>>> >> >>> > again.
>>>>>> >> >>> >
>>>>>> >> >>> > Do you have any idea on current test result? I think, to
>>>>>> compare
>>>>>> >> >>> > with
>>>>>> >> >>> > MRv1,
>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but worse on
>>>>>> Reduce
>>>>>> >> >>> > phase(terasort test).
>>>>>> >> >>> > And any detailed suggestions/comments/materials on Yarn
>>>>>> performance
>>>>>> >> >>> > tuning?
>>>>>> >> >>> >
>>>>>> >> >>> > Thanks!
>>>>>> >> >>> >
>>>>>> >> >>> >
>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>> marcosluis2186@gmail.com>
>>>>>> >> >>> >>
>>>>>> >> >>> >> Why not tune the configurations?
>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>>>>> >> >>> >>
>>>>>> >> >>> >>
>>>>>> >> >>> >>
>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>> >> >>> >>>
>>>>>> >> >>> >>> Hi Experts,
>>>>>> >> >>> >>>
>>>>>> >> >>> >>> We are thinking about whether to use Yarn or not in the
>>>>>> near
>>>>>> >> >>> >>> future,
>>>>>> >> >>> >>> and
>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for comparison.
>>>>>> >> >>> >>>
>>>>>> >> >>> >>> My env is a three-node cluster, and each node has similar
>>>>>> hardware:
>>>>>> >> >>> >>> 2
>>>>>> >> >>> >>> cpu (4 cores), 32 GB mem. Both Yarn and MRv1 clusters are set up on
>>>>>> the
>>>>>> >> >>> >>> same
>>>>>> >> >>> >>> env. To
>>>>>> >> >>> >>> be fair, I did not make any performance tuning on their
>>>>>> >> >>> >>> configurations, but
>>>>>> >> >>> >>> use the default configuration values.
>>>>>> >> >>> >>>
>>>>>> >> >>> >>> Before testing, I think Yarn will be much better than
>>>>>> MRv1, if
>>>>>> >> >>> >>> they
>>>>>> >> >>> >>> all
>>>>>> >> >>> >>> use default configuration, because Yarn is a better
>>>>>> framework than
>>>>>> >> >>> >>> MRv1.
>>>>>> >> >>> >>> However, the test result shows some differences:
>>>>>> >> >>> >>>
>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>> >> >>> >>>
>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>> >> >>> >>>
>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>> >> >>> >>>
>>>>>> >> >>> >>> After a fast analysis, I think the direct cause might be
>>>>>> that Yarn
>>>>>> >> >>> >>> is
>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much worse on
>>>>>> Reduce
>>>>>> >> >>> >>> phase.
>>>>>> >> >>> >>>
>>>>>> >> >>> >>> Here I have two questions:
>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1 for terasort?
>>>>>> >> >>> >>> - What's the strategy for tuning Yarn performance? Are there any
>>>>>> >> >>> >>> materials?
>>>>>> >> >>> >>>
>>>>>> >> >>> >>> Thanks!
>>>>>> >> >>> >>
>>>>>> >> >>> >>
>>>>>> >> >>> >>
>>>>>> >> >>> >>
>>>>>> >> >>> >> --
>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>> >> >>> >>
>>>>>> >> >>> >
>>>>>> >> >>>
>>>>>> >> >>>
>>>>>> >> >>>
>>>>>> >> >>> --
>>>>>> >> >>> Harsh J
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> --
>>>>>> >> Harsh J
>>>>>> >
>>>>>> >
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Harsh J
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

>>>>         yarn.resourcemanager.scheduler.client.thread-count = "64";
>>>>         yarn.resourcemanager.resource-tracker.client.thread-count =
>>>> "64";
>>>>         yarn.resourcemanager.scheduler.class =
>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>         yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>         yarn.nodemanager.container-executor.class =
>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>         yarn.nodemanager.container-manager.thread-count = "64";
>>>>         yarn.nodemanager.localizer.client.thread-count = "20";
>>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>>>
>>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost the same
>>>> other configurations. In MR1, the 1TB terasort took about 45 minutes, but
>>>> it took around 90 minutes in MR2.
>>>>
>>>> Does anyone know what is wrong here? Or do I need some special
>>>> configurations for terasort to work better in MR2?
>>>>
>>>> Thanks in advance,
>>>>
>>>> John
>>>>
>>>>
>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>> michael_segel@hotmail.com> wrote:
>>>>
>>>>> Sam,
>>>>> I think your cluster is too small for any meaningful conclusions to be
>>>>> made.
>>>>>
>>>>>
>>>>> Sent from a remote device. Please excuse any typos...
>>>>>
>>>>> Mike Segel
>>>>>
>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com> wrote:
>>>>>
>>>>> Hi Harsh,
>>>>>
>>>>> Thanks for your detailed response! Now, the efficiency of my Yarn
>>>>> cluster improved a lot after increasing the reducer
>>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>> questions about the way of Yarn to execute MRv1 job:
>>>>>
>>>>> 1.In Hadoop 1.x, a job will be executed by map task and reduce task
>>>>> together, with a typical process(map > shuffle > reduce). In Yarn, as I
>>>>> know, a MRv1 job will be executed only by ApplicationMaster.
>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1 job
>>>>> has special execution process(map > shuffle > reduce) in Hadoop 1.x, and
>>>>> how Yarn execute a MRv1 job? still include some special MR steps in Hadoop
>>>>> 1.x, like map, sort, merge, combine and shuffle?
>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>> - What's the general process for ApplicationMaster of Yarn to execute
>>>>> a job?
>>>>>
>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>> - For Yarn, above tow parameter do not work any more, as yarn uses
>>>>> container instead, right?
>>>>> - For Yarn, we can set the whole physical mem for a NodeManager using '
>>>>> yarn.nodemanager.resource.memory-mb'. But how to set the default size
>>>>> of physical mem of a container?
>>>>> - How to set the maximum size of physical mem of a container? By the
>>>>> parameter of 'mapred.child.java.opts'?
>>>>>
>>>>> Thanks as always!
>>>>>
>>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>>
>>>>>> Hi Sam,
>>>>>>
>>>>>> > - How to know the container number? Why you say it will be 22
>>>>>> containers due to a 22 GB memory?
>>>>>>
>>>>>> The MR2's default configuration requests 1 GB resource each for Map
>>>>>> and Reduce containers. It requests 1.5 GB for the AM container that
>>>>>> runs the job, additionally. This is tunable using the properties
>>>>>> Sandy's mentioned in his post.
>>>>>>
>>>>>> > - My machine has 32 GB memory, how many memory is proper to be
>>>>>> assigned to containers?
>>>>>>
>>>>>> This is a general question. You may use the same process you took to
>>>>>> decide optimal number of slots in MR1 to decide this here. Every
>>>>>> container is a new JVM, and you're limited by the CPUs you have there
>>>>>> (if not the memory). Either increase memory requests from jobs, to
>>>>>> lower # of concurrent containers at a given time (runtime change), or
>>>>>> lower NM's published memory resources to control the same (config
>>>>>> change).
>>>>>>
>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>>>>> 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>
>>>>>> Yes, all of these properties will still work. Old properties specific
>>>>>> to JobTracker or TaskTracker (usually found as a keyword in the config
>>>>>> name) will not apply anymore.
>>>>>>
>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <sa...@gmail.com>
>>>>>> wrote:
>>>>>> > Hi Harsh,
>>>>>> >
>>>>>> > According to above suggestions, I removed the duplication of
>>>>>> setting, and
>>>>>> > reduce the value of 'yarn.nodemanager.resource.cpu-cores',
>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000. Ant then,
>>>>>> the
>>>>>> > efficiency improved about 18%.  I have questions:
>>>>>> >
>>>>>> > - How to know the container number? Why you say it will be 22
>>>>>> containers due
>>>>>> > to a 22 GB memory?
>>>>>> > - My machine has 32 GB memory, how many memory is proper to be
>>>>>> assigned to
>>>>>> > containers?
>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>>>>> 'yarn', will
>>>>>> > other parameters for mapred-site.xml still work in yarn framework?
>>>>>> Like
>>>>>> > 'mapreduce.task.io.sort.mb' and 'mapreduce.map.sort.spill.percent'
>>>>>> >
>>>>>> > Thanks!
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>>> >>
>>>>>> >> Hey Sam,
>>>>>> >>
>>>>>> >> Did you get a chance to retry with Sandy's suggestions? The config
>>>>>> >> appears to be asking NMs to use roughly 22 total containers (as
>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB memory
>>>>>> >> resource. This could impact much, given the CPU is still the same
>>>>>> for
>>>>>> >> both test runs.
>>>>>> >>
>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>> sandy.ryza@cloudera.com>
>>>>>> >> wrote:
>>>>>> >> > Hey Sam,
>>>>>> >> >
>>>>>> >> > Thanks for sharing your results.  I'm definitely curious about
>>>>>> what's
>>>>>> >> > causing the difference.
>>>>>> >> >
>>>>>> >> > A couple observations:
>>>>>> >> > It looks like you've got yarn.nodemanager.resource.memory-mb in
>>>>>> there
>>>>>> >> > twice
>>>>>> >> > with two different values.
>>>>>> >> >
>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close to the
>>>>>> default
>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of your tasks
>>>>>> getting
>>>>>> >> > killed for running over resource limits?
>>>>>> >> >
>>>>>> >> > -Sandy
>>>>>> >> >
>>>>>> >> >
>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <sa...@gmail.com>
>>>>>> wrote:
>>>>>> >> >>
>>>>>> >> >> The terasort execution log shows that reduce spent about 5.5
>>>>>> mins from
>>>>>> >> >> 33%
>>>>>> >> >> to 35% as below.
>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce 31%
>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce 32%
>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce 33%
>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce 35%
>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce 40%
>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce 43%
>>>>>> >> >>
>>>>>> >> >> Any way, below are my configurations for your reference. Thanks!
>>>>>> >> >> (A) core-site.xml
>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>> >> >>
>>>>>> >> >> (B) hdfs-site.xml
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>dfs.replication</name>
>>>>>> >> >>     <value>1</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>> >> >>
>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>> >> >>
>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>> >> >>     <value>64</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>> >> >>     <value>10</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >> (C) mapred-site.xml
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>> >> >>
>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>> >> >>     <description>No description</description>
>>>>>> >> >>     <final>true</final>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>> >> >>
>>>>>> >> >>
>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>> >> >>     <description>No description</description>
>>>>>> >> >>     <final>true</final>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >> <property>
>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>> >> >> </property>
>>>>>> >> >>
>>>>>> >> >> <property>
>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>> >> >>     <value>yarn</value>
>>>>>> >> >>    </property>
>>>>>> >> >>
>>>>>> >> >>  <property>
>>>>>> >> >>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>> >> >>     <value>8</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>> >> >>     <value>4</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>> >> >>     <value>true</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >> (D) yarn-site.xml
>>>>>> >> >>  <property>
>>>>>> >> >>     <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>> >> >>     <value>node1:18025</value>
>>>>>> >> >>     <description>host is the hostname of the resource manager
>>>>>> and
>>>>>> >> >>     port is the port on which the NodeManagers contact the
>>>>>> Resource
>>>>>> >> >> Manager.
>>>>>> >> >>     </description>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <description>The address of the RM web
>>>>>> application.</description>
>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>> >> >>     <value>node1:18088</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>>>>>> >> >>     <value>node1:18030</value>
>>>>>> >> >>     <description>host is the hostname of the resourcemanager
>>>>>> and port
>>>>>> >> >> is
>>>>>> >> >> the port
>>>>>> >> >>     on which the Applications in the cluster talk to the
>>>>>> Resource
>>>>>> >> >> Manager.
>>>>>> >> >>     </description>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>> >> >>     <value>node1:18040</value>
>>>>>> >> >>     <description>the host is the hostname of the
>>>>>> ResourceManager and
>>>>>> >> >> the
>>>>>> >> >> port is the port on
>>>>>> >> >>     which the clients can talk to the Resource Manager.
>>>>>> </description>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>> >> >>
>>>>>> >> >>
>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>> >> >>     <description>the local directories used by the
>>>>>> >> >> nodemanager</description>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>> port</description>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>> >> >>     <value>10240</value>
>>>>>> >> >>     <description>the amount of memory on the NodeManager in
>>>>>> >> >> GB</description>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>> >> >>
>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>> >> >>     <description>directory on hdfs where the application logs
>>>>>> are moved
>>>>>> >> >> to
>>>>>> >> >> </description>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>    <property>
>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>> >> >>
>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>> >> >>     <description>the directories used by Nodemanagers as log
>>>>>> >> >> directories</description>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>> >> >>     <description>shuffle service that needs to be set for Map
>>>>>> Reduce to
>>>>>> >> >> run </description>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>yarn.resourcemanager.client.thread-count</name>
>>>>>> >> >>     <value>64</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>  <property>
>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>> >> >>     <value>24</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >> <property>
>>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>> >> >>     <value>3</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>  <property>
>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>> >> >>     <value>22000</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>  <property>
>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>> >> >>     <value>2.1</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>>> >> >>>
>>>>>> >> >>> Not tuning configurations at all is wrong. YARN uses memory
>>>>>> resource
>>>>>> >> >>> based scheduling and hence MR2 would be requesting 1 GB
>>>>>> minimum by
>>>>>> >> >>> default, causing, on base configs, to max out at 8 (due to 8
>>>>>> GB NM
>>>>>> >> >>> memory resource config) total containers. Do share your
>>>>>> configs as at
>>>>>> >> >>> this point none of us can tell what it is.
>>>>>> >> >>>
>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for users and
>>>>>> to not
>>>>>> >> >>> care about such things :)
>>>>>> >> >>>
>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>> samliuhadoop@gmail.com>
>>>>>> >> >>> wrote:
>>>>>> >> >>> > At the begining, I just want to do a fast comparision of
>>>>>> MRv1 and
>>>>>> >> >>> > Yarn.
>>>>>> >> >>> > But
>>>>>> >> >>> > they have many differences, and to be fair for comparison I
>>>>>> did not
>>>>>> >> >>> > tune
>>>>>> >> >>> > their configurations at all.  So I got above test results.
>>>>>> After
>>>>>> >> >>> > analyzing
>>>>>> >> >>> > the test result, no doubt, I will configure them and do
>>>>>> comparison
>>>>>> >> >>> > again.
>>>>>> >> >>> >
>>>>>> >> >>> > Do you have any idea on current test result? I think, to
>>>>>> compare
>>>>>> >> >>> > with
>>>>>> >> >>> > MRv1,
>>>>>> >> >>> > Yarn is better on Map phase(teragen test), but worse on
>>>>>> Reduce
>>>>>> >> >>> > phase(terasort test).
>>>>>> >> >>> > And any detailed suggestions/comments/materials on Yarn
>>>>>> performance
>>>>>> >> >>> > tunning?
>>>>>> >> >>> >
>>>>>> >> >>> > Thanks!
>>>>>> >> >>> >
>>>>>> >> >>> >
>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>> marcosluis2186@gmail.com>
>>>>>> >> >>> >>
>>>>>> >> >>> >> Why not to tune the configurations?
>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>>>>> >> >>> >>
>>>>>> >> >>> >>
>>>>>> >> >>> >>
>>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>>> >> >>> >>>
>>>>>> >> >>> >>> Hi Experts,
>>>>>> >> >>> >>>
>>>>>> >> >>> >>> We are thinking about whether to use Yarn or not in the
>>>>>> near
>>>>>> >> >>> >>> future,
>>>>>> >> >>> >>> and
>>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for comprison.
>>>>>> >> >>> >>>
>>>>>> >> >>> >>> My env is three nodes cluster, and each node has similar
>>>>>> hardware:
>>>>>> >> >>> >>> 2
>>>>>> >> >>> >>> cpu(4 core), 32 mem. Both Yarn and MRv1 cluster are set on
>>>>>> the
>>>>>> >> >>> >>> same
>>>>>> >> >>> >>> env. To
>>>>>> >> >>> >>> be fair, I did not make any performance tuning on their
>>>>>> >> >>> >>> configurations, but
>>>>>> >> >>> >>> use the default configuration values.
>>>>>> >> >>> >>>
>>>>>> >> >>> >>> Before testing, I think Yarn will be much better than
>>>>>> MRv1, if
>>>>>> >> >>> >>> they
>>>>>> >> >>> >>> all
>>>>>> >> >>> >>> use default configuration, because Yarn is a better
>>>>>> framework than
>>>>>> >> >>> >>> MRv1.
>>>>>> >> >>> >>> However, the test result shows some differences:
>>>>>> >> >>> >>>
>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>> >> >>> >>>
>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>> >> >>> >>>
>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>> >> >>> >>>
>>>>>> >> >>> >>> After a fast analysis, I think the direct cause might be
>>>>>> that Yarn
>>>>>> >> >>> >>> is
>>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much worse on
>>>>>> Reduce
>>>>>> >> >>> >>> phase.
>>>>>> >> >>> >>>
>>>>>> >> >>> >>> Here I have two questions:
>>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1 for terasort?
>>>>>> >> >>> >>> - What's the stratage for tuning Yarn performance? Is any
>>>>>> >> >>> >>> materials?
>>>>>> >> >>> >>>
>>>>>> >> >>> >>> Thanks!
>>>>>> >> >>> >>
>>>>>> >> >>> >>
>>>>>> >> >>> >>
>>>>>> >> >>> >>
>>>>>> >> >>> >> --
>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>> >> >>> >>
>>>>>> >> >>> >
>>>>>> >> >>>
>>>>>> >> >>>
>>>>>> >> >>>
>>>>>> >> >>> --
>>>>>> >> >>> Harsh J
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> --
>>>>>> >> Harsh J
>>>>>> >
>>>>>> >
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Harsh J
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Jian Fang <ji...@gmail.com>.
Thanks Sandy. I will try to turn JVM reuse off and see what happens.
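
For reference, a minimal mapred-site.xml sketch for the MR1 run with reuse disabled, assuming the classic mapred.job.reuse.jvm.num.tasks property (1 means no reuse, -1 means unlimited reuse):

  <property>
    <!-- 1 = run each task in its own JVM, matching MR2's one-container-per-task behavior -->
    <name>mapred.job.reuse.jvm.num.tasks</name>
    <value>1</value>
  </property>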

Yes, I saw quite a few exceptions in the task attempts. For instance:


2013-10-20 03:13:58,751 ERROR [main]
org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
2013-10-20 03:13:58,752 ERROR [Thread-6] org.apache.hadoop.hdfs.DFSClient:
Failed to close file
/1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
No lease on
/1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
File does not exist. Holder
DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
any open files.
        at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
        at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
        at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
        at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
        at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
        at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
        at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
--
        at com.sun.proxy.$Proxy10.complete(Unknown Source)
        at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
        at
org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
        at
org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
        at
org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
        at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
        at
org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
        at
org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
        at
org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
        at
org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
2013-10-20 03:13:58,753 WARN [main] org.apache.hadoop.mapred.YarnChild:
Exception running child : java.nio.channels.ClosedChannelException
        at
org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
        at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
        at
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
        at java.io.DataOutputStream.write(DataOutputStream.java:107)
        at
org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
        at
org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
        at
org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
        at
org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
        at
org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
        at
org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
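
If the killed map/reduce attempts in the counters come from speculative execution, that would also explain this pattern: a killed duplicate attempt typically leaves exactly this kind of LeaseExpiredException / ClosedChannelException behind on its _temporary output path. One way to check would be a run with speculation turned off; a sketch using the MR2 property names from the config quoted below:

  <property>
    <name>mapreduce.map.speculative</name>
    <value>false</value>
  </property>
  <property>
    <name>mapreduce.reduce.speculative</name>
    <value>false</value>
  </property>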



On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <sa...@cloudera.com> wrote:

> It looks like many of your reduce tasks were killed.  Do you know why?
>  Also, MR2 doesn't have JVM reuse, so it might make sense to compare it to
> MR1 with JVM reuse turned off.
>
> -Sandy
>
>
> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <ji...@gmail.com>wrote:
>
>> The Terasort output for MR2 is as follows.
>>
>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job (main):
>> Counters: 46
>>         File System Counters
>>                 FILE: Number of bytes read=456102049355
>>                 FILE: Number of bytes written=897246250517
>>                 FILE: Number of read operations=0
>>                 FILE: Number of large read operations=0
>>                 FILE: Number of write operations=0
>>                 HDFS: Number of bytes read=1000000851200
>>                 HDFS: Number of bytes written=1000000000000
>>                 HDFS: Number of read operations=32131
>>                 HDFS: Number of large read operations=0
>>                 HDFS: Number of write operations=224
>>         Job Counters
>>                 Killed map tasks=1
>>                 Killed reduce tasks=20
>>                 Launched map tasks=7601
>>                 Launched reduce tasks=132
>>                 Data-local map tasks=7591
>>                 Rack-local map tasks=10
>>                 Total time spent by all maps in occupied slots
>> (ms)=1696141311
>>                 Total time spent by all reduces in occupied slots
>> (ms)=2664045096
>>         Map-Reduce Framework
>>                 Map input records=10000000000
>>                 Map output records=10000000000
>>                 Map output bytes=1020000000000
>>                 Map output materialized bytes=440486356802
>>                 Input split bytes=851200
>>                 Combine input records=0
>>                 Combine output records=0
>>                 Reduce input groups=10000000000
>>                 Reduce shuffle bytes=440486356802
>>                 Reduce input records=10000000000
>>                 Reduce output records=10000000000
>>                 Spilled Records=20000000000
>>                 Shuffled Maps =851200
>>                 Failed Shuffles=61
>>                 Merged Map outputs=851200
>>                 GC time elapsed (ms)=4215666
>>                 CPU time spent (ms)=192433000
>>                 Physical memory (bytes) snapshot=3349356380160
>>                 Virtual memory (bytes) snapshot=9665208745984
>>                 Total committed heap usage (bytes)=3636854259712
>>         Shuffle Errors
>>                 BAD_ID=0
>>                 CONNECTION=0
>>                 IO_ERROR=4
>>                 WRONG_LENGTH=0
>>                 WRONG_MAP=0
>>                 WRONG_REDUCE=0
>>         File Input Format Counters
>>                 Bytes Read=1000000000000
>>         File Output Format Counters
>>                 Bytes Written=1000000000000
>>
>> Thanks,
>>
>> John
>>
>>
>>
>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <jian.fang.subscribe@gmail.com
>> > wrote:
>>
>>> Hi,
>>>
>>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop 1.0.3 and
>>> it turned out that the terasort for MR2 is 2 times slower than that in MR1.
>>> I cannot really believe it.
>>>
>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0 cluster
>>> configurations are as follows.
>>>
>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>         mapreduce.map.memory.mb = "768";
>>>         mapreduce.reduce.memory.mb = "2048";
>>>
>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>
>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>         mapreduce.task.io.sort.factor = "48";
>>>         mapreduce.task.io.sort.mb = "200";
>>>         mapreduce.map.speculative = "true";
>>>         mapreduce.reduce.speculative = "true";
>>>         mapreduce.framework.name = "yarn";
>>>         yarn.app.mapreduce.am.job.task.listener.thread-count = "60";
>>>         mapreduce.map.cpu.vcores = "1";
>>>         mapreduce.reduce.cpu.vcores = "2";
>>>
>>>         mapreduce.job.jvm.numtasks = "20";
>>>         mapreduce.map.output.compress = "true";
>>>         mapreduce.map.output.compress.codec =
>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>
>>>         yarn.resourcemanager.client.thread-count = "64";
>>>         yarn.resourcemanager.scheduler.client.thread-count = "64";
>>>         yarn.resourcemanager.resource-tracker.client.thread-count = "64";
>>>         yarn.resourcemanager.scheduler.class =
>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>         yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>         yarn.nodemanager.container-executor.class =
>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>         yarn.nodemanager.container-manager.thread-count = "64";
>>>         yarn.nodemanager.localizer.client.thread-count = "20";
>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>>
>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost the same
>>> other configurations. In MR1, the 1TB terasort took about 45 minutes, but
>>> it took around 90 minutes in MR2.
>>>
>>> Does anyone know what is wrong here? Or do I need some special
>>> configurations for terasort to work better in MR2?
>>>
>>> Thanks in advance,
>>>
>>> John
>>>
>>>
>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <michael_segel@hotmail.com
>>> > wrote:
>>>
>>>> Sam,
>>>> I think your cluster is too small for any meaningful conclusions to be
>>>> made.
>>>>
>>>>
>>>> Sent from a remote device. Please excuse any typos...
>>>>
>>>> Mike Segel
>>>>
>>>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com> wrote:
>>>>
>>>> Hi Harsh,
>>>>
>>>> Thanks for your detailed response! Now, the efficiency of my Yarn
>>>> cluster improved a lot after increasing the reducer
>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>> questions about the way of Yarn to execute MRv1 job:
>>>>
>>>> 1.In Hadoop 1.x, a job will be executed by map task and reduce task
>>>> together, with a typical process(map > shuffle > reduce). In Yarn, as I
>>>> know, a MRv1 job will be executed only by ApplicationMaster.
>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1 job
>>>> has special execution process(map > shuffle > reduce) in Hadoop 1.x, and
>>>> how Yarn execute a MRv1 job? still include some special MR steps in Hadoop
>>>> 1.x, like map, sort, merge, combine and shuffle?
>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>> - What's the general process for ApplicationMaster of Yarn to execute a
>>>> job?
>>>>
>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>> - For Yarn, above tow parameter do not work any more, as yarn uses
>>>> container instead, right?
>>>> - For Yarn, we can set the whole physical mem for a NodeManager using '
>>>> yarn.nodemanager.resource.memory-mb'. But how to set the default size
>>>> of physical mem of a container?
>>>> - How to set the maximum size of physical mem of a container? By the
>>>> parameter of 'mapred.child.java.opts'?
>>>>
>>>> Thanks as always!
>>>>
>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>
>>>>> Hi Sam,
>>>>>
>>>>> > - How to know the container number? Why you say it will be 22
>>>>> containers due to a 22 GB memory?
>>>>>
>>>>> The MR2's default configuration requests 1 GB resource each for Map
>>>>> and Reduce containers. It requests 1.5 GB for the AM container that
>>>>> runs the job, additionally. This is tunable using the properties
>>>>> Sandy's mentioned in his post.
>>>>>
>>>>> > - My machine has 32 GB memory, how many memory is proper to be
>>>>> assigned to containers?
>>>>>
>>>>> This is a general question. You may use the same process you took to
>>>>> decide optimal number of slots in MR1 to decide this here. Every
>>>>> container is a new JVM, and you're limited by the CPUs you have there
>>>>> (if not the memory). Either increase memory requests from jobs, to
>>>>> lower # of concurrent containers at a given time (runtime change), or
>>>>> lower NM's published memory resources to control the same (config
>>>>> change).
>>>>>
>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>>>> 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>
>>>>> Yes, all of these properties will still work. Old properties specific
>>>>> to JobTracker or TaskTracker (usually found as a keyword in the config
>>>>> name) will not apply anymore.
>>>>>
>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <sa...@gmail.com>
>>>>> wrote:
>>>>> > Hi Harsh,
>>>>> >
>>>>> > According to above suggestions, I removed the duplication of
>>>>> setting, and
>>>>> > reduce the value of 'yarn.nodemanager.resource.cpu-cores',
>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000. Ant then,
>>>>> the
>>>>> > efficiency improved about 18%.  I have questions:
>>>>> >
>>>>> > - How to know the container number? Why you say it will be 22
>>>>> containers due
>>>>> > to a 22 GB memory?
>>>>> > - My machine has 32 GB memory, how many memory is proper to be
>>>>> assigned to
>>>>> > containers?
>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>>>> 'yarn', will
>>>>> > other parameters for mapred-site.xml still work in yarn framework?
>>>>> Like
>>>>> > 'mapreduce.task.io.sort.mb' and 'mapreduce.map.sort.spill.percent'
>>>>> >
>>>>> > Thanks!
>>>>> >
>>>>> >
>>>>> >
>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>> >>
>>>>> >> Hey Sam,
>>>>> >>
>>>>> >> Did you get a chance to retry with Sandy's suggestions? The config
>>>>> >> appears to be asking NMs to use roughly 22 total containers (as
>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB memory
>>>>> >> resource. This could impact much, given the CPU is still the same
>>>>> for
>>>>> >> both test runs.
>>>>> >>
>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>> sandy.ryza@cloudera.com>
>>>>> >> wrote:
>>>>> >> > Hey Sam,
>>>>> >> >
>>>>> >> > Thanks for sharing your results.  I'm definitely curious about
>>>>> what's
>>>>> >> > causing the difference.
>>>>> >> >
>>>>> >> > A couple observations:
>>>>> >> > It looks like you've got yarn.nodemanager.resource.memory-mb in
>>>>> there
>>>>> >> > twice
>>>>> >> > with two different values.
>>>>> >> >
>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close to the
>>>>> default
>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of your tasks
>>>>> getting
>>>>> >> > killed for running over resource limits?
>>>>> >> >
>>>>> >> > -Sandy
>>>>> >> >
>>>>> >> >
>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <sa...@gmail.com>
>>>>> wrote:
>>>>> >> >>
>>>>> >> >> The terasort execution log shows that reduce spent about 5.5
>>>>> mins from
>>>>> >> >> 33%
>>>>> >> >> to 35% as below.
>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce 31%
>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce 32%
>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce 33%
>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce 35%
>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce 40%
>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce 43%
>>>>> >> >>
>>>>> >> >> Any way, below are my configurations for your reference. Thanks!
>>>>> >> >> (A) core-site.xml
>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>> >> >>
>>>>> >> >> (B) hdfs-site.xml
>>>>> >> >>   <property>
>>>>> >> >>     <name>dfs.replication</name>
>>>>> >> >>     <value>1</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>dfs.name.dir</name>
>>>>> >> >>
>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>dfs.data.dir</name>
>>>>> >> >>
>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>dfs.block.size</name>
>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>> >> >>     <value>64</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>> >> >>     <value>10</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >> (C) mapred-site.xml
>>>>> >> >>   <property>
>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>> >> >>
>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>> >> >>     <description>No description</description>
>>>>> >> >>     <final>true</final>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>> >> >>
>>>>> >> >>
>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>> >> >>     <description>No description</description>
>>>>> >> >>     <final>true</final>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >> <property>
>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>> >> >>   <value>-Xmx1000m</value>
>>>>> >> >> </property>
>>>>> >> >>
>>>>> >> >> <property>
>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>> >> >>     <value>yarn</value>
>>>>> >> >>    </property>
>>>>> >> >>
>>>>> >> >>  <property>
>>>>> >> >>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>> >> >>     <value>8</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>> >> >>     <value>4</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>> >> >>     <value>true</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >> (D) yarn-site.xml
>>>>> >> >>  <property>
>>>>> >> >>     <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>> >> >>     <value>node1:18025</value>
>>>>> >> >>     <description>host is the hostname of the resource manager and
>>>>> >> >>     port is the port on which the NodeManagers contact the
>>>>> Resource
>>>>> >> >> Manager.
>>>>> >> >>     </description>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <description>The address of the RM web
>>>>> application.</description>
>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>> >> >>     <value>node1:18088</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>>>>> >> >>     <value>node1:18030</value>
>>>>> >> >>     <description>host is the hostname of the resourcemanager and
>>>>> port
>>>>> >> >> is
>>>>> >> >> the port
>>>>> >> >>     on which the Applications in the cluster talk to the Resource
>>>>> >> >> Manager.
>>>>> >> >>     </description>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>> >> >>     <value>node1:18040</value>
>>>>> >> >>     <description>the host is the hostname of the ResourceManager
>>>>> and
>>>>> >> >> the
>>>>> >> >> port is the port on
>>>>> >> >>     which the clients can talk to the Resource Manager.
>>>>> </description>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>> >> >>
>>>>> >> >>
>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>> >> >>     <description>the local directories used by the
>>>>> >> >> nodemanager</description>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>> >> >>     <description>the nodemanagers bind to this port</description>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>> >> >>     <value>10240</value>
>>>>> >> >>     <description>the amount of memory on the NodeManager in
>>>>> >> >> GB</description>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>> >> >>
>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>> >> >>     <description>directory on hdfs where the application logs
>>>>> are moved
>>>>> >> >> to
>>>>> >> >> </description>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>    <property>
>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>> >> >>
>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>> >> >>     <description>the directories used by Nodemanagers as log
>>>>> >> >> directories</description>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>> >> >>     <description>shuffle service that needs to be set for Map
>>>>> Reduce to
>>>>> >> >> run </description>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>yarn.resourcemanager.client.thread-count</name>
>>>>> >> >>     <value>64</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>  <property>
>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>> >> >>     <value>24</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >> <property>
>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>> >> >>     <value>3</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>  <property>
>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>> >> >>     <value>22000</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>  <property>
>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>> >> >>     <value>2.1</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>
>>>>> >> >>
>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>> >> >>>
>>>>> >> >>> Not tuning configurations at all is wrong. YARN uses memory
>>>>> resource
>>>>> >> >>> based scheduling and hence MR2 would be requesting 1 GB minimum
>>>>> by
>>>>> >> >>> default, causing, on base configs, to max out at 8 (due to 8 GB
>>>>> NM
>>>>> >> >>> memory resource config) total containers. Do share your configs
>>>>> as at
>>>>> >> >>> this point none of us can tell what it is.
>>>>> >> >>>
>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for users and
>>>>> to not
>>>>> >> >>> care about such things :)
>>>>> >> >>>
>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <samliuhadoop@gmail.com
>>>>> >
>>>>> >> >>> wrote:
>>>>> >> >>> > At the begining, I just want to do a fast comparision of MRv1
>>>>> and
>>>>> >> >>> > Yarn.
>>>>> >> >>> > But
>>>>> >> >>> > they have many differences, and to be fair for comparison I
>>>>> did not
>>>>> >> >>> > tune
>>>>> >> >>> > their configurations at all.  So I got above test results.
>>>>> After
>>>>> >> >>> > analyzing
>>>>> >> >>> > the test result, no doubt, I will configure them and do
>>>>> comparison
>>>>> >> >>> > again.
>>>>> >> >>> >
>>>>> >> >>> > Do you have any idea on current test result? I think, to
>>>>> compare
>>>>> >> >>> > with
>>>>> >> >>> > MRv1,
>>>>> >> >>> > Yarn is better on Map phase(teragen test), but worse on Reduce
>>>>> >> >>> > phase(terasort test).
>>>>> >> >>> > And any detailed suggestions/comments/materials on Yarn
>>>>> performance
>>>>> >> >>> > tunning?
>>>>> >> >>> >
>>>>> >> >>> > Thanks!
>>>>> >> >>> >
>>>>> >> >>> >
>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>> marcosluis2186@gmail.com>
>>>>> >> >>> >>
>>>>> >> >>> >> Why not to tune the configurations?
>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>>>> >> >>> >>
>>>>> >> >>> >>
>>>>> >> >>> >>
>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>> >> >>> >>>
>>>>> >> >>> >>> Hi Experts,
>>>>> >> >>> >>>
>>>>> >> >>> >>> We are thinking about whether to use Yarn or not in the near
>>>>> >> >>> >>> future,
>>>>> >> >>> >>> and
>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for comprison.
>>>>> >> >>> >>>
>>>>> >> >>> >>> My env is three nodes cluster, and each node has similar
>>>>> hardware:
>>>>> >> >>> >>> 2
>>>>> >> >>> >>> cpu(4 core), 32 mem. Both Yarn and MRv1 cluster are set on
>>>>> the
>>>>> >> >>> >>> same
>>>>> >> >>> >>> env. To
>>>>> >> >>> >>> be fair, I did not make any performance tuning on their
>>>>> >> >>> >>> configurations, but
>>>>> >> >>> >>> use the default configuration values.
>>>>> >> >>> >>>
>>>>> >> >>> >>> Before testing, I think Yarn will be much better than MRv1,
>>>>> if
>>>>> >> >>> >>> they
>>>>> >> >>> >>> all
>>>>> >> >>> >>> use default configuration, because Yarn is a better
>>>>> framework than
>>>>> >> >>> >>> MRv1.
>>>>> >> >>> >>> However, the test result shows some differences:
>>>>> >> >>> >>>
>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>> >> >>> >>>
>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>> >> >>> >>> - MRv1: 193 sec
>>>>> >> >>> >>> - Yarn: 69 sec
>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>> >> >>> >>>
>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>> >> >>> >>> - MRv1: 451 sec
>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>> >> >>> >>>
>>>>> >> >>> >>> After a fast analysis, I think the direct cause might be
>>>>> that Yarn
>>>>> >> >>> >>> is
>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much worse on Reduce
>>>>> >> >>> >>> phase.
>>>>> >> >>> >>>
>>>>> >> >>> >>> Here I have two questions:
>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1 for terasort?
>>>>> >> >>> >>> - What's the stratage for tuning Yarn performance? Is any
>>>>> >> >>> >>> materials?
>>>>> >> >>> >>>
>>>>> >> >>> >>> Thanks!
>>>>> >> >>> >>
>>>>> >> >>> >>
>>>>> >> >>> >>
>>>>> >> >>> >>
>>>>> >> >>> >> --
>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>> >> >>> >> Product Manager at PDVSA
>>>>> >> >>> >> http://about.me/marcosortiz
>>>>> >> >>> >>
>>>>> >> >>> >
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>> --
>>>>> >> >>> Harsh J
>>>>> >> >>
>>>>> >> >>
>>>>> >> >
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> --
>>>>> >> Harsh J
>>>>> >
>>>>> >
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Harsh J
>>>>>
>>>>
>>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Jian Fang <ji...@gmail.com>.
Thanks Sandy. I will try to turn JVM resue off and see what happens.

Yes, I saw quite some exceptions in the task attempts. For instance.


2013-10-20 03:13:58,751 ERROR [main]
org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
2013-10-20 03:13:58,752 ERROR [Thread-6] org.apache.hadoop.hdfs.DFSClient:
Failed to close file
/1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
No lease on
/1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
File does not exist. Holder
DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
any open files.
        at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
        at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
        at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
        at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
        at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
        at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
        at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
--
        at com.sun.proxy.$Proxy10.complete(Unknown Source)
        at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
        at
org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
        at
org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
        at
org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
        at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
        at
org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
        at
org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
        at
org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
        at
org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
2013-10-20 03:13:58,753 WARN [main] org.apache.hadoop.mapred.YarnChild:
Exception running child : java.nio.channels.ClosedChannelException
        at
org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
        at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
        at
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
        at java.io.DataOutputStream.write(DataOutputStream.java:107)
        at
org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
        at
org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
        at
org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
        at
org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
        at
org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
        at
org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)



On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <sa...@cloudera.com> wrote:

> It looks like many of your reduce tasks were killed.  Do you know why?
>  Also, MR2 doesn't have JVM reuse, so it might make sense to compare it to
> MR1 with JVM reuse turned off.
>
> -Sandy
>
>
> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <ji...@gmail.com>wrote:
>
>> The Terasort output for MR2 is as follows.
>>
>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job (main):
>> Counters: 46
>>         File System Counters
>>                 FILE: Number of bytes read=456102049355
>>                 FILE: Number of bytes written=897246250517
>>                 FILE: Number of read operations=0
>>                 FILE: Number of large read operations=0
>>                 FILE: Number of write operations=0
>>                 HDFS: Number of bytes read=1000000851200
>>                 HDFS: Number of bytes written=1000000000000
>>                 HDFS: Number of read operations=32131
>>                 HDFS: Number of large read operations=0
>>                 HDFS: Number of write operations=224
>>         Job Counters
>>                 Killed map tasks=1
>>                 Killed reduce tasks=20
>>                 Launched map tasks=7601
>>                 Launched reduce tasks=132
>>                 Data-local map tasks=7591
>>                 Rack-local map tasks=10
>>                 Total time spent by all maps in occupied slots
>> (ms)=1696141311
>>                 Total time spent by all reduces in occupied slots
>> (ms)=2664045096
>>         Map-Reduce Framework
>>                 Map input records=10000000000
>>                 Map output records=10000000000
>>                 Map output bytes=1020000000000
>>                 Map output materialized bytes=440486356802
>>                 Input split bytes=851200
>>                 Combine input records=0
>>                 Combine output records=0
>>                 Reduce input groups=10000000000
>>                 Reduce shuffle bytes=440486356802
>>                 Reduce input records=10000000000
>>                 Reduce output records=10000000000
>>                 Spilled Records=20000000000
>>                 Shuffled Maps =851200
>>                 Failed Shuffles=61
>>                 Merged Map outputs=851200
>>                 GC time elapsed (ms)=4215666
>>                 CPU time spent (ms)=192433000
>>                 Physical memory (bytes) snapshot=3349356380160
>>                 Virtual memory (bytes) snapshot=9665208745984
>>                 Total committed heap usage (bytes)=3636854259712
>>         Shuffle Errors
>>                 BAD_ID=0
>>                 CONNECTION=0
>>                 IO_ERROR=4
>>                 WRONG_LENGTH=0
>>                 WRONG_MAP=0
>>                 WRONG_REDUCE=0
>>         File Input Format Counters
>>                 Bytes Read=1000000000000
>>         File Output Format Counters
>>                 Bytes Written=1000000000000
>>
>> Thanks,
>>
>> John
>>
>>
>>
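A few things fall straight out of those counters: 7,600 of the 7,601 launched maps and 112 of the 132 launched reducers actually completed (the 20 killed reducers are consistent with reduce-side speculation being on), and Shuffled Maps = 851,200 is exactly 7,600 maps x 112 reducers. Snappy shrinks the 1.02 TB of raw map output to about 440 GB of materialized shuffle data, yet Spilled Records is twice the map output record count, which suggests each record hit local disk about twice across the map-side sort and the reduce-side merge. Each reducer also has to merge roughly 440 GB / 112, about 3.9 GB, spread over 7,600 small segments with io.sort.factor = 48, which forces multi-pass on-disk merging. A hedged starting point (the values below are illustrative, not measured) is to raise the merge factor and spread the reduce work over more, smaller reducers:

        <property>
          <name>mapreduce.task.io.sort.factor</name>
          <value>100</value> <!-- merge more of the ~7,600 map segments per pass -->
        </property>
        <property>
          <name>mapreduce.job.reduces</name>
          <value>200</value> <!-- illustrative; increasing the reducer count already helped earlier in this thread -->
        </property>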
>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <jian.fang.subscribe@gmail.com
>> > wrote:
>>
>>> Hi,
>>>
>>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop 1.0.3 and
>>> it turned out that the terasort for MR2 is 2 times slower than that in MR1.
>>> I cannot really believe it.
>>>
>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0 cluster
>>> configurations are as follows.
>>>
>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>         mapreduce.map.memory.mb = "768";
>>>         mapreduce.reduce.memory.mb = "2048";
>>>
>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>
>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>         mapreduce.task.io.sort.factor = "48";
>>>         mapreduce.task.io.sort.mb = "200";
>>>         mapreduce.map.speculative = "true";
>>>         mapreduce.reduce.speculative = "true";
>>>         mapreduce.framework.name = "yarn";
>>>         yarn.app.mapreduce.am.job.task.listener.thread-count = "60";
>>>         mapreduce.map.cpu.vcores = "1";
>>>         mapreduce.reduce.cpu.vcores = "2";
>>>
>>>         mapreduce.job.jvm.numtasks = "20";
>>>         mapreduce.map.output.compress = "true";
>>>         mapreduce.map.output.compress.codec =
>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>
>>>         yarn.resourcemanager.client.thread-count = "64";
>>>         yarn.resourcemanager.scheduler.client.thread-count = "64";
>>>         yarn.resourcemanager.resource-tracker.client.thread-count = "64";
>>>         yarn.resourcemanager.scheduler.class =
>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>         yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>         yarn.nodemanager.container-executor.class =
>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>         yarn.nodemanager.container-manager.thread-count = "64";
>>>         yarn.nodemanager.localizer.client.thread-count = "20";
>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>>
>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost the same
>>> other configurations. In MR1, the 1TB terasort took about 45 minutes, but
>>> it took around 90 minutes in MR2.
>>>
>>> Does anyone know what is wrong here? Or do I need some special
>>> configurations for terasort to work better in MR2?
>>>
>>> Thanks in advance,
>>>
>>> John
>>>
>>>
>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <michael_segel@hotmail.com
>>> > wrote:
>>>
>>>> Sam,
>>>> I think your cluster is too small for any meaningful conclusions to be
>>>> made.
>>>>
>>>>
>>>> Sent from a remote device. Please excuse any typos...
>>>>
>>>> Mike Segel
>>>>
>>>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com> wrote:
>>>>
>>>> Hi Harsh,
>>>>
>>>> Thanks for your detailed response! Now, the efficiency of my Yarn
>>>> cluster improved a lot after increasing the reducer
>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>> questions about the way of Yarn to execute MRv1 job:
>>>>
>>>> 1.In Hadoop 1.x, a job will be executed by map task and reduce task
>>>> together, with a typical process(map > shuffle > reduce). In Yarn, as I
>>>> know, a MRv1 job will be executed only by ApplicationMaster.
>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1 job
>>>> has a special execution process (map > shuffle > reduce) in Hadoop 1.x, so
>>>> how does Yarn execute an MRv1 job? Does it still include the special MR steps of Hadoop
>>>> 1.x, like map, sort, merge, combine and shuffle?
>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>> - What's the general process for ApplicationMaster of Yarn to execute a
>>>> job?
>>>>
>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>> - For Yarn, the above two parameters do not work any more, as Yarn uses
>>>> containers instead, right?
>>>> - For Yarn, we can set the whole physical mem for a NodeManager using '
>>>> yarn.nodemanager.resource.memory-mb'. But how to set the default size
>>>> of physical mem of a container?
>>>> - How to set the maximum size of physical mem of a container? By the
>>>> parameter of 'mapred.child.java.opts'?
>>>>
>>>> Thanks as always!
>>>>
>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>
>>>>> Hi Sam,
>>>>>
>>>>> > - How to know the container number? Why you say it will be 22
>>>>> containers due to a 22 GB memory?
>>>>>
>>>>> The MR2's default configuration requests 1 GB resource each for Map
>>>>> and Reduce containers. It requests 1.5 GB for the AM container that
>>>>> runs the job, additionally. This is tunable using the properties
>>>>> Sandy's mentioned in his post.
>>>>>
>>>>> > - My machine has 32 GB memory, how many memory is proper to be
>>>>> assigned to containers?
>>>>>
>>>>> This is a general question. You may use the same process you took to
>>>>> decide optimal number of slots in MR1 to decide this here. Every
>>>>> container is a new JVM, and you're limited by the CPUs you have there
>>>>> (if not the memory). Either increase memory requests from jobs, to
>>>>> lower # of concurrent containers at a given time (runtime change), or
>>>>> lower NM's published memory resources to control the same (config
>>>>> change).
>>>>>
>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>>>> 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>
>>>>> Yes, all of these properties will still work. Old properties specific
>>>>> to JobTracker or TaskTracker (usually found as a keyword in the config
>>>>> name) will not apply anymore.
>>>>>
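To make Harsh's numbers concrete: with the defaults he describes (1024 MB per map or reduce request plus 1536 MB for the AM), a NodeManager publishing 22000 MB runs roughly 21 task containers at once, which is where the "roughly 22 total containers" figure below comes from, versus the 8 map + 4 reduce = 12 slots configured on the MR1 side; dropping the published figure to 12000 MB caps it near 11. The per-container request sizes themselves are per-job settings, for example (a sketch, values illustrative):

        <property>
          <name>mapreduce.map.memory.mb</name>
          <value>1536</value> <!-- container size requested for each map task -->
        </property>
        <property>
          <name>mapreduce.reduce.memory.mb</name>
          <value>3072</value> <!-- container size requested for each reduce task -->
        </property>
        <property>
          <name>yarn.app.mapreduce.am.resource.mb</name>
          <value>1536</value> <!-- container size requested for the MapReduce ApplicationMaster -->
        </property>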
>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <sa...@gmail.com>
>>>>> wrote:
>>>>> > Hi Harsh,
>>>>> >
>>>>> > According to the above suggestions, I removed the duplicated
>>>>> setting, and
>>>>> > reduced the values of 'yarn.nodemanager.resource.cpu-cores',
>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000. And then,
>>>>> the
>>>>> > efficiency improved by about 18%.  I have some questions:
>>>>> >
>>>>> > - How to know the container number? Why you say it will be 22
>>>>> containers due
>>>>> > to a 22 GB memory?
>>>>> > - My machine has 32 GB memory, how many memory is proper to be
>>>>> assigned to
>>>>> > containers?
>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>>>> 'yarn', will
>>>>> > other parameters for mapred-site.xml still work in yarn framework?
>>>>> Like
>>>>> > 'mapreduce.task.io.sort.mb' and 'mapreduce.map.sort.spill.percent'
>>>>> >
>>>>> > Thanks!
>>>>> >
>>>>> >
>>>>> >
>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>> >>
>>>>> >> Hey Sam,
>>>>> >>
>>>>> >> Did you get a chance to retry with Sandy's suggestions? The config
>>>>> >> appears to be asking NMs to use roughly 22 total containers (as
>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB memory
>>>>> >> resource. This could impact much, given the CPU is still the same
>>>>> for
>>>>> >> both test runs.
>>>>> >>
>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>> sandy.ryza@cloudera.com>
>>>>> >> wrote:
>>>>> >> > Hey Sam,
>>>>> >> >
>>>>> >> > Thanks for sharing your results.  I'm definitely curious about
>>>>> what's
>>>>> >> > causing the difference.
>>>>> >> >
>>>>> >> > A couple observations:
>>>>> >> > It looks like you've got yarn.nodemanager.resource.memory-mb in
>>>>> there
>>>>> >> > twice
>>>>> >> > with two different values.
>>>>> >> >
>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close to the
>>>>> default
>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of your tasks
>>>>> getting
>>>>> >> > killed for running over resource limits?
>>>>> >> >
>>>>> >> > -Sandy
>>>>> >> >
>>>>> >> >
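Behind that question is the split between the container limit and the JVM heap: mapreduce.map.memory.mb / mapreduce.reduce.memory.mb is what the NodeManager enforces on the whole task process, while -Xmx only bounds the heap inside it, so the heap plus JVM overhead has to stay under the container size with some headroom or the task is killed for exceeding its physical (or virtual, per yarn.nodemanager.vmem-pmem-ratio) allowance. Pairing the container sizes sketched above with the 1000 MB heap from the config below might look like this (illustrative values):

        <property>
          <name>mapreduce.map.memory.mb</name>
          <value>1280</value> <!-- container limit: 1000 MB heap plus headroom for JVM overhead -->
        </property>
        <property>
          <name>mapreduce.map.java.opts</name>
          <value>-Xmx1000m</value> <!-- heap kept below the container limit -->
        </property>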
>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <sa...@gmail.com>
>>>>> wrote:
>>>>> >> >>
>>>>> >> >> The terasort execution log shows that reduce spent about 5.5
>>>>> mins from
>>>>> >> >> 33%
>>>>> >> >> to 35% as below.
>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce 31%
>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce 32%
>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce 33%
>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce 35%
>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce 40%
>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce 43%
>>>>> >> >>
>>>>> >> >> Anyway, below are my configurations for your reference. Thanks!
>>>>> >> >> (A) core-site.xml
>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>> >> >>
>>>>> >> >> (B) hdfs-site.xml
>>>>> >> >>   <property>
>>>>> >> >>     <name>dfs.replication</name>
>>>>> >> >>     <value>1</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>dfs.name.dir</name>
>>>>> >> >>
>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>dfs.data.dir</name>
>>>>> >> >>
>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>dfs.block.size</name>
>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>> >> >>     <value>64</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>> >> >>     <value>10</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >> (C) mapred-site.xml
>>>>> >> >>   <property>
>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>> >> >>
>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>> >> >>     <description>No description</description>
>>>>> >> >>     <final>true</final>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>> >> >>
>>>>> >> >>
>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>> >> >>     <description>No description</description>
>>>>> >> >>     <final>true</final>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >> <property>
>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>> >> >>   <value>-Xmx1000m</value>
>>>>> >> >> </property>
>>>>> >> >>
>>>>> >> >> <property>
>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>> >> >>     <value>yarn</value>
>>>>> >> >>    </property>
>>>>> >> >>
>>>>> >> >>  <property>
>>>>> >> >>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>> >> >>     <value>8</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>> >> >>     <value>4</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>> >> >>     <value>true</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >> (D) yarn-site.xml
>>>>> >> >>  <property>
>>>>> >> >>     <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>> >> >>     <value>node1:18025</value>
>>>>> >> >>     <description>host is the hostname of the resource manager and
>>>>> >> >>     port is the port on which the NodeManagers contact the
>>>>> Resource
>>>>> >> >> Manager.
>>>>> >> >>     </description>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <description>The address of the RM web
>>>>> application.</description>
>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>> >> >>     <value>node1:18088</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>>>>> >> >>     <value>node1:18030</value>
>>>>> >> >>     <description>host is the hostname of the resourcemanager and
>>>>> port
>>>>> >> >> is
>>>>> >> >> the port
>>>>> >> >>     on which the Applications in the cluster talk to the Resource
>>>>> >> >> Manager.
>>>>> >> >>     </description>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>> >> >>     <value>node1:18040</value>
>>>>> >> >>     <description>the host is the hostname of the ResourceManager
>>>>> and
>>>>> >> >> the
>>>>> >> >> port is the port on
>>>>> >> >>     which the clients can talk to the Resource Manager.
>>>>> </description>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>> >> >>
>>>>> >> >>
>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>> >> >>     <description>the local directories used by the
>>>>> >> >> nodemanager</description>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>> >> >>     <description>the nodemanagers bind to this port</description>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>> >> >>     <value>10240</value>
>>>>> >> >>     <description>the amount of memory on the NodeManager in
>>>>> >> >> GB</description>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>> >> >>
>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>> >> >>     <description>directory on hdfs where the application logs
>>>>> are moved
>>>>> >> >> to
>>>>> >> >> </description>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>    <property>
>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>> >> >>
>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>> >> >>     <description>the directories used by Nodemanagers as log
>>>>> >> >> directories</description>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>> >> >>     <description>shuffle service that needs to be set for Map
>>>>> Reduce to
>>>>> >> >> run </description>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>yarn.resourcemanager.client.thread-count</name>
>>>>> >> >>     <value>64</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>  <property>
>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>> >> >>     <value>24</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >> <property>
>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>> >> >>     <value>3</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>  <property>
>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>> >> >>     <value>22000</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>  <property>
>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>> >> >>     <value>2.1</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>
>>>>> >> >>
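That second yarn.nodemanager.resource.memory-mb block is the duplication Sandy flags above: the file defines the property once as 10240 (with a description that says GB even though the value is in MB) and again as 22000, and the later definition appears to win here, since the "roughly 22 total containers" Harsh computes above matches 22000 MB of 1024 MB requests. A single cleaned-up entry, using the 12000 MB figure from sam's follow-up, would be a sketch like:

        <property>
          <name>yarn.nodemanager.resource.memory-mb</name>
          <value>12000</value>
          <description>total memory, in MB, this NodeManager can hand out to containers</description>
        </property>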
>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>> >> >>>
>>>>> >> >>> Not tuning configurations at all is wrong. YARN uses memory
>>>>> resource
>>>>> >> >>> based scheduling and hence MR2 would be requesting 1 GB minimum
>>>>> by
>>>>> >> >>> default, causing, on base configs, to max out at 8 (due to 8 GB
>>>>> NM
>>>>> >> >>> memory resource config) total containers. Do share your configs
>>>>> as at
>>>>> >> >>> this point none of us can tell what it is.
>>>>> >> >>>
>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for users and
>>>>> to not
>>>>> >> >>> care about such things :)
>>>>> >> >>>
>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <samliuhadoop@gmail.com
>>>>> >
>>>>> >> >>> wrote:
>>>>> >> >>> > At the beginning, I just wanted to do a fast comparison of MRv1
>>>>> and
>>>>> >> >>> > Yarn.
>>>>> >> >>> > But
>>>>> >> >>> > they have many differences, and to be fair for comparison I
>>>>> did not
>>>>> >> >>> > tune
>>>>> >> >>> > their configurations at all.  So I got above test results.
>>>>> After
>>>>> >> >>> > analyzing
>>>>> >> >>> > the test result, no doubt, I will configure them and do
>>>>> comparison
>>>>> >> >>> > again.
>>>>> >> >>> >
>>>>> >> >>> > Do you have any idea on current test result? I think, to
>>>>> compare
>>>>> >> >>> > with
>>>>> >> >>> > MRv1,
>>>>> >> >>> > Yarn is better on Map phase(teragen test), but worse on Reduce
>>>>> >> >>> > phase(terasort test).
>>>>> >> >>> > And any detailed suggestions/comments/materials on Yarn
>>>>> performance
>>>>> >> >>> > tuning?
>>>>> >> >>> >
>>>>> >> >>> > Thanks!
>>>>> >> >>> >
>>>>> >> >>> >
>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>> marcosluis2186@gmail.com>
>>>>> >> >>> >>
>>>>> >> >>> >> Why not tune the configurations?
>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>>>> >> >>> >>
>>>>> >> >>> >>
>>>>> >> >>> >>
>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>> >> >>> >>>
>>>>> >> >>> >>> Hi Experts,
>>>>> >> >>> >>>
>>>>> >> >>> >>> We are thinking about whether to use Yarn or not in the near
>>>>> >> >>> >>> future,
>>>>> >> >>> >>> and
>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for comparison.
>>>>> >> >>> >>>
>>>>> >> >>> >>> My env is three nodes cluster, and each node has similar
>>>>> hardware:
>>>>> >> >>> >>> 2
>>>>> >> >>> >>> cpu(4 core), 32 mem. Both Yarn and MRv1 cluster are set on
>>>>> the
>>>>> >> >>> >>> same
>>>>> >> >>> >>> env. To
>>>>> >> >>> >>> be fair, I did not make any performance tuning on their
>>>>> >> >>> >>> configurations, but
>>>>> >> >>> >>> use the default configuration values.
>>>>> >> >>> >>>
>>>>> >> >>> >>> Before testing, I think Yarn will be much better than MRv1,
>>>>> if
>>>>> >> >>> >>> they
>>>>> >> >>> >>> all
>>>>> >> >>> >>> use default configuration, because Yarn is a better
>>>>> framework than
>>>>> >> >>> >>> MRv1.
>>>>> >> >>> >>> However, the test result shows some differences:
>>>>> >> >>> >>>
>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>> >> >>> >>>
>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>> >> >>> >>> - MRv1: 193 sec
>>>>> >> >>> >>> - Yarn: 69 sec
>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>> >> >>> >>>
>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>> >> >>> >>> - MRv1: 451 sec
>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>> >> >>> >>>
>>>>> >> >>> >>> After a fast analysis, I think the direct cause might be
>>>>> that Yarn
>>>>> >> >>> >>> is
>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much worse on Reduce
>>>>> >> >>> >>> phase.
>>>>> >> >>> >>>
>>>>> >> >>> >>> Here I have two questions:
>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1 for terasort?
>>>>> >> >>> >>> - What's the strategy for tuning Yarn performance? Are there any
>>>>> >> >>> >>> materials?
>>>>> >> >>> >>>
>>>>> >> >>> >>> Thanks!
>>>>> >> >>> >>
>>>>> >> >>> >>
>>>>> >> >>> >>
>>>>> >> >>> >>
>>>>> >> >>> >> --
>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>> >> >>> >> Product Manager at PDVSA
>>>>> >> >>> >> http://about.me/marcosortiz
>>>>> >> >>> >>
>>>>> >> >>> >
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>> --
>>>>> >> >>> Harsh J
>>>>> >> >>
>>>>> >> >>
>>>>> >> >
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> --
>>>>> >> Harsh J
>>>>> >
>>>>> >
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Harsh J
>>>>>
>>>>
>>>>
>>>
>>
>


Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Jian Fang <ji...@gmail.com>.
Thanks Sandy. I will try to turn JVM resue off and see what happens.

Yes, I saw quite some exceptions in the task attempts. For instance.


2013-10-20 03:13:58,751 ERROR [main]
org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
2013-10-20 03:13:58,752 ERROR [Thread-6] org.apache.hadoop.hdfs.DFSClient:
Failed to close file
/1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
No lease on
/1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
File does not exist. Holder
DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
any open files.
        at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
        at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
        at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
        at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
        at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
        at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
        at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
--
        at com.sun.proxy.$Proxy10.complete(Unknown Source)
        at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
        at
org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
        at
org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
        at
org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
        at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
        at
org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
        at
org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
        at
org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
        at
org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
2013-10-20 03:13:58,753 WARN [main] org.apache.hadoop.mapred.YarnChild:
Exception running child : java.nio.channels.ClosedChannelException
        at
org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
        at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
        at
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
        at java.io.DataOutputStream.write(DataOutputStream.java:107)
        at
org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
        at
org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
        at
org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
        at
org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
        at
org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
        at
org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)



On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <sa...@cloudera.com> wrote:

> It looks like many of your reduce tasks were killed.  Do you know why?
>  Also, MR2 doesn't have JVM reuse, so it might make sense to compare it to
> MR1 with JVM reuse turned off.
>
> -Sandy
>
>
> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <ji...@gmail.com>wrote:
>
>> The Terasort output for MR2 is as follows.
>>
>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job (main):
>> Counters: 46
>>         File System Counters
>>                 FILE: Number of bytes read=456102049355
>>                 FILE: Number of bytes written=897246250517
>>                 FILE: Number of read operations=0
>>                 FILE: Number of large read operations=0
>>                 FILE: Number of write operations=0
>>                 HDFS: Number of bytes read=1000000851200
>>                 HDFS: Number of bytes written=1000000000000
>>                 HDFS: Number of read operations=32131
>>                 HDFS: Number of large read operations=0
>>                 HDFS: Number of write operations=224
>>         Job Counters
>>                 Killed map tasks=1
>>                 Killed reduce tasks=20
>>                 Launched map tasks=7601
>>                 Launched reduce tasks=132
>>                 Data-local map tasks=7591
>>                 Rack-local map tasks=10
>>                 Total time spent by all maps in occupied slots
>> (ms)=1696141311
>>                 Total time spent by all reduces in occupied slots
>> (ms)=2664045096
>>         Map-Reduce Framework
>>                 Map input records=10000000000
>>                 Map output records=10000000000
>>                 Map output bytes=1020000000000
>>                 Map output materialized bytes=440486356802
>>                 Input split bytes=851200
>>                 Combine input records=0
>>                 Combine output records=0
>>                 Reduce input groups=10000000000
>>                 Reduce shuffle bytes=440486356802
>>                 Reduce input records=10000000000
>>                 Reduce output records=10000000000
>>                 Spilled Records=20000000000
>>                 Shuffled Maps =851200
>>                 Failed Shuffles=61
>>                 Merged Map outputs=851200
>>                 GC time elapsed (ms)=4215666
>>                 CPU time spent (ms)=192433000
>>                 Physical memory (bytes) snapshot=3349356380160
>>                 Virtual memory (bytes) snapshot=9665208745984
>>                 Total committed heap usage (bytes)=3636854259712
>>         Shuffle Errors
>>                 BAD_ID=0
>>                 CONNECTION=0
>>                 IO_ERROR=4
>>                 WRONG_LENGTH=0
>>                 WRONG_MAP=0
>>                 WRONG_REDUCE=0
>>         File Input Format Counters
>>                 Bytes Read=1000000000000
>>         File Output Format Counters
>>                 Bytes Written=1000000000000
>>
>> Thanks,
>>
>> John
>>
>>
>>
>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <jian.fang.subscribe@gmail.com
>> > wrote:
>>
>>> Hi,
>>>
>>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop 1.0.3 and
>>> it turned out that the terasort for MR2 is 2 times slower than that in MR1.
>>> I cannot really believe it.
>>>
>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0 cluster
>>> configurations are as follows.
>>>
>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>         mapreduce.map.memory.mb = "768";
>>>         mapreduce.reduce.memory.mb = "2048";
>>>
>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>
>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>         mapreduce.task.io.sort.factor = "48";
>>>         mapreduce.task.io.sort.mb = "200";
>>>         mapreduce.map.speculative = "true";
>>>         mapreduce.reduce.speculative = "true";
>>>         mapreduce.framework.name = "yarn";
>>>         yarn.app.mapreduce.am.job.task.listener.thread-count = "60";
>>>         mapreduce.map.cpu.vcores = "1";
>>>         mapreduce.reduce.cpu.vcores = "2";
>>>
>>>         mapreduce.job.jvm.numtasks = "20";
>>>         mapreduce.map.output.compress = "true";
>>>         mapreduce.map.output.compress.codec =
>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>
>>>         yarn.resourcemanager.client.thread-count = "64";
>>>         yarn.resourcemanager.scheduler.client.thread-count = "64";
>>>         yarn.resourcemanager.resource-tracker.client.thread-count = "64";
>>>         yarn.resourcemanager.scheduler.class =
>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>         yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>         yarn.nodemanager.container-executor.class =
>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>         yarn.nodemanager.container-manager.thread-count = "64";
>>>         yarn.nodemanager.localizer.client.thread-count = "20";
>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>>
>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost the same
>>> other configurations. In MR1, the 1TB terasort took about 45 minutes, but
>>> it took around 90 minutes in MR2.
>>>
>>> Does anyone know what is wrong here? Or do I need some special
>>> configurations for terasort to work better in MR2?
>>>
>>> Thanks in advance,
>>>
>>> John
>>>
>>>
>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <michael_segel@hotmail.com
>>> > wrote:
>>>
>>>> Sam,
>>>> I think your cluster is too small for any meaningful conclusions to be
>>>> made.
>>>>
>>>>
>>>> Sent from a remote device. Please excuse any typos...
>>>>
>>>> Mike Segel
>>>>
>>>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com> wrote:
>>>>
>>>> Hi Harsh,
>>>>
>>>> Thanks for your detailed response! Now, the efficiency of my Yarn
>>>> cluster improved a lot after increasing the reducer
>>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>> questions about the way Yarn executes an MRv1 job:
>>>>
>>>> 1. In Hadoop 1.x, a job is executed by map tasks and reduce tasks
>>>> together, with a typical process (map > shuffle > reduce). In Yarn, as I
>>>> understand it, an MRv1 job is executed only by the ApplicationMaster.
>>>> - Yarn can run multiple kinds of jobs (MR, MPI, ...), but an MRv1 job
>>>> has a special execution process (map > shuffle > reduce) in Hadoop 1.x, so
>>>> how does Yarn execute an MRv1 job? Does it still include the special MR
>>>> steps from Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>> - What's the general process for ApplicationMaster of Yarn to execute a
>>>> job?
>>>>
>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>> - For Yarn, the above two parameters do not work any more, as Yarn uses
>>>> containers instead, right?
>>>> - For Yarn, we can set the whole physical mem for a NodeManager using '
>>>> yarn.nodemanager.resource.memory-mb'. But how to set the default size
>>>> of physical mem of a container?
>>>> - How to set the maximum size of physical mem of a container? By the
>>>> parameter of 'mapred.child.java.opts'?
>>>>
>>>> Thanks as always!
>>>>
>>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>>
>>>>> Hi Sam,
>>>>>
>>>>> > - How do I know the container number? Why do you say it will be 22
>>>>> containers due to a 22 GB memory?
>>>>>
>>>>> The MR2's default configuration requests 1 GB resource each for Map
>>>>> and Reduce containers. It requests 1.5 GB for the AM container that
>>>>> runs the job, additionally. This is tunable using the properties
>>>>> Sandy's mentioned in his post.
>>>>>
>>>>> > - My machine has 32 GB memory, how much memory is proper to be
>>>>> assigned to containers?
>>>>>
>>>>> This is a general question. You may use the same process you took to
>>>>> decide optimal number of slots in MR1 to decide this here. Every
>>>>> container is a new JVM, and you're limited by the CPUs you have there
>>>>> (if not the memory). Either increase memory requests from jobs, to
>>>>> lower # of concurrent containers at a given time (runtime change), or
>>>>> lower NM's published memory resources to control the same (config
>>>>> change).
>>>>>
>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>>>> 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>
>>>>> Yes, all of these properties will still work. Old properties specific
>>>>> to JobTracker or TaskTracker (usually found as a keyword in the config
>>>>> name) will not apply anymore.
>>>>>
>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <sa...@gmail.com>
>>>>> wrote:
>>>>> > Hi Harsh,
>>>>> >
>>>>> > According to the above suggestions, I removed the duplicated
>>>>> setting, and
>>>>> > reduced the value of 'yarn.nodemanager.resource.cpu-cores',
>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000. And then,
>>>>> the
>>>>> > efficiency improved by about 18%.  I have some questions:
>>>>> >
>>>>> > - How do I know the container number? Why do you say it will be 22
>>>>> containers due
>>>>> > to a 22 GB memory?
>>>>> > - My machine has 32 GB memory, how much memory is proper to be
>>>>> assigned to
>>>>> > containers?
>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>>>> 'yarn', will
>>>>> > other parameters for mapred-site.xml still work in yarn framework?
>>>>> Like
>>>>> > 'mapreduce.task.io.sort.mb' and 'mapreduce.map.sort.spill.percent'
>>>>> >
>>>>> > Thanks!
>>>>> >
>>>>> >
>>>>> >
>>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>>> >>
>>>>> >> Hey Sam,
>>>>> >>
>>>>> >> Did you get a chance to retry with Sandy's suggestions? The config
>>>>> >> appears to be asking NMs to use roughly 22 total containers (as
>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB memory
>>>>> >> resource. This could impact much, given the CPU is still the same
>>>>> for
>>>>> >> both test runs.
>>>>> >>
>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>> sandy.ryza@cloudera.com>
>>>>> >> wrote:
>>>>> >> > Hey Sam,
>>>>> >> >
>>>>> >> > Thanks for sharing your results.  I'm definitely curious about
>>>>> what's
>>>>> >> > causing the difference.
>>>>> >> >
>>>>> >> > A couple observations:
>>>>> >> > It looks like you've got yarn.nodemanager.resource.memory-mb in
>>>>> there
>>>>> >> > twice
>>>>> >> > with two different values.
>>>>> >> >
>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close to the
>>>>> default
>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of your tasks
>>>>> getting
>>>>> >> > killed for running over resource limits?
>>>>> >> >
>>>>> >> > -Sandy
>>>>> >> >
>>>>> >> >
>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <sa...@gmail.com>
>>>>> wrote:
>>>>> >> >>
>>>>> >> >> The terasort execution log shows that reduce spent about 5.5
>>>>> mins from
>>>>> >> >> 33%
>>>>> >> >> to 35% as below.
>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce 31%
>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce 32%
>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce 33%
>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce 35%
>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce 40%
>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce 43%
>>>>> >> >>
>>>>> >> >> Anyway, below are my configurations for your reference. Thanks!
>>>>> >> >> (A) core-site.xml
>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>> >> >>
>>>>> >> >> (B) hdfs-site.xml
>>>>> >> >>   <property>
>>>>> >> >>     <name>dfs.replication</name>
>>>>> >> >>     <value>1</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>dfs.name.dir</name>
>>>>> >> >>
>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>dfs.data.dir</name>
>>>>> >> >>
>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>dfs.block.size</name>
>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>> >> >>     <value>64</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>> >> >>     <value>10</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >> (C) mapred-site.xml
>>>>> >> >>   <property>
>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>> >> >>
>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>> >> >>     <description>No description</description>
>>>>> >> >>     <final>true</final>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>> >> >>
>>>>> >> >>
>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>> >> >>     <description>No description</description>
>>>>> >> >>     <final>true</final>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >> <property>
>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>> >> >>   <value>-Xmx1000m</value>
>>>>> >> >> </property>
>>>>> >> >>
>>>>> >> >> <property>
>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>> >> >>     <value>yarn</value>
>>>>> >> >>    </property>
>>>>> >> >>
>>>>> >> >>  <property>
>>>>> >> >>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>> >> >>     <value>8</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>> >> >>     <value>4</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>> >> >>     <value>true</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >> (D) yarn-site.xml
>>>>> >> >>  <property>
>>>>> >> >>     <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>> >> >>     <value>node1:18025</value>
>>>>> >> >>     <description>host is the hostname of the resource manager and
>>>>> >> >>     port is the port on which the NodeManagers contact the
>>>>> Resource
>>>>> >> >> Manager.
>>>>> >> >>     </description>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <description>The address of the RM web
>>>>> application.</description>
>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>> >> >>     <value>node1:18088</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>>>>> >> >>     <value>node1:18030</value>
>>>>> >> >>     <description>host is the hostname of the resourcemanager and
>>>>> port
>>>>> >> >> is
>>>>> >> >> the port
>>>>> >> >>     on which the Applications in the cluster talk to the Resource
>>>>> >> >> Manager.
>>>>> >> >>     </description>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>> >> >>     <value>node1:18040</value>
>>>>> >> >>     <description>the host is the hostname of the ResourceManager
>>>>> and
>>>>> >> >> the
>>>>> >> >> port is the port on
>>>>> >> >>     which the clients can talk to the Resource Manager.
>>>>> </description>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>> >> >>
>>>>> >> >>
>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>> >> >>     <description>the local directories used by the
>>>>> >> >> nodemanager</description>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>> >> >>     <description>the nodemanagers bind to this port</description>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>> >> >>     <value>10240</value>
>>>>> >> >>     <description>the amount of memory on the NodeManager in
>>>>> >> >> GB</description>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>> >> >>
>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>> >> >>     <description>directory on hdfs where the application logs
>>>>> are moved
>>>>> >> >> to
>>>>> >> >> </description>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>    <property>
>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>> >> >>
>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>> >> >>     <description>the directories used by Nodemanagers as log
>>>>> >> >> directories</description>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>> >> >>     <description>shuffle service that needs to be set for Map
>>>>> Reduce to
>>>>> >> >> run </description>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>yarn.resourcemanager.client.thread-count</name>
>>>>> >> >>     <value>64</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>  <property>
>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>> >> >>     <value>24</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >> <property>
>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>> >> >>     <value>3</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>  <property>
>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>> >> >>     <value>22000</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>  <property>
>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>> >> >>     <value>2.1</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>
>>>>> >> >>
>>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>>> >> >>>
>>>>> >> >>> Not tuning configurations at all is wrong. YARN uses memory
>>>>> resource
>>>>> >> >>> based scheduling and hence MR2 would be requesting 1 GB minimum
>>>>> by
>>>>> >> >>> default, causing, on base configs, to max out at 8 (due to 8 GB
>>>>> NM
>>>>> >> >>> memory resource config) total containers. Do share your configs
>>>>> as at
>>>>> >> >>> this point none of us can tell what it is.
>>>>> >> >>>
>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for users and
>>>>> to not
>>>>> >> >>> care about such things :)
>>>>> >> >>>
>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <samliuhadoop@gmail.com
>>>>> >
>>>>> >> >>> wrote:
>>>>> >> >>> > At the beginning, I just wanted to do a fast comparison of MRv1
>>>>> and
>>>>> >> >>> > Yarn.
>>>>> >> >>> > But
>>>>> >> >>> > they have many differences, and to be fair for comparison I
>>>>> did not
>>>>> >> >>> > tune
>>>>> >> >>> > their configurations at all.  So I got above test results.
>>>>> After
>>>>> >> >>> > analyzing
>>>>> >> >>> > the test result, no doubt, I will configure them and do
>>>>> comparison
>>>>> >> >>> > again.
>>>>> >> >>> >
>>>>> >> >>> > Do you have any idea on current test result? I think, to
>>>>> compare
>>>>> >> >>> > with
>>>>> >> >>> > MRv1,
>>>>> >> >>> > Yarn is better on Map phase(teragen test), but worse on Reduce
>>>>> >> >>> > phase(terasort test).
>>>>> >> >>> > And any detailed suggestions/comments/materials on Yarn
>>>>> performance
>>>>> >> >>> > tuning?
>>>>> >> >>> >
>>>>> >> >>> > Thanks!
>>>>> >> >>> >
>>>>> >> >>> >
>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>> marcosluis2186@gmail.com>
>>>>> >> >>> >>
>>>>> >> >>> >> Why not tune the configurations?
>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>>>> >> >>> >>
>>>>> >> >>> >>
>>>>> >> >>> >>
>>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>>> >> >>> >>>
>>>>> >> >>> >>> Hi Experts,
>>>>> >> >>> >>>
>>>>> >> >>> >>> We are thinking about whether to use Yarn or not in the near
>>>>> >> >>> >>> future,
>>>>> >> >>> >>> and
>>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for comparison.
>>>>> >> >>> >>>
>>>>> >> >>> >>> My env is a three-node cluster, and each node has similar
>>>>> hardware:
>>>>> >> >>> >>> 2
>>>>> >> >>> >>> cpu (4 core), 32 GB mem. Both Yarn and MRv1 clusters are set up on
>>>>> the
>>>>> >> >>> >>> same
>>>>> >> >>> >>> env. To
>>>>> >> >>> >>> be fair, I did not make any performance tuning on their
>>>>> >> >>> >>> configurations, but
>>>>> >> >>> >>> use the default configuration values.
>>>>> >> >>> >>>
>>>>> >> >>> >>> Before testing, I think Yarn will be much better than MRv1,
>>>>> if
>>>>> >> >>> >>> they
>>>>> >> >>> >>> all
>>>>> >> >>> >>> use default configuration, because Yarn is a better
>>>>> framework than
>>>>> >> >>> >>> MRv1.
>>>>> >> >>> >>> However, the test result shows some differences:
>>>>> >> >>> >>>
>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>> >> >>> >>>
>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>> >> >>> >>> - MRv1: 193 sec
>>>>> >> >>> >>> - Yarn: 69 sec
>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>> >> >>> >>>
>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>> >> >>> >>> - MRv1: 451 sec
>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>> >> >>> >>>
>>>>> >> >>> >>> After a fast analysis, I think the direct cause might be
>>>>> that Yarn
>>>>> >> >>> >>> is
>>>>> >> >>> >>> much faster than MRv1 on Map phase, but much worse on Reduce
>>>>> >> >>> >>> phase.
>>>>> >> >>> >>>
>>>>> >> >>> >>> Here I have two questions:
>>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1 for terasort?
>>>>> >> >>> >>> - What's the strategy for tuning Yarn performance? Are there any
>>>>> >> >>> >>> materials?
>>>>> >> >>> >>>
>>>>> >> >>> >>> Thanks!
>>>>> >> >>> >>
>>>>> >> >>> >>
>>>>> >> >>> >>
>>>>> >> >>> >>
>>>>> >> >>> >> --
>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>> >> >>> >> Product Manager at PDVSA
>>>>> >> >>> >> http://about.me/marcosortiz
>>>>> >> >>> >>
>>>>> >> >>> >
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>> --
>>>>> >> >>> Harsh J
>>>>> >> >>
>>>>> >> >>
>>>>> >> >
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> --
>>>>> >> Harsh J
>>>>> >
>>>>> >
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Harsh J
>>>>>
>>>>
>>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Sandy Ryza <sa...@cloudera.com>.
It looks like many of your reduce tasks were killed.  Do you know why?
 Also, MR2 doesn't have JVM reuse, so it might make sense to compare it to
MR1 with JVM reuse turned off.

-Sandy
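
For reference, the MR1-side knob Sandy is referring to is mapred.job.reuse.jvm.num.tasks
(1 task per JVM turns reuse off, which is also the MR1 default; -1 reuses without limit).
The mapreduce.job.jvm.numtasks = "20" entry in the MR2 configuration quoted below has no
effect, since, as Sandy notes, MR2 has no JVM reuse and each task gets a fresh container
JVM. A minimal sketch of the MR1 mapred-site.xml entry for a like-for-like run (value
illustrative):

  <property>
    <!-- Hadoop 1.x only: one task per JVM, i.e. JVM reuse disabled -->
    <name>mapred.job.reuse.jvm.num.tasks</name>
    <value>1</value>
  </property>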


On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <ji...@gmail.com>wrote:

> The Terasort output for MR2 is as follows.
>
> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job (main):
> Counters: 46
>         File System Counters
>                 FILE: Number of bytes read=456102049355
>                 FILE: Number of bytes written=897246250517
>                 FILE: Number of read operations=0
>                 FILE: Number of large read operations=0
>                 FILE: Number of write operations=0
>                 HDFS: Number of bytes read=1000000851200
>                 HDFS: Number of bytes written=1000000000000
>                 HDFS: Number of read operations=32131
>                 HDFS: Number of large read operations=0
>                 HDFS: Number of write operations=224
>         Job Counters
>                 Killed map tasks=1
>                 Killed reduce tasks=20
>                 Launched map tasks=7601
>                 Launched reduce tasks=132
>                 Data-local map tasks=7591
>                 Rack-local map tasks=10
>                 Total time spent by all maps in occupied slots
> (ms)=1696141311
>                 Total time spent by all reduces in occupied slots
> (ms)=2664045096
>         Map-Reduce Framework
>                 Map input records=10000000000
>                 Map output records=10000000000
>                 Map output bytes=1020000000000
>                 Map output materialized bytes=440486356802
>                 Input split bytes=851200
>                 Combine input records=0
>                 Combine output records=0
>                 Reduce input groups=10000000000
>                 Reduce shuffle bytes=440486356802
>                 Reduce input records=10000000000
>                 Reduce output records=10000000000
>                 Spilled Records=20000000000
>                 Shuffled Maps =851200
>                 Failed Shuffles=61
>                 Merged Map outputs=851200
>                 GC time elapsed (ms)=4215666
>                 CPU time spent (ms)=192433000
>                 Physical memory (bytes) snapshot=3349356380160
>                 Virtual memory (bytes) snapshot=9665208745984
>                 Total committed heap usage (bytes)=3636854259712
>         Shuffle Errors
>                 BAD_ID=0
>                 CONNECTION=0
>                 IO_ERROR=4
>                 WRONG_LENGTH=0
>                 WRONG_MAP=0
>                 WRONG_REDUCE=0
>         File Input Format Counters
>                 Bytes Read=1000000000000
>         File Output Format Counters
>                 Bytes Written=1000000000000
>
> Thanks,
>
> John
>
>
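
A note on Sandy's question about the killed tasks in the counters above: speculative
execution is enabled in the configuration quoted further down (mapreduce.reduce.speculative
= "true"), so the 20 killed reduces are most likely losing speculative attempts rather than
failures (132 launched - 20 killed leaves 112 that ran to completion). One way to confirm
is to look at the container logs; a sketch of the yarn-site.xml setting that lets them be
fetched after the job (property and CLI as in Hadoop 2.x, assuming log aggregation is
acceptable on this cluster):

  <property>
    <!-- aggregate finished-container logs to HDFS; inspect later with:
         yarn logs -applicationId <application id> -->
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>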
>
> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <ji...@gmail.com>wrote:
>
>> Hi,
>>
>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop 1.0.3 and it
>> turned out that the terasort for MR2 is 2 times slower than that in MR1. I
>> cannot really believe it.
>>
>> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0 cluster
>> configurations are as follows.
>>
>>         mapreduce.map.java.opts = "-Xmx512m";
>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>         mapreduce.map.memory.mb = "768";
>>         mapreduce.reduce.memory.mb = "2048";
>>
>>         yarn.scheduler.minimum-allocation-mb = "256";
>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>         yarn.nodemanager.resource.memory-mb = "12288";
>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>
>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>         mapreduce.task.io.sort.factor = "48";
>>         mapreduce.task.io.sort.mb = "200";
>>         mapreduce.map.speculative = "true";
>>         mapreduce.reduce.speculative = "true";
>>         mapreduce.framework.name = "yarn";
>>         yarn.app.mapreduce.am.job.task.listener.thread-count = "60";
>>         mapreduce.map.cpu.vcores = "1";
>>         mapreduce.reduce.cpu.vcores = "2";
>>
>>         mapreduce.job.jvm.numtasks = "20";
>>         mapreduce.map.output.compress = "true";
>>         mapreduce.map.output.compress.codec =
>> "org.apache.hadoop.io.compress.SnappyCodec";
>>
>>         yarn.resourcemanager.client.thread-count = "64";
>>         yarn.resourcemanager.scheduler.client.thread-count = "64";
>>         yarn.resourcemanager.resource-tracker.client.thread-count = "64";
>>         yarn.resourcemanager.scheduler.class =
>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>         yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>> "org.apache.hadoop.mapred.ShuffleHandler";
>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>         yarn.nodemanager.container-executor.class =
>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>         yarn.nodemanager.container-manager.thread-count = "64";
>>         yarn.nodemanager.localizer.client.thread-count = "20";
>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>
>> My Hadoop 1.0.3 has the same memory/disks/cores and almost the same other
>> configurations. In MR1, the 1TB terasort took about 45 minutes, but it took
>> around 90 minutes in MR2.
>>
>> Does anyone know what is wrong here? Or do I need some special
>> configurations for terasort to work better in MR2?
>>
>> Thanks in advance,
>>
>> John
>>
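
One detail worth noting in the configuration quoted above: mapreduce.task.io.sort.mb = 200
is allocated inside the map task's JVM heap, and here that heap is only 512 MB
(mapreduce.map.java.opts = "-Xmx512m"), so the sort buffer takes roughly 40% of it and
leaves the map code little headroom, which tends to show up as GC time. A hedged sketch of
a roomier map-side pairing (values purely illustrative, not measured on this cluster):

  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1024</value>       <!-- container request; a multiple of the 256 MB minimum allocation -->
  </property>
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx800m</value>   <!-- heap comfortably larger than io.sort.mb, with container headroom -->
  </property>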
>>
>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <mi...@hotmail.com>wrote:
>>
>>> Sam,
>>> I think your cluster is too small for any meaningful conclusions to be
>>> made.
>>>
>>>
>>> Sent from a remote device. Please excuse any typos...
>>>
>>> Mike Segel
>>>
>>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com> wrote:
>>>
>>> Hi Harsh,
>>>
>>> Thanks for your detailed response! Now, the efficiency of my Yarn
>>> cluster improved a lot after increasing the reducer
>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>> questions about the way Yarn executes an MRv1 job:
>>>
>>> 1. In Hadoop 1.x, a job is executed by map tasks and reduce tasks
>>> together, with a typical process (map > shuffle > reduce). In Yarn, as I
>>> understand it, an MRv1 job is executed only by the ApplicationMaster.
>>> - Yarn can run multiple kinds of jobs (MR, MPI, ...), but an MRv1 job has
>>> a special execution process (map > shuffle > reduce) in Hadoop 1.x, so how
>>> does Yarn execute an MRv1 job? Does it still include the special MR steps
>>> from Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>> - Do the MRv1 parameters still work for Yarn? Like
>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>> - What's the general process for ApplicationMaster of Yarn to execute a
>>> job?
>>>
>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>> 'mapred.tasktracker.map.tasks.maximum' and
>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>> - For Yarn, the above two parameters do not work any more, as Yarn uses
>>> containers instead, right?
>>> - For Yarn, we can set the whole physical mem for a NodeManager using '
>>> yarn.nodemanager.resource.memory-mb'. But how to set the default size
>>> of physical mem of a container?
>>> - How to set the maximum size of physical mem of a container? By the
>>> parameter of 'mapred.child.java.opts'?
>>>
>>> Thanks as always!
>>>
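
On the second set of questions above, for what it's worth: in MR2 the per-task container
size is requested with mapreduce.map.memory.mb / mapreduce.reduce.memory.mb (and
yarn.app.mapreduce.am.resource.mb for the AM), the scheduler will not grant less than
yarn.scheduler.minimum-allocation-mb or more than yarn.scheduler.maximum-allocation-mb per
container, and the heap inside the container comes from mapreduce.map.java.opts /
mapreduce.reduce.java.opts (mapred.child.java.opts is only the fallback when those are
unset). A sketch, reduce side shown; the values simply mirror the ones already quoted
earlier in this message, to illustrate which file each knob lives in:

  <!-- mapred-site.xml: request a 2 GB container per reduce, with a 1.5 GB heap inside it -->
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>2048</value>
  </property>
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx1536m</value>
  </property>
  <!-- yarn-site.xml: no single container may be granted more than this -->
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>8192</value>
  </property>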
>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>
>>>> Hi Sam,
>>>>
>>>> > - How do I know the container number? Why do you say it will be 22
>>>> containers due to a 22 GB memory?
>>>>
>>>> The MR2's default configuration requests 1 GB resource each for Map
>>>> and Reduce containers. It requests 1.5 GB for the AM container that
>>>> runs the job, additionally. This is tunable using the properties
>>>> Sandy's mentioned in his post.
>>>>
>>>> > - My machine has 32 GB memory, how much memory is proper to be
>>>> assigned to containers?
>>>>
>>>> This is a general question. You may use the same process you took to
>>>> decide optimal number of slots in MR1 to decide this here. Every
>>>> container is a new JVM, and you're limited by the CPUs you have there
>>>> (if not the memory). Either increase memory requests from jobs, to
>>>> lower # of concurrent containers at a given time (runtime change), or
>>>> lower NM's published memory resources to control the same (config
>>>> change).
>>>>
>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>>> 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>> 'mapreduce.map.sort.spill.percent'
>>>>
>>>> Yes, all of these properties will still work. Old properties specific
>>>> to JobTracker or TaskTracker (usually found as a keyword in the config
>>>> name) will not apply anymore.
>>>>
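
To make the container arithmetic concrete, using Harsh's defaults above (1024 MB per
map/reduce task, 1536 MB for the AM), the "roughly 22 total containers" he mentions in the
older message quoted below, and the yarn-site.xml value quoted further down:

    yarn.nodemanager.resource.memory-mb = 22000
    22000 MB / 1024 MB per task container ≈ 21 concurrent task containers per node
    (plus one 1536 MB AM container somewhere in the cluster),
    versus 8 map + 4 reduce = 12 task slots per node in the MR1 configuration.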
>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <sa...@gmail.com> wrote:
>>>> > Hi Harsh,
>>>> >
>>>> > According to the above suggestions, I removed the duplicated setting,
>>>> and
>>>> > reduced the value of 'yarn.nodemanager.resource.cpu-cores',
>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000. And then,
>>>> the
>>>> > efficiency improved by about 18%.  I have some questions:
>>>> >
>>>> > - How do I know the container number? Why do you say it will be 22
>>>> containers due
>>>> > to a 22 GB memory?
>>>> > - My machine has 32 GB memory, how much memory is proper to be
>>>> assigned to
>>>> > containers?
>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>>> 'yarn', will
>>>> > other parameters for mapred-site.xml still work in yarn framework?
>>>> Like
>>>> > 'mapreduce.task.io.sort.mb' and 'mapreduce.map.sort.spill.percent'
>>>> >
>>>> > Thanks!
>>>> >
>>>> >
>>>> >
>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>> >>
>>>> >> Hey Sam,
>>>> >>
>>>> >> Did you get a chance to retry with Sandy's suggestions? The config
>>>> >> appears to be asking NMs to use roughly 22 total containers (as
>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB memory
>>>> >> resource. This could impact much, given the CPU is still the same for
>>>> >> both test runs.
>>>> >>
>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <sandy.ryza@cloudera.com
>>>> >
>>>> >> wrote:
>>>> >> > Hey Sam,
>>>> >> >
>>>> >> > Thanks for sharing your results.  I'm definitely curious about
>>>> what's
>>>> >> > causing the difference.
>>>> >> >
>>>> >> > A couple observations:
>>>> >> > It looks like you've got yarn.nodemanager.resource.memory-mb in
>>>> there
>>>> >> > twice
>>>> >> > with two different values.
>>>> >> >
>>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close to the
>>>> default
>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of your tasks
>>>> getting
>>>> >> > killed for running over resource limits?
>>>> >> >
>>>> >> > -Sandy
>>>> >> >
>>>> >> >
>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <sa...@gmail.com>
>>>> wrote:
>>>> >> >>
>>>> >> >> The terasort execution log shows that reduce spent about 5.5 mins
>>>> from
>>>> >> >> 33%
>>>> >> >> to 35% as below.
>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce 31%
>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce 32%
>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce 33%
>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce 35%
>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce 40%
>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce 43%
>>>> >> >>
>>>> >> >> Anyway, below are my configurations for your reference. Thanks!
>>>> >> >> (A) core-site.xml
>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>> >> >>
>>>> >> >> (B) hdfs-site.xml
>>>> >> >>   <property>
>>>> >> >>     <name>dfs.replication</name>
>>>> >> >>     <value>1</value>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>   <property>
>>>> >> >>     <name>dfs.name.dir</name>
>>>> >> >>
>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>   <property>
>>>> >> >>     <name>dfs.data.dir</name>
>>>> >> >>
>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>   <property>
>>>> >> >>     <name>dfs.block.size</name>
>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>   <property>
>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>> >> >>     <value>64</value>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>   <property>
>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>> >> >>     <value>10</value>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >> (C) mapred-site.xml
>>>> >> >>   <property>
>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>> >> >>
>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>> >> >>     <description>No description</description>
>>>> >> >>     <final>true</final>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>   <property>
>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>> >> >>
>>>> >> >>
>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>> >> >>     <description>No description</description>
>>>> >> >>     <final>true</final>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >> <property>
>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>> >> >>   <value>-Xmx1000m</value>
>>>> >> >> </property>
>>>> >> >>
>>>> >> >> <property>
>>>> >> >>     <name>mapreduce.framework.name</name>
>>>> >> >>     <value>yarn</value>
>>>> >> >>    </property>
>>>> >> >>
>>>> >> >>  <property>
>>>> >> >>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>> >> >>     <value>8</value>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>   <property>
>>>> >> >>     <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>> >> >>     <value>4</value>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>
>>>> >> >>   <property>
>>>> >> >>     <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>> >> >>     <value>true</value>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >> (D) yarn-site.xml
>>>> >> >>  <property>
>>>> >> >>     <name>yarn.resourcemanager.resource-tracker.address</name>
>>>> >> >>     <value>node1:18025</value>
>>>> >> >>     <description>host is the hostname of the resource manager and
>>>> >> >>     port is the port on which the NodeManagers contact the
>>>> Resource
>>>> >> >> Manager.
>>>> >> >>     </description>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>   <property>
>>>> >> >>     <description>The address of the RM web
>>>> application.</description>
>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>> >> >>     <value>node1:18088</value>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>
>>>> >> >>   <property>
>>>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>>>> >> >>     <value>node1:18030</value>
>>>> >> >>     <description>host is the hostname of the resourcemanager and
>>>> port
>>>> >> >> is
>>>> >> >> the port
>>>> >> >>     on which the Applications in the cluster talk to the Resource
>>>> >> >> Manager.
>>>> >> >>     </description>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>
>>>> >> >>   <property>
>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>> >> >>     <value>node1:18040</value>
>>>> >> >>     <description>the host is the hostname of the ResourceManager
>>>> and
>>>> >> >> the
>>>> >> >> port is the port on
>>>> >> >>     which the clients can talk to the Resource Manager.
>>>> </description>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>   <property>
>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>> >> >>
>>>> >> >>
>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>> >> >>     <description>the local directories used by the
>>>> >> >> nodemanager</description>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>   <property>
>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>> >> >>     <value>0.0.0.0:18050</value>
>>>> >> >>     <description>the nodemanagers bind to this port</description>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>   <property>
>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>> >> >>     <value>10240</value>
>>>> >> >>     <description>the amount of memory on the NodeManager in
>>>> >> >> GB</description>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>   <property>
>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>> >> >>
>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>> >> >>     <description>directory on hdfs where the application logs are
>>>> moved
>>>> >> >> to
>>>> >> >> </description>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>    <property>
>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>> >> >>     <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>> >> >>     <description>the directories used by Nodemanagers as log
>>>> >> >> directories</description>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>   <property>
>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>> >> >>     <value>mapreduce.shuffle</value>
>>>> >> >>     <description>shuffle service that needs to be set for Map
>>>> Reduce to
>>>> >> >> run </description>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>   <property>
>>>> >> >>     <name>yarn.resourcemanager.client.thread-count</name>
>>>> >> >>     <value>64</value>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>  <property>
>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>> >> >>     <value>24</value>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >> <property>
>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>> >> >>     <value>3</value>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>  <property>
>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>> >> >>     <value>22000</value>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>  <property>
>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>> >> >>     <value>2.1</value>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>
>>>> >> >>
>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>> >> >>>
>>>> >> >>> Not tuning configurations at all is wrong. YARN uses memory
>>>> resource
>>>> >> >>> based scheduling and hence MR2 would be requesting 1 GB minimum
>>>> by
>>>> >> >>> default, causing, on base configs, to max out at 8 (due to 8 GB
>>>> NM
>>>> >> >>> memory resource config) total containers. Do share your configs
>>>> as at
>>>> >> >>> this point none of us can tell what it is.
>>>> >> >>>
>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for users and to
>>>> not
>>>> >> >>> care about such things :)
>>>> >> >>>
>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <sa...@gmail.com>
>>>> >> >>> wrote:
>>>> >> >>> > At the beginning, I just wanted to do a fast comparison of MRv1
>>>> and
>>>> >> >>> > Yarn.
>>>> >> >>> > But
>>>> >> >>> > they have many differences, and to be fair for comparison I
>>>> did not
>>>> >> >>> > tune
>>>> >> >>> > their configurations at all.  So I got above test results.
>>>> After
>>>> >> >>> > analyzing
>>>> >> >>> > the test result, no doubt, I will configure them and do
>>>> comparison
>>>> >> >>> > again.
>>>> >> >>> >
>>>> >> >>> > Do you have any idea on current test result? I think, to
>>>> compare
>>>> >> >>> > with
>>>> >> >>> > MRv1,
>>>> >> >>> > Yarn is better on Map phase(teragen test), but worse on Reduce
>>>> >> >>> > phase(terasort test).
>>>> >> >>> > And any detailed suggestions/comments/materials on Yarn
>>>> performance
>>>> >> >>> > tuning?
>>>> >> >>> >
>>>> >> >>> > Thanks!
>>>> >> >>> >
>>>> >> >>> >
>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <marcosluis2186@gmail.com
>>>> >
>>>> >> >>> >>
>>>> >> >>> >> Why not tune the configurations?
>>>> >> >>> >> Both frameworks have many areas to tune:
>>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>>> >> >>> >>
>>>> >> >>> >>
>>>> >> >>> >>
>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>> >> >>> >>>
>>>> >> >>> >>> Hi Experts,
>>>> >> >>> >>>
>>>> >> >>> >>> We are thinking about whether to use Yarn or not in the near
>>>> >> >>> >>> future,
>>>> >> >>> >>> and
>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for comparison.
>>>> >> >>> >>>
>>>> >> >>> >>> My env is a three-node cluster, and each node has similar
>>>> hardware:
>>>> >> >>> >>> 2
>>>> >> >>> >>> cpu (4 core), 32 GB mem. Both Yarn and MRv1 clusters are set up on
>>>> the
>>>> >> >>> >>> same
>>>> >> >>> >>> env. To
>>>> >> >>> >>> be fair, I did not make any performance tuning on their
>>>> >> >>> >>> configurations, but
>>>> >> >>> >>> use the default configuration values.
>>>> >> >>> >>>
>>>> >> >>> >>> Before testing, I think Yarn will be much better than MRv1,
>>>> if
>>>> >> >>> >>> they
>>>> >> >>> >>> all
>>>> >> >>> >>> use default configuration, because Yarn is a better
>>>> framework than
>>>> >> >>> >>> MRv1.
>>>> >> >>> >>> However, the test result shows some differences:
>>>> >> >>> >>>
>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>> >> >>> >>>
>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>> >> >>> >>> - MRv1: 193 sec
>>>> >> >>> >>> - Yarn: 69 sec
>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>> >> >>> >>>
>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>> >> >>> >>> - MRv1: 451 sec
>>>> >> >>> >>> - Yarn: 1136 sec
>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>> >> >>> >>>
>>>> >> >>> >>> After a fast analysis, I think the direct cause might be
>>>> that Yarn
>>>> >> >>> >>> is
>>>> >> >>> >>> much faster than MRv1 on Map phase, but much worse on Reduce
>>>> >> >>> >>> phase.
>>>> >> >>> >>>
>>>> >> >>> >>> Here I have two questions:
>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1 for terasort?
>>>> >> >>> >>> - What's the strategy for tuning Yarn performance? Are there any
>>>> >> >>> >>> materials?
>>>> >> >>> >>>
>>>> >> >>> >>> Thanks!
>>>> >> >>> >>
>>>> >> >>> >>
>>>> >> >>> >>
>>>> >> >>> >>
>>>> >> >>> >> --
>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>> >> >>> >> Product Manager at PDVSA
>>>> >> >>> >> http://about.me/marcosortiz
>>>> >> >>> >>
>>>> >> >>> >
>>>> >> >>>
>>>> >> >>>
>>>> >> >>>
>>>> >> >>> --
>>>> >> >>> Harsh J
>>>> >> >>
>>>> >> >>
>>>> >> >
>>>> >>
>>>> >>
>>>> >>
>>>> >> --
>>>> >> Harsh J
>>>> >
>>>> >
>>>>
>>>>
>>>>
>>>> --
>>>> Harsh J
>>>>
>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Sandy Ryza <sa...@cloudera.com>.
It looks like many of your reduce tasks were killed.  Do you know why?
 Also, MR2 doesn't have JVM reuse, so it might make sense to compare it to
MR1 with JVM reuse turned off.

-Sandy


On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <ji...@gmail.com>wrote:

> The Terasort output for MR2 is as follows.
>
> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job (main):
> Counters: 46
>         File System Counters
>                 FILE: Number of bytes read=456102049355
>                 FILE: Number of bytes written=897246250517
>                 FILE: Number of read operations=0
>                 FILE: Number of large read operations=0
>                 FILE: Number of write operations=0
>                 HDFS: Number of bytes read=1000000851200
>                 HDFS: Number of bytes written=1000000000000
>                 HDFS: Number of read operations=32131
>                 HDFS: Number of large read operations=0
>                 HDFS: Number of write operations=224
>         Job Counters
>                 Killed map tasks=1
>                 Killed reduce tasks=20
>                 Launched map tasks=7601
>                 Launched reduce tasks=132
>                 Data-local map tasks=7591
>                 Rack-local map tasks=10
>                 Total time spent by all maps in occupied slots
> (ms)=1696141311
>                 Total time spent by all reduces in occupied slots
> (ms)=2664045096
>         Map-Reduce Framework
>                 Map input records=10000000000
>                 Map output records=10000000000
>                 Map output bytes=1020000000000
>                 Map output materialized bytes=440486356802
>                 Input split bytes=851200
>                 Combine input records=0
>                 Combine output records=0
>                 Reduce input groups=10000000000
>                 Reduce shuffle bytes=440486356802
>                 Reduce input records=10000000000
>                 Reduce output records=10000000000
>                 Spilled Records=20000000000
>                 Shuffled Maps =851200
>                 Failed Shuffles=61
>                 Merged Map outputs=851200
>                 GC time elapsed (ms)=4215666
>                 CPU time spent (ms)=192433000
>                 Physical memory (bytes) snapshot=3349356380160
>                 Virtual memory (bytes) snapshot=9665208745984
>                 Total committed heap usage (bytes)=3636854259712
>         Shuffle Errors
>                 BAD_ID=0
>                 CONNECTION=0
>                 IO_ERROR=4
>                 WRONG_LENGTH=0
>                 WRONG_MAP=0
>                 WRONG_REDUCE=0
>         File Input Format Counters
>                 Bytes Read=1000000000000
>         File Output Format Counters
>                 Bytes Written=1000000000000
>
> Thanks,
>
> John
>
>
>
> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <ji...@gmail.com>wrote:
>
>> Hi,
>>
>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop 1.0.3 and it
>> turned out that the terasort for MR2 is 2 times slower than that in MR1. I
>> cannot really believe it.
>>
>> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0 cluster
>> configurations are as follows.
>>
>>         mapreduce.map.java.opts = "-Xmx512m";
>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>         mapreduce.map.memory.mb = "768";
>>         mapreduce.reduce.memory.mb = "2048";
>>
>>         yarn.scheduler.minimum-allocation-mb = "256";
>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>         yarn.nodemanager.resource.memory-mb = "12288";
>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>
>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>         mapreduce.task.io.sort.factor = "48";
>>         mapreduce.task.io.sort.mb = "200";
>>         mapreduce.map.speculative = "true";
>>         mapreduce.reduce.speculative = "true";
>>         mapreduce.framework.name = "yarn";
>>         yarn.app.mapreduce.am.job.task.listener.thread-count = "60";
>>         mapreduce.map.cpu.vcores = "1";
>>         mapreduce.reduce.cpu.vcores = "2";
>>
>>         mapreduce.job.jvm.numtasks = "20";
>>         mapreduce.map.output.compress = "true";
>>         mapreduce.map.output.compress.codec =
>> "org.apache.hadoop.io.compress.SnappyCodec";
>>
>>         yarn.resourcemanager.client.thread-count = "64";
>>         yarn.resourcemanager.scheduler.client.thread-count = "64";
>>         yarn.resourcemanager.resource-tracker.client.thread-count = "64";
>>         yarn.resourcemanager.scheduler.class =
>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>         yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>> "org.apache.hadoop.mapred.ShuffleHandler";
>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>         yarn.nodemanager.container-executor.class =
>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>         yarn.nodemanager.container-manager.thread-count = "64";
>>         yarn.nodemanager.localizer.client.thread-count = "20";
>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>
>> My Hadoop 1.0.3 has the same memory/disks/cores and almost the same other
>> configurations. In MR1, the 1TB terasort took about 45 minutes, but it took
>> around 90 minutes in MR2.
>>
>> Does anyone know what is wrong here? Or do I need some special
>> configurations for terasort to work better in MR2?
>>
>> Thanks in advance,
>>
>> John
>>
>>
>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <mi...@hotmail.com>wrote:
>>
>>> Sam,
>>> I think your cluster is too small for any meaningful conclusions to be
>>> made.
>>>
>>>
>>> Sent from a remote device. Please excuse any typos...
>>>
>>> Mike Segel
>>>
>>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com> wrote:
>>>
>>> Hi Harsh,
>>>
>>> Thanks for your detailed response! Now, the efficiency of my Yarn
>>> cluster improved a lot after increasing the reducer
>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>> questions about the way Yarn executes an MRv1 job:
>>>
>>> 1. In Hadoop 1.x, a job is executed by map tasks and reduce tasks
>>> together, with a typical process (map > shuffle > reduce). In Yarn, as I
>>> understand it, an MRv1 job is executed only by the ApplicationMaster.
>>> - Yarn can run multiple kinds of jobs (MR, MPI, ...), but an MRv1 job has
>>> a special execution process (map > shuffle > reduce) in Hadoop 1.x, so how
>>> does Yarn execute an MRv1 job? Does it still include the special MR steps
>>> from Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>> - Do the MRv1 parameters still work for Yarn? Like
>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>> - What's the general process for ApplicationMaster of Yarn to execute a
>>> job?
>>>
>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>> 'mapred.tasktracker.map.tasks.maximum' and
>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>> - For Yarn, the above two parameters do not work any more, as Yarn uses
>>> containers instead, right?
>>> - For Yarn, we can set the whole physical mem for a NodeManager using '
>>> yarn.nodemanager.resource.memory-mb'. But how to set the default size
>>> of physical mem of a container?
>>> - How to set the maximum size of physical mem of a container? By the
>>> parameter of 'mapred.child.java.opts'?
>>>
>>> Thanks as always!
>>>
>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>
>>>> Hi Sam,
>>>>
>>>> > - How do I know the container number? Why do you say it will be 22
>>>> containers due to a 22 GB memory?
>>>>
>>>> The MR2's default configuration requests 1 GB resource each for Map
>>>> and Reduce containers. It requests 1.5 GB for the AM container that
>>>> runs the job, additionally. This is tunable using the properties
>>>> Sandy's mentioned in his post.
>>>>
>>>> > - My machine has 32 GB memory, how much memory is proper to be
>>>> assigned to containers?
>>>>
>>>> This is a general question. You may use the same process you took to
>>>> decide optimal number of slots in MR1 to decide this here. Every
>>>> container is a new JVM, and you're limited by the CPUs you have there
>>>> (if not the memory). Either increase memory requests from jobs, to
>>>> lower # of concurrent containers at a given time (runtime change), or
>>>> lower NM's published memory resources to control the same (config
>>>> change).
>>>>
>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>>> 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>> 'mapreduce.map.sort.spill.percent'
>>>>
>>>> Yes, all of these properties will still work. Old properties specific
>>>> to JobTracker or TaskTracker (usually found as a keyword in the config
>>>> name) will not apply anymore.
>>>>
>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <sa...@gmail.com> wrote:
>>>> > Hi Harsh,
>>>> >
>>>> > According to the above suggestions, I removed the duplicated setting,
>>>> and
>>>> > reduced the value of 'yarn.nodemanager.resource.cpu-cores',
>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000. And then,
>>>> the
>>>> > efficiency improved by about 18%.  I have some questions:
>>>> >
>>>> > - How do I know the container number? Why do you say it will be 22
>>>> containers due
>>>> > to a 22 GB memory?
>>>> > - My machine has 32 GB memory, how much memory is proper to be
>>>> assigned to
>>>> > containers?
>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>>> 'yarn', will
>>>> > other parameters for mapred-site.xml still work in yarn framework?
>>>> Like
>>>> > 'mapreduce.task.io.sort.mb' and 'mapreduce.map.sort.spill.percent'
>>>> >
>>>> > Thanks!
>>>> >
>>>> >
>>>> >
>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>> >>
>>>> >> Hey Sam,
>>>> >>
>>>> >> Did you get a chance to retry with Sandy's suggestions? The config
>>>> >> appears to be asking NMs to use roughly 22 total containers (as
>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB memory
>>>> >> resource. This could impact much, given the CPU is still the same for
>>>> >> both test runs.
>>>> >>
>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <sandy.ryza@cloudera.com
>>>> >
>>>> >> wrote:
>>>> >> > Hey Sam,
>>>> >> >
>>>> >> > Thanks for sharing your results.  I'm definitely curious about
>>>> what's
>>>> >> > causing the difference.
>>>> >> >
>>>> >> > A couple observations:
>>>> >> > It looks like you've got yarn.nodemanager.resource.memory-mb in
>>>> there
>>>> >> > twice
>>>> >> > with two different values.
>>>> >> >
>>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close to the
>>>> default
>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of your tasks
>>>> getting
>>>> >> > killed for running over resource limits?
>>>> >> >
>>>> >> > -Sandy
>>>> >> >
>>>> >> >
>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <sa...@gmail.com>
>>>> wrote:
>>>> >> >>
>>>> >> >> The terasort execution log shows that reduce spent about 5.5 mins
>>>> from
>>>> >> >> 33%
>>>> >> >> to 35% as below.
>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce 31%
>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce 32%
>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce 33%
>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce 35%
>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce 40%
>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce 43%
>>>> >> >>
>>>> >> >> Anyway, below are my configurations for your reference. Thanks!
>>>> >> >> (A) core-site.xml
>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>> >> >>
>>>> >> >> (B) hdfs-site.xml
>>>> >> >>   <property>
>>>> >> >>     <name>dfs.replication</name>
>>>> >> >>     <value>1</value>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>   <property>
>>>> >> >>     <name>dfs.name.dir</name>
>>>> >> >>
>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>   <property>
>>>> >> >>     <name>dfs.data.dir</name>
>>>> >> >>
>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>   <property>
>>>> >> >>     <name>dfs.block.size</name>
>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>   <property>
>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>> >> >>     <value>64</value>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>   <property>
>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>> >> >>     <value>10</value>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >> (C) mapred-site.xml
>>>> >> >>   <property>
>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>> >> >>
>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>> >> >>     <description>No description</description>
>>>> >> >>     <final>true</final>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>   <property>
>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>> >> >>
>>>> >> >>
>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>> >> >>     <description>No description</description>
>>>> >> >>     <final>true</final>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >> <property>
>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>> >> >>   <value>-Xmx1000m</value>
>>>> >> >> </property>
>>>> >> >>
>>>> >> >> <property>
>>>> >> >>     <name>mapreduce.framework.name</name>
>>>> >> >>     <value>yarn</value>
>>>> >> >>    </property>
>>>> >> >>
>>>> >> >>  <property>
>>>> >> >>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>> >> >>     <value>8</value>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>   <property>
>>>> >> >>     <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>> >> >>     <value>4</value>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>
>>>> >> >>   <property>
>>>> >> >>     <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>> >> >>     <value>true</value>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >> (D) yarn-site.xml
>>>> >> >>  <property>
>>>> >> >>     <name>yarn.resourcemanager.resource-tracker.address</name>
>>>> >> >>     <value>node1:18025</value>
>>>> >> >>     <description>host is the hostname of the resource manager and
>>>> >> >>     port is the port on which the NodeManagers contact the
>>>> Resource
>>>> >> >> Manager.
>>>> >> >>     </description>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>   <property>
>>>> >> >>     <description>The address of the RM web
>>>> application.</description>
>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>> >> >>     <value>node1:18088</value>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>
>>>> >> >>   <property>
>>>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>>>> >> >>     <value>node1:18030</value>
>>>> >> >>     <description>host is the hostname of the resourcemanager and
>>>> port
>>>> >> >> is
>>>> >> >> the port
>>>> >> >>     on which the Applications in the cluster talk to the Resource
>>>> >> >> Manager.
>>>> >> >>     </description>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>
>>>> >> >>   <property>
>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>> >> >>     <value>node1:18040</value>
>>>> >> >>     <description>the host is the hostname of the ResourceManager
>>>> and
>>>> >> >> the
>>>> >> >> port is the port on
>>>> >> >>     which the clients can talk to the Resource Manager.
>>>> </description>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>   <property>
>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>> >> >>
>>>> >> >>
>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>> >> >>     <description>the local directories used by the
>>>> >> >> nodemanager</description>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>   <property>
>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>> >> >>     <value>0.0.0.0:18050</value>
>>>> >> >>     <description>the nodemanagers bind to this port</description>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>   <property>
>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>> >> >>     <value>10240</value>
>>>> >> >>     <description>the amount of memory on the NodeManager in
>>>> >> >> GB</description>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>   <property>
>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>> >> >>
>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>> >> >>     <description>directory on hdfs where the application logs are
>>>> moved
>>>> >> >> to
>>>> >> >> </description>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>    <property>
>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>> >> >>     <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>> >> >>     <description>the directories used by Nodemanagers as log
>>>> >> >> directories</description>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>   <property>
>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>> >> >>     <value>mapreduce.shuffle</value>
>>>> >> >>     <description>shuffle service that needs to be set for Map
>>>> Reduce to
>>>> >> >> run </description>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>   <property>
>>>> >> >>     <name>yarn.resourcemanager.client.thread-count</name>
>>>> >> >>     <value>64</value>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>  <property>
>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>> >> >>     <value>24</value>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >> <property>
>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>> >> >>     <value>3</value>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>  <property>
>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>> >> >>     <value>22000</value>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>  <property>
>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>> >> >>     <value>2.1</value>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>
>>>> >> >>
>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>> >> >>>
>>>> >> >>> Not tuning configurations at all is wrong. YARN uses memory
>>>> resource
>>>> >> >>> based scheduling and hence MR2 would be requesting 1 GB minimum
>>>> by
>>>> >> >>> default, causing, on base configs, to max out at 8 (due to 8 GB
>>>> NM
>>>> >> >>> memory resource config) total containers. Do share your configs
>>>> as at
>>>> >> >>> this point none of us can tell what it is.
>>>> >> >>>
>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for users and to
>>>> not
>>>> >> >>> care about such things :)
>>>> >> >>>
>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <sa...@gmail.com>
>>>> >> >>> wrote:
>>>> >> >>> > At the beginning, I just want to do a fast comparison of MRv1
>>>> and
>>>> >> >>> > Yarn.
>>>> >> >>> > But
>>>> >> >>> > they have many differences, and to be fair for comparison I
>>>> did not
>>>> >> >>> > tune
>>>> >> >>> > their configurations at all.  So I got above test results.
>>>> After
>>>> >> >>> > analyzing
>>>> >> >>> > the test result, no doubt, I will configure them and do
>>>> comparison
>>>> >> >>> > again.
>>>> >> >>> >
>>>> >> >>> > Do you have any idea on current test result? I think, to
>>>> compare
>>>> >> >>> > with
>>>> >> >>> > MRv1,
>>>> >> >>> > Yarn is better on Map phase(teragen test), but worse on Reduce
>>>> >> >>> > phase(terasort test).
>>>> >> >>> > And any detailed suggestions/comments/materials on Yarn
>>>> performance
>>>> >> >>> > tuning?
>>>> >> >>> >
>>>> >> >>> > Thanks!
>>>> >> >>> >
>>>> >> >>> >
>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <marcosluis2186@gmail.com
>>>> >
>>>> >> >>> >>
>>>> >> >>> >> Why not to tune the configurations?
>>>> >> >>> >> Both frameworks have many areas to tune:
>>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>>> >> >>> >>
>>>> >> >>> >>
>>>> >> >>> >>
>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>> >> >>> >>>
>>>> >> >>> >>> Hi Experts,
>>>> >> >>> >>>
>>>> >> >>> >>> We are thinking about whether to use Yarn or not in the near
>>>> >> >>> >>> future,
>>>> >> >>> >>> and
>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for comparison.
>>>> >> >>> >>>
>>>> >> >>> >>> My env is three nodes cluster, and each node has similar
>>>> hardware:
>>>> >> >>> >>> 2
>>>> >> >>> >>> cpu(4 core), 32 mem. Both Yarn and MRv1 cluster are set on
>>>> the
>>>> >> >>> >>> same
>>>> >> >>> >>> env. To
>>>> >> >>> >>> be fair, I did not make any performance tuning on their
>>>> >> >>> >>> configurations, but
>>>> >> >>> >>> use the default configuration values.
>>>> >> >>> >>>
>>>> >> >>> >>> Before testing, I think Yarn will be much better than MRv1,
>>>> if
>>>> >> >>> >>> they
>>>> >> >>> >>> all
>>>> >> >>> >>> use default configuration, because Yarn is a better
>>>> framework than
>>>> >> >>> >>> MRv1.
>>>> >> >>> >>> However, the test result shows some differences:
>>>> >> >>> >>>
>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>> >> >>> >>>
>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>> >> >>> >>> - MRv1: 193 sec
>>>> >> >>> >>> - Yarn: 69 sec
>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>> >> >>> >>>
>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>> >> >>> >>> - MRv1: 451 sec
>>>> >> >>> >>> - Yarn: 1136 sec
>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>> >> >>> >>>
>>>> >> >>> >>> After a fast analysis, I think the direct cause might be
>>>> that Yarn
>>>> >> >>> >>> is
>>>> >> >>> >>> much faster than MRv1 on Map phase, but much worse on Reduce
>>>> >> >>> >>> phase.
>>>> >> >>> >>>
>>>> >> >>> >>> Here I have two questions:
>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1 for terasort?
>>>> >> >>> >>> - What's the strategy for tuning Yarn performance? Are there any
>>>> >> >>> >>> materials?
>>>> >> >>> >>>
>>>> >> >>> >>> Thanks!
>>>> >> >>> >>
>>>> >> >>> >>
>>>> >> >>> >>
>>>> >> >>> >>
>>>> >> >>> >> --
>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>> >> >>> >> Product Manager at PDVSA
>>>> >> >>> >> http://about.me/marcosortiz
>>>> >> >>> >>
>>>> >> >>> >
>>>> >> >>>
>>>> >> >>>
>>>> >> >>>
>>>> >> >>> --
>>>> >> >>> Harsh J
>>>> >> >>
>>>> >> >>
>>>> >> >
>>>> >>
>>>> >>
>>>> >>
>>>> >> --
>>>> >> Harsh J
>>>> >
>>>> >
>>>>
>>>>
>>>>
>>>> --
>>>> Harsh J
>>>>
>>>
>>>
>>
>
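A side note on the yarn-site.xml quoted above: as Sandy observes elsewhere in the thread, yarn.nodemanager.resource.memory-mb is defined twice (10240 and then 22000), and with Hadoop's Configuration loader the definition read last normally wins, so the NodeManagers end up advertising roughly 22 GB; the property is expressed in MB even though the first copy's description says GB. A minimal consolidated sketch, assuming 22000 MB is the intended figure:

  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>22000</value>
    <description>Total memory, in MB, that this NodeManager may allocate to containers.</description>
  </property>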

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Sandy Ryza <sa...@cloudera.com>.
It looks like many of your reduce tasks were killed.  Do you know why?
 Also, MR2 doesn't have JVM reuse, so it might make sense to compare it to
MR1 with JVM reuse turned off.

-Sandy
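On the JVM-reuse point: in Hadoop 1.x that behaviour is governed by mapred.job.reuse.jvm.num.tasks (1 means a fresh JVM per task, -1 means unlimited reuse), and the mapreduce.job.jvm.numtasks = "20" in the MR2 configuration quoted below is the newer name for the same knob, which YARN-based MR2 ignores. A minimal sketch of pinning the MR1 side to no reuse for a like-for-like run, assuming it goes into the Hadoop 1.0.3 mapred-site.xml:

  <property>
    <name>mapred.job.reuse.jvm.num.tasks</name>
    <value>1</value>
    <description>Run each map/reduce task in its own JVM (no reuse), matching MR2 behaviour.</description>
  </property>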


On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <ji...@gmail.com>wrote:

> The Terasort output for MR2 is as follows.
>
> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job (main):
> Counters: 46
>         File System Counters
>                 FILE: Number of bytes read=456102049355
>                 FILE: Number of bytes written=897246250517
>                 FILE: Number of read operations=0
>                 FILE: Number of large read operations=0
>                 FILE: Number of write operations=0
>                 HDFS: Number of bytes read=1000000851200
>                 HDFS: Number of bytes written=1000000000000
>                 HDFS: Number of read operations=32131
>                 HDFS: Number of large read operations=0
>                 HDFS: Number of write operations=224
>         Job Counters
>                 Killed map tasks=1
>                 Killed reduce tasks=20
>                 Launched map tasks=7601
>                 Launched reduce tasks=132
>                 Data-local map tasks=7591
>                 Rack-local map tasks=10
>                 Total time spent by all maps in occupied slots
> (ms)=1696141311
>                 Total time spent by all reduces in occupied slots
> (ms)=2664045096
>         Map-Reduce Framework
>                 Map input records=10000000000
>                 Map output records=10000000000
>                 Map output bytes=1020000000000
>                 Map output materialized bytes=440486356802
>                 Input split bytes=851200
>                 Combine input records=0
>                 Combine output records=0
>                 Reduce input groups=10000000000
>                 Reduce shuffle bytes=440486356802
>                 Reduce input records=10000000000
>                 Reduce output records=10000000000
>                 Spilled Records=20000000000
>                 Shuffled Maps =851200
>                 Failed Shuffles=61
>                 Merged Map outputs=851200
>                 GC time elapsed (ms)=4215666
>                 CPU time spent (ms)=192433000
>                 Physical memory (bytes) snapshot=3349356380160
>                 Virtual memory (bytes) snapshot=9665208745984
>                 Total committed heap usage (bytes)=3636854259712
>         Shuffle Errors
>                 BAD_ID=0
>                 CONNECTION=0
>                 IO_ERROR=4
>                 WRONG_LENGTH=0
>                 WRONG_MAP=0
>                 WRONG_REDUCE=0
>         File Input Format Counters
>                 Bytes Read=1000000000000
>         File Output Format Counters
>                 Bytes Written=1000000000000
>
> Thanks,
>
> John
>
>
>
> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <ji...@gmail.com>wrote:
>
>> Hi,
>>
>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop 1.0.3 and it
>> turned out that the terasort for MR2 is 2 times slower than that in MR1. I
>> cannot really believe it.
>>
>> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0 cluster
>> configurations are as follows.
>>
>>         mapreduce.map.java.opts = "-Xmx512m";
>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>         mapreduce.map.memory.mb = "768";
>>         mapreduce.reduce.memory.mb = "2048";
>>
>>         yarn.scheduler.minimum-allocation-mb = "256";
>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>         yarn.nodemanager.resource.memory-mb = "12288";
>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>
>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>         mapreduce.task.io.sort.factor = "48";
>>         mapreduce.task.io.sort.mb = "200";
>>         mapreduce.map.speculative = "true";
>>         mapreduce.reduce.speculative = "true";
>>         mapreduce.framework.name = "yarn";
>>         yarn.app.mapreduce.am.job.task.listener.thread-count = "60";
>>         mapreduce.map.cpu.vcores = "1";
>>         mapreduce.reduce.cpu.vcores = "2";
>>
>>         mapreduce.job.jvm.numtasks = "20";
>>         mapreduce.map.output.compress = "true";
>>         mapreduce.map.output.compress.codec =
>> "org.apache.hadoop.io.compress.SnappyCodec";
>>
>>         yarn.resourcemanager.client.thread-count = "64";
>>         yarn.resourcemanager.scheduler.client.thread-count = "64";
>>         yarn.resourcemanager.resource-tracker.client.thread-count = "64";
>>         yarn.resourcemanager.scheduler.class =
>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>         yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>> "org.apache.hadoop.mapred.ShuffleHandler";
>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>         yarn.nodemanager.container-executor.class =
>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>         yarn.nodemanager.container-manager.thread-count = "64";
>>         yarn.nodemanager.localizer.client.thread-count = "20";
>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>
>> My Hadoop 1.0.3 has the same memory/disks/cores and almost the same other
>> configurations. In MR1, the 1TB terasort took about 45 minutes, but it took
>> around 90 minutes in MR2.
>>
>> Does anyone know what is wrong here? Or do I need some special
>> configurations for terasort to work better in MR2?
>>
>> Thanks in advance,
>>
>> John
>>
>>
>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <mi...@hotmail.com>wrote:
>>
>>> Sam,
>>> I think your cluster is too small for any meaningful conclusions to be
>>> made.
>>>
>>>
>>> Sent from a remote device. Please excuse any typos...
>>>
>>> Mike Segel
>>>
>>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com> wrote:
>>>
>>> Hi Harsh,
>>>
>>> Thanks for your detailed response! Now, the efficiency of my Yarn
>>> cluster improved a lot after increasing the reducer
>>> number(mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>> questions about the way of Yarn to execute MRv1 job:
>>>
>>> 1.In Hadoop 1.x, a job will be executed by map task and reduce task
>>> together, with a typical process(map > shuffle > reduce). In Yarn, as I
>>> know, a MRv1 job will be executed only by ApplicationMaster.
>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1 job has
>>> special execution process(map > shuffle > reduce) in Hadoop 1.x, and how
>>> Yarn execute a MRv1 job? still include some special MR steps in Hadoop 1.x,
>>> like map, sort, merge, combine and shuffle?
>>> - Do the MRv1 parameters still work for Yarn? Like
>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>> - What's the general process for ApplicationMaster of Yarn to execute a
>>> job?
>>>
>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>> 'mapred.tasktracker.map.tasks.maximum' and
>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>> - For Yarn, the above two parameters do not work any more, as yarn uses
>>> container instead, right?
>>> - For Yarn, we can set the whole physical mem for a NodeManager using '
>>> yarn.nodemanager.resource.memory-mb'. But how to set the default size
>>> of physical mem of a container?
>>> - How to set the maximum size of physical mem of a container? By the
>>> parameter of 'mapred.child.java.opts'?
>>>
>>> Thanks as always!
>>>
>>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>>
>>>> Hi Sam,
>>>>
>>>> > - How to know the container number? Why you say it will be 22
>>>> containers due to a 22 GB memory?
>>>>
>>>> The MR2's default configuration requests 1 GB resource each for Map
>>>> and Reduce containers. It requests 1.5 GB for the AM container that
>>>> runs the job, additionally. This is tunable using the properties
>>>> Sandy's mentioned in his post.
>>>>
>>>> > - My machine has 32 GB memory, how much memory is proper to be
>>>> assigned to containers?
>>>>
>>>> This is a general question. You may use the same process you took to
>>>> decide optimal number of slots in MR1 to decide this here. Every
>>>> container is a new JVM, and you're limited by the CPUs you have there
>>>> (if not the memory). Either increase memory requests from jobs, to
>>>> lower # of concurrent containers at a given time (runtime change), or
>>>> lower NM's published memory resources to control the same (config
>>>> change).
>>>>
>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>>> 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>> 'mapreduce.map.sort.spill.percent'
>>>>
>>>> Yes, all of these properties will still work. Old properties specific
>>>> to JobTracker or TaskTracker (usually found as a keyword in the config
>>>> name) will not apply anymore.
>>>>
>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <sa...@gmail.com> wrote:
>>>> > Hi Harsh,
>>>> >
>>>> > According to above suggestions, I removed the duplication of setting,
>>>> and
>>>> > reduce the value of 'yarn.nodemanager.resource.cpu-cores',
>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000. And then,
>>>> the
>>>> > efficiency improved about 18%.  I have questions:
>>>> >
>>>> > - How to know the container number? Why you say it will be 22
>>>> containers due
>>>> > to a 22 GB memory?
>>>> > - My machine has 32 GB memory, how much memory is proper to be
>>>> assigned to
>>>> > containers?
>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>>> 'yarn', will
>>>> > other parameters for mapred-site.xml still work in yarn framework?
>>>> Like
>>>> > 'mapreduce.task.io.sort.mb' and 'mapreduce.map.sort.spill.percent'
>>>> >
>>>> > Thanks!
>>>> >
>>>> >
>>>> >
>>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>>> >>
>>>> >> Hey Sam,
>>>> >>
>>>> >> Did you get a chance to retry with Sandy's suggestions? The config
>>>> >> appears to be asking NMs to use roughly 22 total containers (as
>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB memory
>>>> >> resource. This could impact much, given the CPU is still the same for
>>>> >> both test runs.
>>>> >>
>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <sandy.ryza@cloudera.com
>>>> >
>>>> >> wrote:
>>>> >> > Hey Sam,
>>>> >> >
>>>> >> > Thanks for sharing your results.  I'm definitely curious about
>>>> what's
>>>> >> > causing the difference.
>>>> >> >
>>>> >> > A couple observations:
>>>> >> > It looks like you've got yarn.nodemanager.resource.memory-mb in
>>>> there
>>>> >> > twice
>>>> >> > with two different values.
>>>> >> >
>>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close to the
>>>> default
>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of your tasks
>>>> getting
>>>> >> > killed for running over resource limits?
>>>> >> >
>>>> >> > -Sandy
>>>> >> >
>>>> >> >
>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <sa...@gmail.com>
>>>> wrote:
>>>> >> >>
>>>> >> >> The terasort execution log shows that reduce spent about 5.5 mins
>>>> from
>>>> >> >> 33%
>>>> >> >> to 35% as below.
>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce 31%
>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce 32%
>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce 33%
>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce 35%
>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce 40%
>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce 43%
>>>> >> >>
>>>> >> >> Any way, below are my configurations for your reference. Thanks!
>>>> >> >> (A) core-site.xml
>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>> >> >>
>>>> >> >> (B) hdfs-site.xml
>>>> >> >>   <property>
>>>> >> >>     <name>dfs.replication</name>
>>>> >> >>     <value>1</value>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>   <property>
>>>> >> >>     <name>dfs.name.dir</name>
>>>> >> >>
>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>   <property>
>>>> >> >>     <name>dfs.data.dir</name>
>>>> >> >>
>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>   <property>
>>>> >> >>     <name>dfs.block.size</name>
>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>   <property>
>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>> >> >>     <value>64</value>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>   <property>
>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>> >> >>     <value>10</value>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >> (C) mapred-site.xml
>>>> >> >>   <property>
>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>> >> >>
>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>> >> >>     <description>No description</description>
>>>> >> >>     <final>true</final>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>   <property>
>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>> >> >>
>>>> >> >>
>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>> >> >>     <description>No description</description>
>>>> >> >>     <final>true</final>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >> <property>
>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>> >> >>   <value>-Xmx1000m</value>
>>>> >> >> </property>
>>>> >> >>
>>>> >> >> <property>
>>>> >> >>     <name>mapreduce.framework.name</name>
>>>> >> >>     <value>yarn</value>
>>>> >> >>    </property>
>>>> >> >>
>>>> >> >>  <property>
>>>> >> >>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>> >> >>     <value>8</value>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>   <property>
>>>> >> >>     <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>> >> >>     <value>4</value>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>
>>>> >> >>   <property>
>>>> >> >>     <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>> >> >>     <value>true</value>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >> (D) yarn-site.xml
>>>> >> >>  <property>
>>>> >> >>     <name>yarn.resourcemanager.resource-tracker.address</name>
>>>> >> >>     <value>node1:18025</value>
>>>> >> >>     <description>host is the hostname of the resource manager and
>>>> >> >>     port is the port on which the NodeManagers contact the
>>>> Resource
>>>> >> >> Manager.
>>>> >> >>     </description>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>   <property>
>>>> >> >>     <description>The address of the RM web
>>>> application.</description>
>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>> >> >>     <value>node1:18088</value>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>
>>>> >> >>   <property>
>>>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>>>> >> >>     <value>node1:18030</value>
>>>> >> >>     <description>host is the hostname of the resourcemanager and
>>>> port
>>>> >> >> is
>>>> >> >> the port
>>>> >> >>     on which the Applications in the cluster talk to the Resource
>>>> >> >> Manager.
>>>> >> >>     </description>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>
>>>> >> >>   <property>
>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>> >> >>     <value>node1:18040</value>
>>>> >> >>     <description>the host is the hostname of the ResourceManager
>>>> and
>>>> >> >> the
>>>> >> >> port is the port on
>>>> >> >>     which the clients can talk to the Resource Manager.
>>>> </description>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>   <property>
>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>> >> >>
>>>> >> >>
>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>> >> >>     <description>the local directories used by the
>>>> >> >> nodemanager</description>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>   <property>
>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>> >> >>     <value>0.0.0.0:18050</value>
>>>> >> >>     <description>the nodemanagers bind to this port</description>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>   <property>
>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>> >> >>     <value>10240</value>
>>>> >> >>     <description>the amount of memory on the NodeManager in
>>>> >> >> GB</description>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>   <property>
>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>> >> >>
>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>> >> >>     <description>directory on hdfs where the application logs are
>>>> moved
>>>> >> >> to
>>>> >> >> </description>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>    <property>
>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>> >> >>     <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>> >> >>     <description>the directories used by Nodemanagers as log
>>>> >> >> directories</description>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>   <property>
>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>> >> >>     <value>mapreduce.shuffle</value>
>>>> >> >>     <description>shuffle service that needs to be set for Map
>>>> Reduce to
>>>> >> >> run </description>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>   <property>
>>>> >> >>     <name>yarn.resourcemanager.client.thread-count</name>
>>>> >> >>     <value>64</value>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>  <property>
>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>> >> >>     <value>24</value>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >> <property>
>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>> >> >>     <value>3</value>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>  <property>
>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>> >> >>     <value>22000</value>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>  <property>
>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>> >> >>     <value>2.1</value>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>
>>>> >> >>
>>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>>> >> >>>
>>>> >> >>> Not tuning configurations at all is wrong. YARN uses memory
>>>> resource
>>>> >> >>> based scheduling and hence MR2 would be requesting 1 GB minimum
>>>> by
>>>> >> >>> default, causing, on base configs, to max out at 8 (due to 8 GB
>>>> NM
>>>> >> >>> memory resource config) total containers. Do share your configs
>>>> as at
>>>> >> >>> this point none of us can tell what it is.
>>>> >> >>>
>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for users and to
>>>> not
>>>> >> >>> care about such things :)
>>>> >> >>>
>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <sa...@gmail.com>
>>>> >> >>> wrote:
>>>> >> >>> > At the beginning, I just want to do a fast comparison of MRv1
>>>> and
>>>> >> >>> > Yarn.
>>>> >> >>> > But
>>>> >> >>> > they have many differences, and to be fair for comparison I
>>>> did not
>>>> >> >>> > tune
>>>> >> >>> > their configurations at all.  So I got above test results.
>>>> After
>>>> >> >>> > analyzing
>>>> >> >>> > the test result, no doubt, I will configure them and do
>>>> comparison
>>>> >> >>> > again.
>>>> >> >>> >
>>>> >> >>> > Do you have any idea on current test result? I think, to
>>>> compare
>>>> >> >>> > with
>>>> >> >>> > MRv1,
>>>> >> >>> > Yarn is better on Map phase(teragen test), but worse on Reduce
>>>> >> >>> > phase(terasort test).
>>>> >> >>> > And any detailed suggestions/comments/materials on Yarn
>>>> performance
>>>> >> >>> > tuning?
>>>> >> >>> >
>>>> >> >>> > Thanks!
>>>> >> >>> >
>>>> >> >>> >
>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <marcosluis2186@gmail.com
>>>> >
>>>> >> >>> >>
>>>> >> >>> >> Why not to tune the configurations?
>>>> >> >>> >> Both frameworks have many areas to tune:
>>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>>> >> >>> >>
>>>> >> >>> >>
>>>> >> >>> >>
>>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>>> >> >>> >>>
>>>> >> >>> >>> Hi Experts,
>>>> >> >>> >>>
>>>> >> >>> >>> We are thinking about whether to use Yarn or not in the near
>>>> >> >>> >>> future,
>>>> >> >>> >>> and
>>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for comparison.
>>>> >> >>> >>>
>>>> >> >>> >>> My env is three nodes cluster, and each node has similar
>>>> hardware:
>>>> >> >>> >>> 2
>>>> >> >>> >>> cpu(4 core), 32 mem. Both Yarn and MRv1 cluster are set on
>>>> the
>>>> >> >>> >>> same
>>>> >> >>> >>> env. To
>>>> >> >>> >>> be fair, I did not make any performance tuning on their
>>>> >> >>> >>> configurations, but
>>>> >> >>> >>> use the default configuration values.
>>>> >> >>> >>>
>>>> >> >>> >>> Before testing, I think Yarn will be much better than MRv1,
>>>> if
>>>> >> >>> >>> they
>>>> >> >>> >>> all
>>>> >> >>> >>> use default configuration, because Yarn is a better
>>>> framework than
>>>> >> >>> >>> MRv1.
>>>> >> >>> >>> However, the test result shows some differences:
>>>> >> >>> >>>
>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>> >> >>> >>>
>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>> >> >>> >>> - MRv1: 193 sec
>>>> >> >>> >>> - Yarn: 69 sec
>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>> >> >>> >>>
>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>> >> >>> >>> - MRv1: 451 sec
>>>> >> >>> >>> - Yarn: 1136 sec
>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>> >> >>> >>>
>>>> >> >>> >>> After a fast analysis, I think the direct cause might be
>>>> that Yarn
>>>> >> >>> >>> is
>>>> >> >>> >>> much faster than MRv1 on Map phase, but much worse on Reduce
>>>> >> >>> >>> phase.
>>>> >> >>> >>>
>>>> >> >>> >>> Here I have two questions:
>>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1 for terasort?
>>>> >> >>> >>> - What's the strategy for tuning Yarn performance? Are there any
>>>> >> >>> >>> materials?
>>>> >> >>> >>>
>>>> >> >>> >>> Thanks!
>>>> >> >>> >>
>>>> >> >>> >>
>>>> >> >>> >>
>>>> >> >>> >>
>>>> >> >>> >> --
>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>> >> >>> >> Product Manager at PDVSA
>>>> >> >>> >> http://about.me/marcosortiz
>>>> >> >>> >>
>>>> >> >>> >
>>>> >> >>>
>>>> >> >>>
>>>> >> >>>
>>>> >> >>> --
>>>> >> >>> Harsh J
>>>> >> >>
>>>> >> >>
>>>> >> >
>>>> >>
>>>> >>
>>>> >>
>>>> >> --
>>>> >> Harsh J
>>>> >
>>>> >
>>>>
>>>>
>>>>
>>>> --
>>>> Harsh J
>>>>
>>>
>>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Jian Fang <ji...@gmail.com>.
The Terasort output for MR2 is as follows.

2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job (main):
Counters: 46
        File System Counters
                FILE: Number of bytes read=456102049355
                FILE: Number of bytes written=897246250517
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=1000000851200
                HDFS: Number of bytes written=1000000000000
                HDFS: Number of read operations=32131
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=224
        Job Counters
                Killed map tasks=1
                Killed reduce tasks=20
                Launched map tasks=7601
                Launched reduce tasks=132
                Data-local map tasks=7591
                Rack-local map tasks=10
                Total time spent by all maps in occupied slots
(ms)=1696141311
                Total time spent by all reduces in occupied slots
(ms)=2664045096
        Map-Reduce Framework
                Map input records=10000000000
                Map output records=10000000000
                Map output bytes=1020000000000
                Map output materialized bytes=440486356802
                Input split bytes=851200
                Combine input records=0
                Combine output records=0
                Reduce input groups=10000000000
                Reduce shuffle bytes=440486356802
                Reduce input records=10000000000
                Reduce output records=10000000000
                Spilled Records=20000000000
                Shuffled Maps =851200
                Failed Shuffles=61
                Merged Map outputs=851200
                GC time elapsed (ms)=4215666
                CPU time spent (ms)=192433000
                Physical memory (bytes) snapshot=3349356380160
                Virtual memory (bytes) snapshot=9665208745984
                Total committed heap usage (bytes)=3636854259712
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=4
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=1000000000000
        File Output Format Counters
                Bytes Written=1000000000000

Thanks,

John
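As a rough, memory-only sketch of the container math implied by the configuration quoted below (ignoring the ApplicationMaster container and assuming a NodeManager on each of the 19 data nodes):

  map containers per node     ~ floor(12288 / 768)  = 16
  reduce containers per node  ~ floor(12288 / 2048) = 6
  cluster-wide, concurrently  ~ 16 * 19 = 304 maps, 6 * 19 = 114 reduces

With 16 vcores per node and 2 vcores per reduce, memory rather than CPU is what caps the concurrent reducers under this configuration.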



On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <ji...@gmail.com>wrote:

> Hi,
>
> I have the same problem. I compared Hadoop 2.2.0 with Hadoop 1.0.3 and it
> turned out that the terasort for MR2 is 2 times slower than that in MR1. I
> cannot really believe it.
>
> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0 cluster
> configurations are as follows.
>
>         mapreduce.map.java.opts = "-Xmx512m";
>         mapreduce.reduce.java.opts = "-Xmx1536m";
>         mapreduce.map.memory.mb = "768";
>         mapreduce.reduce.memory.mb = "2048";
>
>         yarn.scheduler.minimum-allocation-mb = "256";
>         yarn.scheduler.maximum-allocation-mb = "8192";
>         yarn.nodemanager.resource.memory-mb = "12288";
>         yarn.nodemanager.resource.cpu-vcores = "16";
>
>         mapreduce.reduce.shuffle.parallelcopies = "20";
>         mapreduce.task.io.sort.factor = "48";
>         mapreduce.task.io.sort.mb = "200";
>         mapreduce.map.speculative = "true";
>         mapreduce.reduce.speculative = "true";
>         mapreduce.framework.name = "yarn";
>         yarn.app.mapreduce.am.job.task.listener.thread-count = "60";
>         mapreduce.map.cpu.vcores = "1";
>         mapreduce.reduce.cpu.vcores = "2";
>
>         mapreduce.job.jvm.numtasks = "20";
>         mapreduce.map.output.compress = "true";
>         mapreduce.map.output.compress.codec =
> "org.apache.hadoop.io.compress.SnappyCodec";
>
>         yarn.resourcemanager.client.thread-count = "64";
>         yarn.resourcemanager.scheduler.client.thread-count = "64";
>         yarn.resourcemanager.resource-tracker.client.thread-count = "64";
>         yarn.resourcemanager.scheduler.class =
> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>         yarn.nodemanager.aux-services.mapreduce.shuffle.class =
> "org.apache.hadoop.mapred.ShuffleHandler";
>         yarn.nodemanager.vmem-pmem-ratio = "5";
>         yarn.nodemanager.container-executor.class =
> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>         yarn.nodemanager.container-manager.thread-count = "64";
>         yarn.nodemanager.localizer.client.thread-count = "20";
>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>
> My Hadoop 1.0.3 has the same memory/disks/cores and almost the same other
> configurations. In MR1, the 1TB terasort took about 45 minutes, but it took
> around 90 minutes in MR2.
>
> Does anyone know what is wrong here? Or do I need some special
> configurations for terasort to work better in MR2?
>
> Thanks in advance,
>
> John
>
>
> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <mi...@hotmail.com>wrote:
>
>> Sam,
>> I think your cluster is too small for any meaningful conclusions to be
>> made.
>>
>>
>> Sent from a remote device. Please excuse any typos...
>>
>> Mike Segel
>>
>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com> wrote:
>>
>> Hi Harsh,
>>
>> Thanks for your detailed response! Now, the efficiency of my Yarn cluster
>> improved a lot after increasing the reducer number(mapreduce.job.reduces)
>> in mapred-site.xml. But I still have some questions about the way of Yarn
>> to execute MRv1 job:
>>
>> 1.In Hadoop 1.x, a job will be executed by map task and reduce task
>> together, with a typical process(map > shuffle > reduce). In Yarn, as I
>> know, a MRv1 job will be executed only by ApplicationMaster.
>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1 job has
>> a special execution process (map > shuffle > reduce) in Hadoop 1.x, so how
>> does Yarn execute an MRv1 job? Does it still include the MR-specific steps
>> from Hadoop 1.x, like map, sort, merge, combine and shuffle?
>> - Do the MRv1 parameters still work for Yarn? Like
>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>> - What's the general process for the ApplicationMaster in Yarn to execute
>> a job?
>>
>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>> 'mapred.tasktracker.map.tasks.maximum' and
>> 'mapred.tasktracker.reduce.tasks.maximum'
>> - For Yarn, the above two parameters do not work any more, as Yarn uses
>> containers instead, right?
>> - For Yarn, we can set the total physical memory for a NodeManager using
>> 'yarn.nodemanager.resource.memory-mb'. But how do we set the default size
>> of physical memory for a container?
>> - How do we set the maximum physical memory of a container? Via the
>> 'mapred.child.java.opts' parameter?
>>
>> Thanks as always!
>>
>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>
>>> Hi Sam,
>>>
>>> > - How to know the container number? Why you say it will be 22
>>> containers due to a 22 GB memory?
>>>
>>> The MR2's default configuration requests 1 GB resource each for Map
>>> and Reduce containers. It requests 1.5 GB for the AM container that
>>> runs the job, additionally. This is tunable using the properties
>>> Sandy's mentioned in his post.
>>>
>>> > - My machine has 32 GB memory, how many memory is proper to be
>>> assigned to containers?
>>>
>>> This is a general question. You may use the same process you took to
>>> decide optimal number of slots in MR1 to decide this here. Every
>>> container is a new JVM, and you're limited by the CPUs you have there
>>> (if not the memory). Either increase memory requests from jobs, to
>>> lower # of concurrent containers at a given time (runtime change), or
>>> lower NM's published memory resources to control the same (config
>>> change).
>>>
>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>> 'yarn', will other parameters for mapred-site.xml still work in yarn
>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>> 'mapreduce.map.sort.spill.percent'
>>>
>>> Yes, all of these properties will still work. Old properties specific
>>> to JobTracker or TaskTracker (usually found as a keyword in the config
>>> name) will not apply anymore.
>>>
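
As a concrete sketch of the knobs described above (illustrative client-side
mapred-site.xml values only, mirroring the Hadoop 2.2.0 settings listed
earlier in this thread, not a recommendation for any particular cluster):

  <!-- Per-task container sizes requested from YARN (defaults are 1024 MB each) -->
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>768</value>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>2048</value>
  </property>
  <!-- MR ApplicationMaster container size (default is 1536 MB) -->
  <property>
    <name>yarn.app.mapreduce.am.resource.mb</name>
    <value>1536</value>
  </property>
  <!-- Keep the JVM heap below the container size so the NM does not kill the task -->
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx512m</value>
  </property>
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx1536m</value>
  </property>

With a NodeManager advertising 22000 MB and the default 1024 MB per task
container, a node tops out at roughly 21-22 concurrent containers, which is
where the "roughly 22" figure mentioned in this thread comes from.
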
>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <sa...@gmail.com> wrote:
>>> > Hi Harsh,
>>> >
>>> > Following the above suggestions, I removed the duplicated setting and
>>> > reduced the values of 'yarn.nodemanager.resource.cpu-cores',
>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000. And then, the
>>> > efficiency improved by about 18%.  I have questions:
>>> >
>>> > - How to know the container number? Why you say it will be 22
>>> containers due
>>> > to a 22 GB memory?
>>> > - My machine has 32 GB memory, how many memory is proper to be
>>> assigned to
>>> > containers?
>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>> 'yarn', will
>>> > other parameters for mapred-site.xml still work in yarn framework? Like
>>> > 'mapreduce.task.io.sort.mb' and 'mapreduce.map.sort.spill.percent'
>>> >
>>> > Thanks!
>>> >
>>> >
>>> >
>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>> >>
>>> >> Hey Sam,
>>> >>
>>> >> Did you get a chance to retry with Sandy's suggestions? The config
>>> >> appears to be asking NMs to use roughly 22 total containers (as
>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB memory
>>> >> resource. This could impact much, given the CPU is still the same for
>>> >> both test runs.
>>> >>
>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <sa...@cloudera.com>
>>> >> wrote:
>>> >> > Hey Sam,
>>> >> >
>>> >> > Thanks for sharing your results.  I'm definitely curious about
>>> what's
>>> >> > causing the difference.
>>> >> >
>>> >> > A couple observations:
>>> >> > It looks like you've got yarn.nodemanager.resource.memory-mb in
>>> there
>>> >> > twice
>>> >> > with two different values.
>>> >> >
>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close to the
>>> default
>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of your tasks
>>> getting
>>> >> > killed for running over resource limits?
>>> >> >
>>> >> > -Sandy
>>> >> >
>>> >> >
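
(For what it's worth, the usual way to avoid that is to leave headroom
between the JVM heap and the container request, since the NodeManager's
memory check counts the whole process, not just the heap. The Hadoop 2.2.0
configuration quoted earlier in this thread does this: -Xmx512m inside a
768 MB map container and -Xmx1536m inside a 2048 MB reduce container.)
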
>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <sa...@gmail.com>
>>> wrote:
>>> >> >>
>>> >> >> The terasort execution log shows that the reduce phase spent about
>>> >> >> 5.5 minutes going from 33% to 35%, as shown below.
>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce 31%
>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce 32%
>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce 33%
>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce 35%
>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce 40%
>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce 43%
>>> >> >>
>>> >> >> Any way, below are my configurations for your reference. Thanks!
>>> >> >> (A) core-site.xml
>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>> >> >>
>>> >> >> (B) hdfs-site.xml
>>> >> >>   <property>
>>> >> >>     <name>dfs.replication</name>
>>> >> >>     <value>1</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>dfs.name.dir</name>
>>> >> >>     <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>dfs.data.dir</name>
>>> >> >>     <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>dfs.block.size</name>
>>> >> >>     <value>134217728</value><!-- 128MB -->
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>dfs.namenode.handler.count</name>
>>> >> >>     <value>64</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>dfs.datanode.handler.count</name>
>>> >> >>     <value>10</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >> (C) mapred-site.xml
>>> >> >>   <property>
>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>> >> >>
>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>> >> >>     <description>No description</description>
>>> >> >>     <final>true</final>
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>> >> >>
>>> >> >>
>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>> >> >>     <description>No description</description>
>>> >> >>     <final>true</final>
>>> >> >>   </property>
>>> >> >>
>>> >> >> <property>
>>> >> >>   <name>mapreduce.child.java.opts</name>
>>> >> >>   <value>-Xmx1000m</value>
>>> >> >> </property>
>>> >> >>
>>> >> >> <property>
>>> >> >>     <name>mapreduce.framework.name</name>
>>> >> >>     <value>yarn</value>
>>> >> >>    </property>
>>> >> >>
>>> >> >>  <property>
>>> >> >>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>> >> >>     <value>8</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>> >> >>     <value>4</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>> >> >>     <value>true</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >> (D) yarn-site.xml
>>> >> >>  <property>
>>> >> >>     <name>yarn.resourcemanager.resource-tracker.address</name>
>>> >> >>     <value>node1:18025</value>
>>> >> >>     <description>host is the hostname of the resource manager and
>>> >> >>     port is the port on which the NodeManagers contact the Resource
>>> >> >> Manager.
>>> >> >>     </description>
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <description>The address of the RM web
>>> application.</description>
>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>> >> >>     <value>node1:18088</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>>> >> >>     <value>node1:18030</value>
>>> >> >>     <description>host is the hostname of the resourcemanager and
>>> port
>>> >> >> is
>>> >> >> the port
>>> >> >>     on which the Applications in the cluster talk to the Resource
>>> >> >> Manager.
>>> >> >>     </description>
>>> >> >>   </property>
>>> >> >>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>yarn.resourcemanager.address</name>
>>> >> >>     <value>node1:18040</value>
>>> >> >>     <description>the host is the hostname of the ResourceManager
>>> and
>>> >> >> the
>>> >> >> port is the port on
>>> >> >>     which the clients can talk to the Resource Manager.
>>> </description>
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>> >> >>
>>> >> >>
>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>> >> >>     <description>the local directories used by the
>>> >> >> nodemanager</description>
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>yarn.nodemanager.address</name>
>>> >> >>     <value>0.0.0.0:18050</value>
>>> >> >>     <description>the nodemanagers bind to this port</description>
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>> >> >>     <value>10240</value>
>>> >> >>     <description>the amount of memory on the NodeManager in
>>> >> >> MB</description>
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>> >> >>
>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>> >> >>     <description>directory on hdfs where the application logs are
>>> moved
>>> >> >> to
>>> >> >> </description>
>>> >> >>   </property>
>>> >> >>
>>> >> >>    <property>
>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>> >> >>     <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>> >> >>     <description>the directories used by Nodemanagers as log
>>> >> >> directories</description>
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>> >> >>     <value>mapreduce.shuffle</value>
>>> >> >>     <description>shuffle service that needs to be set for Map
>>> Reduce to
>>> >> >> run </description>
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>yarn.resourcemanager.client.thread-count</name>
>>> >> >>     <value>64</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >>  <property>
>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>> >> >>     <value>24</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >> <property>
>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>> >> >>     <value>3</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >>  <property>
>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>> >> >>     <value>22000</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >>  <property>
>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>> >> >>     <value>2.1</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>> >> >>>
>>> >> >>> Not tuning configurations at all is wrong. YARN uses memory-resource
>>> >> >>> based scheduling, and hence MR2 would be requesting 1 GB minimum by
>>> >> >>> default, maxing out, on base configs, at 8 total containers (due to
>>> >> >>> the 8 GB NM memory resource config). Do share your configs, as at
>>> >> >>> this point none of us can tell what they are.
>>> >> >>>
>>> >> >>> Obviously, it isn't our goal to make MR2 slower for users and to
>>> not
>>> >> >>> care about such things :)
>>> >> >>>
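
To put rough numbers on that (a sketch from the defaults, not measured
figures): 8192 MB of NodeManager memory divided by the 1024 MB requested per
map/reduce task allows at most 8 concurrent containers per node, whereas the
MR1-style settings in the configuration below allow 8 map + 4 reduce = 12
concurrent tasks per node, i.e. roughly a third more concurrency for the
untuned MR1 run.
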
>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <sa...@gmail.com>
>>> >> >>> wrote:
>>> >> >>> > At the beginning, I just wanted to do a fast comparison of MRv1
>>> >> >>> > and Yarn. But they have many differences, and to be fair for the
>>> >> >>> > comparison I did not tune their configurations at all. So I got the
>>> >> >>> > above test results. After analyzing the test results, no doubt, I
>>> >> >>> > will configure them and do the comparison again.
>>> >> >>> >
>>> >> >>> > Do you have any idea about the current test results? I think,
>>> >> >>> > compared with MRv1, Yarn is better in the Map phase (teragen test),
>>> >> >>> > but worse in the Reduce phase (terasort test).
>>> >> >>> > And are there any detailed suggestions/comments/materials on Yarn
>>> >> >>> > performance tuning?
>>> >> >>> >
>>> >> >>> > Thanks!
>>> >> >>> >
>>> >> >>> >
>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <ma...@gmail.com>
>>> >> >>> >>
>>> >> >>> >> Why not tune the configurations?
>>> >> >>> >> Both frameworks have many areas to tune:
>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>> >> >>> >>
>>> >> >>> >>
>>> >> >>> >>
>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>> >> >>> >>>
>>> >> >>> >>> Hi Experts,
>>> >> >>> >>>
>>> >> >>> >>> We are thinking about whether to use Yarn or not in the near
>>> >> >>> >>> future, and I ran teragen/terasort on Yarn and MRv1 for comparison.
>>> >> >>> >>>
>>> >> >>> >>> My env is a three-node cluster, and each node has similar
>>> >> >>> >>> hardware: 2 CPUs (4 cores), 32 GB memory. Both the Yarn and MRv1
>>> >> >>> >>> clusters are set up on the same env. To be fair, I did not do any
>>> >> >>> >>> performance tuning of their configurations, but used the default
>>> >> >>> >>> configuration values.
>>> >> >>> >>>
>>> >> >>> >>> Before testing, I thought Yarn would be much better than MRv1 if
>>> >> >>> >>> they both used the default configuration, because Yarn is a better
>>> >> >>> >>> framework than MRv1. However, the test results show some
>>> >> >>> >>> differences:
>>> >> >>> >>>
>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>> >> >>> >>>
>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>> >> >>> >>> - MRv1: 193 sec
>>> >> >>> >>> - Yarn: 69 sec
>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>> >> >>> >>>
>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>> >> >>> >>> - MRv1: 451 sec
>>> >> >>> >>> - Yarn: 1136 sec
>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>> >> >>> >>>
>>> >> >>> >>> After a quick analysis, I think the direct cause might be that
>>> >> >>> >>> Yarn is much faster than MRv1 in the Map phase, but much worse in
>>> >> >>> >>> the Reduce phase.
>>> >> >>> >>>
>>> >> >>> >>> Here I have two questions:
>>> >> >>> >>> - Why do my tests show Yarn is worse than MRv1 for terasort?
>>> >> >>> >>> - What's the strategy for tuning Yarn performance? Are there any
>>> >> >>> >>> materials?
>>> >> >>> >>>
>>> >> >>> >>> Thanks!
>>> >> >>> >>
>>> >> >>> >>
>>> >> >>> >>
>>> >> >>> >>
>>> >> >>> >> --
>>> >> >>> >> Marcos Ortiz Valmaseda
>>> >> >>> >> Product Manager at PDVSA
>>> >> >>> >> http://about.me/marcosortiz
>>> >> >>> >>
>>> >> >>> >
>>> >> >>>
>>> >> >>>
>>> >> >>>
>>> >> >>> --
>>> >> >>> Harsh J
>>> >> >>
>>> >> >>
>>> >> >
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Harsh J
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Harsh J
>>>
>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Jian Fang <ji...@gmail.com>.
The Terasort output for MR2 is as follows.

2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job (main):
Counters: 46
        File System Counters
                FILE: Number of bytes read=456102049355
                FILE: Number of bytes written=897246250517
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=1000000851200
                HDFS: Number of bytes written=1000000000000
                HDFS: Number of read operations=32131
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=224
        Job Counters
                Killed map tasks=1
                Killed reduce tasks=20
                Launched map tasks=7601
                Launched reduce tasks=132
                Data-local map tasks=7591
                Rack-local map tasks=10
                Total time spent by all maps in occupied slots
(ms)=1696141311
                Total time spent by all reduces in occupied slots
(ms)=2664045096
        Map-Reduce Framework
                Map input records=10000000000
                Map output records=10000000000
                Map output bytes=1020000000000
                Map output materialized bytes=440486356802
                Input split bytes=851200
                Combine input records=0
                Combine output records=0
                Reduce input groups=10000000000
                Reduce shuffle bytes=440486356802
                Reduce input records=10000000000
                Reduce output records=10000000000
                Spilled Records=20000000000
                Shuffled Maps =851200
                Failed Shuffles=61
                Merged Map outputs=851200
                GC time elapsed (ms)=4215666
                CPU time spent (ms)=192433000
                Physical memory (bytes) snapshot=3349356380160
                Virtual memory (bytes) snapshot=9665208745984
                Total committed heap usage (bytes)=3636854259712
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=4
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=1000000000000
        File Output Format Counters
                Bytes Written=1000000000000

Thanks,

John



On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <ji...@gmail.com>wrote:

> Hi,
>
> I have the same problem. I compared Hadoop 2.2.0 with Hadoop 1.0.3 and it
> turned out that the terasort for MR2 is 2 times slower than that in MR1. I
> cannot really believe it.
>
> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0 cluster
> configurations are as follows.
>
>         mapreduce.map.java.opts = "-Xmx512m";
>         mapreduce.reduce.java.opts = "-Xmx1536m";
>         mapreduce.map.memory.mb = "768";
>         mapreduce.reduce.memory.mb = "2048";
>
>         yarn.scheduler.minimum-allocation-mb = "256";
>         yarn.scheduler.maximum-allocation-mb = "8192";
>         yarn.nodemanager.resource.memory-mb = "12288";
>         yarn.nodemanager.resource.cpu-vcores = "16";
>
>         mapreduce.reduce.shuffle.parallelcopies = "20";
>         mapreduce.task.io.sort.factor = "48";
>         mapreduce.task.io.sort.mb = "200";
>         mapreduce.map.speculative = "true";
>         mapreduce.reduce.speculative = "true";
>         mapreduce.framework.name = "yarn";
>         yarn.app.mapreduce.am.job.task.listener.thread-count = "60";
>         mapreduce.map.cpu.vcores = "1";
>         mapreduce.reduce.cpu.vcores = "2";
>
>         mapreduce.job.jvm.numtasks = "20";
>         mapreduce.map.output.compress = "true";
>         mapreduce.map.output.compress.codec =
> "org.apache.hadoop.io.compress.SnappyCodec";
>
>         yarn.resourcemanager.client.thread-count = "64";
>         yarn.resourcemanager.scheduler.client.thread-count = "64";
>         yarn.resourcemanager.resource-tracker.client.thread-count = "64";
>         yarn.resourcemanager.scheduler.class =
> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>         yarn.nodemanager.aux-services.mapreduce.shuffle.class =
> "org.apache.hadoop.mapred.ShuffleHandler";
>         yarn.nodemanager.vmem-pmem-ratio = "5";
>         yarn.nodemanager.container-executor.class =
> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>         yarn.nodemanager.container-manager.thread-count = "64";
>         yarn.nodemanager.localizer.client.thread-count = "20";
>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>
> My Hadoop 1.0.3 has the same memory/disks/cores and almost the same other
> configurations. In MR1, the 1TB terasort took about 45 minutes, but it took
> around 90 minutes in MR2.
>
> Does anyone know what is wrong here? Or do I need some special
> configurations for terasort to work better in MR2?
>
> Thanks in advance,
>
> John
>
>
> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <mi...@hotmail.com>wrote:
>
>> Sam,
>> I think your cluster is too small for any meaningful conclusions to be
>> made.
>>
>>
>> Sent from a remote device. Please excuse any typos...
>>
>> Mike Segel
>>
>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com> wrote:
>>
>> Hi Harsh,
>>
>> Thanks for your detailed response! Now, the efficiency of my Yarn cluster
>> improved a lot after increasing the reducer number(mapreduce.job.reduces)
>> in mapred-site.xml. But I still have some questions about the way of Yarn
>> to execute MRv1 job:
>>
>> 1.In Hadoop 1.x, a job will be executed by map task and reduce task
>> together, with a typical process(map > shuffle > reduce). In Yarn, as I
>> know, a MRv1 job will be executed only by ApplicationMaster.
>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1 job has
>> special execution process(map > shuffle > reduce) in Hadoop 1.x, and how
>> Yarn execute a MRv1 job? still include some special MR steps in Hadoop 1.x,
>> like map, sort, merge, combine and shuffle?
>> - Do the MRv1 parameters still work for Yarn? Like
>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>> - What's the general process for ApplicationMaster of Yarn to execute a
>> job?
>>
>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>> 'mapred.tasktracker.map.tasks.maximum' and
>> 'mapred.tasktracker.reduce.tasks.maximum'
>> - For Yarn, above tow parameter do not work any more, as yarn uses
>> container instead, right?
>> - For Yarn, we can set the whole physical mem for a NodeManager using '
>> yarn.nodemanager.resource.memory-mb'. But how to set the default size of
>> physical mem of a container?
>> - How to set the maximum size of physical mem of a container? By the
>> parameter of 'mapred.child.java.opts'?
>>
>> Thanks as always!
>>
>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>
>>> Hi Sam,
>>>
>>> > - How to know the container number? Why you say it will be 22
>>> containers due to a 22 GB memory?
>>>
>>> The MR2's default configuration requests 1 GB resource each for Map
>>> and Reduce containers. It requests 1.5 GB for the AM container that
>>> runs the job, additionally. This is tunable using the properties
>>> Sandy's mentioned in his post.
>>>
>>> > - My machine has 32 GB memory, how many memory is proper to be
>>> assigned to containers?
>>>
>>> This is a general question. You may use the same process you took to
>>> decide optimal number of slots in MR1 to decide this here. Every
>>> container is a new JVM, and you're limited by the CPUs you have there
>>> (if not the memory). Either increase memory requests from jobs, to
>>> lower # of concurrent containers at a given time (runtime change), or
>>> lower NM's published memory resources to control the same (config
>>> change).
>>>
>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>> 'yarn', will other parameters for mapred-site.xml still work in yarn
>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>> 'mapreduce.map.sort.spill.percent'
>>>
>>> Yes, all of these properties will still work. Old properties specific
>>> to JobTracker or TaskTracker (usually found as a keyword in the config
>>> name) will not apply anymore.
>>>
>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <sa...@gmail.com> wrote:
>>> > Hi Harsh,
>>> >
>>> > According to above suggestions, I removed the duplication of setting,
>>> and
>>> > reduce the value of 'yarn.nodemanager.resource.cpu-cores',
>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000. Ant then, the
>>> > efficiency improved about 18%.  I have questions:
>>> >
>>> > - How to know the container number? Why you say it will be 22
>>> containers due
>>> > to a 22 GB memory?
>>> > - My machine has 32 GB memory, how many memory is proper to be
>>> assigned to
>>> > containers?
>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>> 'yarn', will
>>> > other parameters for mapred-site.xml still work in yarn framework? Like
>>> > 'mapreduce.task.io.sort.mb' and 'mapreduce.map.sort.spill.percent'
>>> >
>>> > Thanks!
>>> >
>>> >
>>> >
>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>> >>
>>> >> Hey Sam,
>>> >>
>>> >> Did you get a chance to retry with Sandy's suggestions? The config
>>> >> appears to be asking NMs to use roughly 22 total containers (as
>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB memory
>>> >> resource. This could impact much, given the CPU is still the same for
>>> >> both test runs.
>>> >>
>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <sa...@cloudera.com>
>>> >> wrote:
>>> >> > Hey Sam,
>>> >> >
>>> >> > Thanks for sharing your results.  I'm definitely curious about
>>> what's
>>> >> > causing the difference.
>>> >> >
>>> >> > A couple observations:
>>> >> > It looks like you've got yarn.nodemanager.resource.memory-mb in
>>> there
>>> >> > twice
>>> >> > with two different values.
>>> >> >
>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close to the
>>> default
>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of your tasks
>>> getting
>>> >> > killed for running over resource limits?
>>> >> >
>>> >> > -Sandy
>>> >> >
>>> >> >
>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <sa...@gmail.com>
>>> wrote:
>>> >> >>
>>> >> >> The terasort execution log shows that reduce spent about 5.5 mins
>>> from
>>> >> >> 33%
>>> >> >> to 35% as below.
>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce 31%
>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce 32%
>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce 33%
>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce 35%
>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce 40%
>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce 43%
>>> >> >>
>>> >> >> Any way, below are my configurations for your reference. Thanks!
>>> >> >> (A) core-site.xml
>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>> >> >>
>>> >> >> (B) hdfs-site.xml
>>> >> >>   <property>
>>> >> >>     <name>dfs.replication</name>
>>> >> >>     <value>1</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>dfs.name.dir</name>
>>> >> >>     <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>dfs.data.dir</name>
>>> >> >>     <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>dfs.block.size</name>
>>> >> >>     <value>134217728</value><!-- 128MB -->
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>dfs.namenode.handler.count</name>
>>> >> >>     <value>64</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>dfs.datanode.handler.count</name>
>>> >> >>     <value>10</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >> (C) mapred-site.xml
>>> >> >>   <property>
>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>> >> >>
>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>> >> >>     <description>No description</description>
>>> >> >>     <final>true</final>
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>> >> >>
>>> >> >>
>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>> >> >>     <description>No description</description>
>>> >> >>     <final>true</final>
>>> >> >>   </property>
>>> >> >>
>>> >> >> <property>
>>> >> >>   <name>mapreduce.child.java.opts</name>
>>> >> >>   <value>-Xmx1000m</value>
>>> >> >> </property>
>>> >> >>
>>> >> >> <property>
>>> >> >>     <name>mapreduce.framework.name</name>
>>> >> >>     <value>yarn</value>
>>> >> >>    </property>
>>> >> >>
>>> >> >>  <property>
>>> >> >>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>> >> >>     <value>8</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>> >> >>     <value>4</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>> >> >>     <value>true</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >> (D) yarn-site.xml
>>> >> >>  <property>
>>> >> >>     <name>yarn.resourcemanager.resource-tracker.address</name>
>>> >> >>     <value>node1:18025</value>
>>> >> >>     <description>host is the hostname of the resource manager and
>>> >> >>     port is the port on which the NodeManagers contact the Resource
>>> >> >> Manager.
>>> >> >>     </description>
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <description>The address of the RM web
>>> application.</description>
>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>> >> >>     <value>node1:18088</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>>> >> >>     <value>node1:18030</value>
>>> >> >>     <description>host is the hostname of the resourcemanager and
>>> port
>>> >> >> is
>>> >> >> the port
>>> >> >>     on which the Applications in the cluster talk to the Resource
>>> >> >> Manager.
>>> >> >>     </description>
>>> >> >>   </property>
>>> >> >>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>yarn.resourcemanager.address</name>
>>> >> >>     <value>node1:18040</value>
>>> >> >>     <description>the host is the hostname of the ResourceManager
>>> and
>>> >> >> the
>>> >> >> port is the port on
>>> >> >>     which the clients can talk to the Resource Manager.
>>> </description>
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>> >> >>
>>> >> >>
>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>> >> >>     <description>the local directories used by the
>>> >> >> nodemanager</description>
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>yarn.nodemanager.address</name>
>>> >> >>     <value>0.0.0.0:18050</value>
>>> >> >>     <description>the nodemanagers bind to this port</description>
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>> >> >>     <value>10240</value>
>>> >> >>     <description>the amount of memory on the NodeManager in
>>> >> >> GB</description>
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>> >> >>
>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>> >> >>     <description>directory on hdfs where the application logs are
>>> moved
>>> >> >> to
>>> >> >> </description>
>>> >> >>   </property>
>>> >> >>
>>> >> >>    <property>
>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>> >> >>     <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>> >> >>     <description>the directories used by Nodemanagers as log
>>> >> >> directories</description>
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>> >> >>     <value>mapreduce.shuffle</value>
>>> >> >>     <description>shuffle service that needs to be set for Map
>>> Reduce to
>>> >> >> run </description>
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>yarn.resourcemanager.client.thread-count</name>
>>> >> >>     <value>64</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >>  <property>
>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>> >> >>     <value>24</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >> <property>
>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>> >> >>     <value>3</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >>  <property>
>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>> >> >>     <value>22000</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >>  <property>
>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>> >> >>     <value>2.1</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>> >> >>>
>>> >> >>> Not tuning configurations at all is wrong. YARN uses memory
>>> resource
>>> >> >>> based scheduling and hence MR2 would be requesting 1 GB minimum by
>>> >> >>> default, causing, on base configs, to max out at 8 (due to 8 GB NM
>>> >> >>> memory resource config) total containers. Do share your configs
>>> as at
>>> >> >>> this point none of us can tell what it is.
>>> >> >>>
>>> >> >>> Obviously, it isn't our goal to make MR2 slower for users and to
>>> not
>>> >> >>> care about such things :)
>>> >> >>>
>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <sa...@gmail.com>
>>> >> >>> wrote:
>>> >> >>> > At the begining, I just want to do a fast comparision of MRv1
>>> and
>>> >> >>> > Yarn.
>>> >> >>> > But
>>> >> >>> > they have many differences, and to be fair for comparison I did
>>> not
>>> >> >>> > tune
>>> >> >>> > their configurations at all.  So I got above test results. After
>>> >> >>> > analyzing
>>> >> >>> > the test result, no doubt, I will configure them and do
>>> comparison
>>> >> >>> > again.
>>> >> >>> >
>>> >> >>> > Do you have any idea on current test result? I think, to compare
>>> >> >>> > with
>>> >> >>> > MRv1,
>>> >> >>> > Yarn is better on Map phase(teragen test), but worse on Reduce
>>> >> >>> > phase(terasort test).
>>> >> >>> > And any detailed suggestions/comments/materials on Yarn
>>> performance
>>> >> >>> > tunning?
>>> >> >>> >
>>> >> >>> > Thanks!
>>> >> >>> >
>>> >> >>> >
>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <ma...@gmail.com>
>>> >> >>> >>
>>> >> >>> >> Why not to tune the configurations?
>>> >> >>> >> Both frameworks have many areas to tune:
>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>> >> >>> >>
>>> >> >>> >>
>>> >> >>> >>
>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>> >> >>> >>>
>>> >> >>> >>> Hi Experts,
>>> >> >>> >>>
>>> >> >>> >>> We are thinking about whether to use Yarn or not in the near
>>> >> >>> >>> future,
>>> >> >>> >>> and
>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for comprison.
>>> >> >>> >>>
>>> >> >>> >>> My env is three nodes cluster, and each node has similar
>>> hardware:
>>> >> >>> >>> 2
>>> >> >>> >>> cpu(4 core), 32 mem. Both Yarn and MRv1 cluster are set on the
>>> >> >>> >>> same
>>> >> >>> >>> env. To
>>> >> >>> >>> be fair, I did not make any performance tuning on their
>>> >> >>> >>> configurations, but
>>> >> >>> >>> use the default configuration values.
>>> >> >>> >>>
>>> >> >>> >>> Before testing, I think Yarn will be much better than MRv1, if
>>> >> >>> >>> they
>>> >> >>> >>> all
>>> >> >>> >>> use default configuration, because Yarn is a better framework
>>> than
>>> >> >>> >>> MRv1.
>>> >> >>> >>> However, the test result shows some differences:
>>> >> >>> >>>
>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>> >> >>> >>>
>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>> >> >>> >>> - MRv1: 193 sec
>>> >> >>> >>> - Yarn: 69 sec
>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>> >> >>> >>>
>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>> >> >>> >>> - MRv1: 451 sec
>>> >> >>> >>> - Yarn: 1136 sec
>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>> >> >>> >>>
>>> >> >>> >>> After a fast analysis, I think the direct cause might be that
>>> Yarn
>>> >> >>> >>> is
>>> >> >>> >>> much faster than MRv1 on Map phase, but much worse on Reduce
>>> >> >>> >>> phase.
>>> >> >>> >>>
>>> >> >>> >>> Here I have two questions:
>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1 for terasort?
>>> >> >>> >>> - What's the stratage for tuning Yarn performance? Is any
>>> >> >>> >>> materials?
>>> >> >>> >>>
>>> >> >>> >>> Thanks!
>>> >> >>> >>
>>> >> >>> >>
>>> >> >>> >>
>>> >> >>> >>
>>> >> >>> >> --
>>> >> >>> >> Marcos Ortiz Valmaseda
>>> >> >>> >> Product Manager at PDVSA
>>> >> >>> >> http://about.me/marcosortiz
>>> >> >>> >>
>>> >> >>> >
>>> >> >>>
>>> >> >>>
>>> >> >>>
>>> >> >>> --
>>> >> >>> Harsh J
>>> >> >>
>>> >> >>
>>> >> >
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Harsh J
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Harsh J
>>>
>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Jian Fang <ji...@gmail.com>.
The Terasort output for MR2 is as follows.

2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job (main):
Counters: 46
        File System Counters
                FILE: Number of bytes read=456102049355
                FILE: Number of bytes written=897246250517
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=1000000851200
                HDFS: Number of bytes written=1000000000000
                HDFS: Number of read operations=32131
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=224
        Job Counters
                Killed map tasks=1
                Killed reduce tasks=20
                Launched map tasks=7601
                Launched reduce tasks=132
                Data-local map tasks=7591
                Rack-local map tasks=10
                Total time spent by all maps in occupied slots
(ms)=1696141311
                Total time spent by all reduces in occupied slots
(ms)=2664045096
        Map-Reduce Framework
                Map input records=10000000000
                Map output records=10000000000
                Map output bytes=1020000000000
                Map output materialized bytes=440486356802
                Input split bytes=851200
                Combine input records=0
                Combine output records=0
                Reduce input groups=10000000000
                Reduce shuffle bytes=440486356802
                Reduce input records=10000000000
                Reduce output records=10000000000
                Spilled Records=20000000000
                Shuffled Maps =851200
                Failed Shuffles=61
                Merged Map outputs=851200
                GC time elapsed (ms)=4215666
                CPU time spent (ms)=192433000
                Physical memory (bytes) snapshot=3349356380160
                Virtual memory (bytes) snapshot=9665208745984
                Total committed heap usage (bytes)=3636854259712
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=4
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=1000000000000
        File Output Format Counters
                Bytes Written=1000000000000

Thanks,

John



On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <ji...@gmail.com>wrote:

> Hi,
>
> I have the same problem. I compared Hadoop 2.2.0 with Hadoop 1.0.3 and it
> turned out that the terasort for MR2 is 2 times slower than that in MR1. I
> cannot really believe it.
>
> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0 cluster
> configurations are as follows.
>
>         mapreduce.map.java.opts = "-Xmx512m";
>         mapreduce.reduce.java.opts = "-Xmx1536m";
>         mapreduce.map.memory.mb = "768";
>         mapreduce.reduce.memory.mb = "2048";
>
>         yarn.scheduler.minimum-allocation-mb = "256";
>         yarn.scheduler.maximum-allocation-mb = "8192";
>         yarn.nodemanager.resource.memory-mb = "12288";
>         yarn.nodemanager.resource.cpu-vcores = "16";
>
>         mapreduce.reduce.shuffle.parallelcopies = "20";
>         mapreduce.task.io.sort.factor = "48";
>         mapreduce.task.io.sort.mb = "200";
>         mapreduce.map.speculative = "true";
>         mapreduce.reduce.speculative = "true";
>         mapreduce.framework.name = "yarn";
>         yarn.app.mapreduce.am.job.task.listener.thread-count = "60";
>         mapreduce.map.cpu.vcores = "1";
>         mapreduce.reduce.cpu.vcores = "2";
>
>         mapreduce.job.jvm.numtasks = "20";
>         mapreduce.map.output.compress = "true";
>         mapreduce.map.output.compress.codec =
> "org.apache.hadoop.io.compress.SnappyCodec";
>
>         yarn.resourcemanager.client.thread-count = "64";
>         yarn.resourcemanager.scheduler.client.thread-count = "64";
>         yarn.resourcemanager.resource-tracker.client.thread-count = "64";
>         yarn.resourcemanager.scheduler.class =
> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>         yarn.nodemanager.aux-services.mapreduce.shuffle.class =
> "org.apache.hadoop.mapred.ShuffleHandler";
>         yarn.nodemanager.vmem-pmem-ratio = "5";
>         yarn.nodemanager.container-executor.class =
> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>         yarn.nodemanager.container-manager.thread-count = "64";
>         yarn.nodemanager.localizer.client.thread-count = "20";
>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>
> My Hadoop 1.0.3 has the same memory/disks/cores and almost the same other
> configurations. In MR1, the 1TB terasort took about 45 minutes, but it took
> around 90 minutes in MR2.
>
> Does anyone know what is wrong here? Or do I need some special
> configurations for terasort to work better in MR2?
>
> Thanks in advance,
>
> John
>
>
> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <mi...@hotmail.com>wrote:
>
>> Sam,
>> I think your cluster is too small for any meaningful conclusions to be
>> made.
>>
>>
>> Sent from a remote device. Please excuse any typos...
>>
>> Mike Segel
>>
>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com> wrote:
>>
>> Hi Harsh,
>>
>> Thanks for your detailed response! Now, the efficiency of my Yarn cluster
>> improved a lot after increasing the reducer number(mapreduce.job.reduces)
>> in mapred-site.xml. But I still have some questions about the way of Yarn
>> to execute MRv1 job:
>>
>> 1.In Hadoop 1.x, a job will be executed by map task and reduce task
>> together, with a typical process(map > shuffle > reduce). In Yarn, as I
>> know, a MRv1 job will be executed only by ApplicationMaster.
>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1 job has
>> special execution process(map > shuffle > reduce) in Hadoop 1.x, and how
>> Yarn execute a MRv1 job? still include some special MR steps in Hadoop 1.x,
>> like map, sort, merge, combine and shuffle?
>> - Do the MRv1 parameters still work for Yarn? Like
>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>> - What's the general process for ApplicationMaster of Yarn to execute a
>> job?
>>
>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>> 'mapred.tasktracker.map.tasks.maximum' and
>> 'mapred.tasktracker.reduce.tasks.maximum'
>> - For Yarn, above tow parameter do not work any more, as yarn uses
>> container instead, right?
>> - For Yarn, we can set the whole physical mem for a NodeManager using '
>> yarn.nodemanager.resource.memory-mb'. But how to set the default size of
>> physical mem of a container?
>> - How to set the maximum size of physical mem of a container? By the
>> parameter of 'mapred.child.java.opts'?
>>
>> Thanks as always!
>>
>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>
>>> Hi Sam,
>>>
>>> > - How to know the container number? Why you say it will be 22
>>> containers due to a 22 GB memory?
>>>
>>> The MR2's default configuration requests 1 GB resource each for Map
>>> and Reduce containers. It requests 1.5 GB for the AM container that
>>> runs the job, additionally. This is tunable using the properties
>>> Sandy's mentioned in his post.
>>>
>>> > - My machine has 32 GB memory, how many memory is proper to be
>>> assigned to containers?
>>>
>>> This is a general question. You may use the same process you took to
>>> decide optimal number of slots in MR1 to decide this here. Every
>>> container is a new JVM, and you're limited by the CPUs you have there
>>> (if not the memory). Either increase memory requests from jobs, to
>>> lower # of concurrent containers at a given time (runtime change), or
>>> lower NM's published memory resources to control the same (config
>>> change).
>>>
>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>> 'yarn', will other parameters for mapred-site.xml still work in yarn
>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>> 'mapreduce.map.sort.spill.percent'
>>>
>>> Yes, all of these properties will still work. Old properties specific
>>> to JobTracker or TaskTracker (usually found as a keyword in the config
>>> name) will not apply anymore.
>>>
>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <sa...@gmail.com> wrote:
>>> > Hi Harsh,
>>> >
>>> > According to above suggestions, I removed the duplication of setting,
>>> and
>>> > reduce the value of 'yarn.nodemanager.resource.cpu-cores',
>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000. Ant then, the
>>> > efficiency improved about 18%.  I have questions:
>>> >
>>> > - How to know the container number? Why you say it will be 22
>>> containers due
>>> > to a 22 GB memory?
>>> > - My machine has 32 GB memory, how many memory is proper to be
>>> assigned to
>>> > containers?
>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>> 'yarn', will
>>> > other parameters for mapred-site.xml still work in yarn framework? Like
>>> > 'mapreduce.task.io.sort.mb' and 'mapreduce.map.sort.spill.percent'
>>> >
>>> > Thanks!
>>> >
>>> >
>>> >
>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>> >>
>>> >> Hey Sam,
>>> >>
>>> >> Did you get a chance to retry with Sandy's suggestions? The config
>>> >> appears to be asking NMs to use roughly 22 total containers (as
>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB memory
>>> >> resource. This could impact much, given the CPU is still the same for
>>> >> both test runs.
>>> >>
>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <sa...@cloudera.com>
>>> >> wrote:
>>> >> > Hey Sam,
>>> >> >
>>> >> > Thanks for sharing your results.  I'm definitely curious about
>>> what's
>>> >> > causing the difference.
>>> >> >
>>> >> > A couple observations:
>>> >> > It looks like you've got yarn.nodemanager.resource.memory-mb in
>>> there
>>> >> > twice
>>> >> > with two different values.
>>> >> >
>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close to the
>>> default
>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of your tasks
>>> getting
>>> >> > killed for running over resource limits?
>>> >> >
>>> >> > -Sandy
>>> >> >
>>> >> >
>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <sa...@gmail.com>
>>> wrote:
>>> >> >>
>>> >> >> The terasort execution log shows that reduce spent about 5.5 mins
>>> from
>>> >> >> 33%
>>> >> >> to 35% as below.
>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce 31%
>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce 32%
>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce 33%
>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce 35%
>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce 40%
>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce 43%
>>> >> >>
>>> >> >> Any way, below are my configurations for your reference. Thanks!
>>> >> >> (A) core-site.xml
>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>> >> >>
>>> >> >> (B) hdfs-site.xml
>>> >> >>   <property>
>>> >> >>     <name>dfs.replication</name>
>>> >> >>     <value>1</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>dfs.name.dir</name>
>>> >> >>     <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>dfs.data.dir</name>
>>> >> >>     <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>dfs.block.size</name>
>>> >> >>     <value>134217728</value><!-- 128MB -->
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>dfs.namenode.handler.count</name>
>>> >> >>     <value>64</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>dfs.datanode.handler.count</name>
>>> >> >>     <value>10</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >> (C) mapred-site.xml
>>> >> >>   <property>
>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>> >> >>
>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>> >> >>     <description>No description</description>
>>> >> >>     <final>true</final>
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>> >> >>
>>> >> >>
>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>> >> >>     <description>No description</description>
>>> >> >>     <final>true</final>
>>> >> >>   </property>
>>> >> >>
>>> >> >> <property>
>>> >> >>   <name>mapreduce.child.java.opts</name>
>>> >> >>   <value>-Xmx1000m</value>
>>> >> >> </property>
>>> >> >>
>>> >> >> <property>
>>> >> >>     <name>mapreduce.framework.name</name>
>>> >> >>     <value>yarn</value>
>>> >> >>    </property>
>>> >> >>
>>> >> >>  <property>
>>> >> >>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>> >> >>     <value>8</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>> >> >>     <value>4</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>> >> >>     <value>true</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >> (D) yarn-site.xml
>>> >> >>  <property>
>>> >> >>     <name>yarn.resourcemanager.resource-tracker.address</name>
>>> >> >>     <value>node1:18025</value>
>>> >> >>     <description>host is the hostname of the resource manager and
>>> >> >>     port is the port on which the NodeManagers contact the Resource
>>> >> >> Manager.
>>> >> >>     </description>
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <description>The address of the RM web
>>> application.</description>
>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>> >> >>     <value>node1:18088</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>>> >> >>     <value>node1:18030</value>
>>> >> >>     <description>host is the hostname of the resourcemanager and
>>> port
>>> >> >> is
>>> >> >> the port
>>> >> >>     on which the Applications in the cluster talk to the Resource
>>> >> >> Manager.
>>> >> >>     </description>
>>> >> >>   </property>
>>> >> >>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>yarn.resourcemanager.address</name>
>>> >> >>     <value>node1:18040</value>
>>> >> >>     <description>the host is the hostname of the ResourceManager
>>> and
>>> >> >> the
>>> >> >> port is the port on
>>> >> >>     which the clients can talk to the Resource Manager.
>>> </description>
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>> >> >>
>>> >> >>
>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>> >> >>     <description>the local directories used by the
>>> >> >> nodemanager</description>
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>yarn.nodemanager.address</name>
>>> >> >>     <value>0.0.0.0:18050</value>
>>> >> >>     <description>the nodemanagers bind to this port</description>
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>> >> >>     <value>10240</value>
>>> >> >>     <description>the amount of memory on the NodeManager in
>>> >> >> GB</description>
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>> >> >>
>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>> >> >>     <description>directory on hdfs where the application logs are
>>> moved
>>> >> >> to
>>> >> >> </description>
>>> >> >>   </property>
>>> >> >>
>>> >> >>    <property>
>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>> >> >>     <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>> >> >>     <description>the directories used by Nodemanagers as log
>>> >> >> directories</description>
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>> >> >>     <value>mapreduce.shuffle</value>
>>> >> >>     <description>shuffle service that needs to be set for Map
>>> Reduce to
>>> >> >> run </description>
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>yarn.resourcemanager.client.thread-count</name>
>>> >> >>     <value>64</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >>  <property>
>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>> >> >>     <value>24</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >> <property>
>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>> >> >>     <value>3</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >>  <property>
>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>> >> >>     <value>22000</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >>  <property>
>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>> >> >>     <value>2.1</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>> >> >>>
>>> >> >>> Not tuning configurations at all is wrong. YARN uses memory
>>> resource
>>> >> >>> based scheduling and hence MR2 would be requesting 1 GB minimum by
>>> >> >>> default, causing, on base configs, to max out at 8 (due to 8 GB NM
>>> >> >>> memory resource config) total containers. Do share your configs
>>> as at
>>> >> >>> this point none of us can tell what it is.
>>> >> >>>
>>> >> >>> Obviously, it isn't our goal to make MR2 slower for users and to
>>> not
>>> >> >>> care about such things :)
>>> >> >>>
>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <sa...@gmail.com>
>>> >> >>> wrote:
>>> >> >>> > At the begining, I just want to do a fast comparision of MRv1
>>> and
>>> >> >>> > Yarn.
>>> >> >>> > But
>>> >> >>> > they have many differences, and to be fair for comparison I did
>>> not
>>> >> >>> > tune
>>> >> >>> > their configurations at all.  So I got above test results. After
>>> >> >>> > analyzing
>>> >> >>> > the test result, no doubt, I will configure them and do
>>> comparison
>>> >> >>> > again.
>>> >> >>> >
>>> >> >>> > Do you have any idea on current test result? I think, to compare
>>> >> >>> > with
>>> >> >>> > MRv1,
>>> >> >>> > Yarn is better on Map phase(teragen test), but worse on Reduce
>>> >> >>> > phase(terasort test).
>>> >> >>> > Are there any detailed suggestions/comments/materials on Yarn
>>> performance
>>> >> >>> > tuning?
>>> >> >>> >
>>> >> >>> > Thanks!
>>> >> >>> >
>>> >> >>> >
>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <ma...@gmail.com>
>>> >> >>> >>
>>> >> >>> >> Why not tune the configurations?
>>> >> >>> >> Both frameworks have many areas to tune:
>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>> >> >>> >>
>>> >> >>> >>
>>> >> >>> >>
>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>> >> >>> >>>
>>> >> >>> >>> Hi Experts,
>>> >> >>> >>>
>>> >> >>> >>> We are thinking about whether to use Yarn or not in the near
>>> >> >>> >>> future,
>>> >> >>> >>> and
>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for comparison.
>>> >> >>> >>>
>>> >> >>> >>> My env is a three-node cluster, and each node has similar
>>> hardware:
>>> >> >>> >>> 2 CPUs (4 cores each) and 32 GB memory. Both the Yarn and MRv1
>>> >> >>> >>> clusters are set up on the same
>>> >> >>> >>> env. To
>>> >> >>> >>> be fair, I did not make any performance tuning on their
>>> >> >>> >>> configurations, but
>>> >> >>> >>> use the default configuration values.
>>> >> >>> >>>
>>> >> >>> >>> Before testing, I thought Yarn would be much better than MRv1, if
>>> >> >>> >>> they
>>> >> >>> >>> all
>>> >> >>> >>> use default configuration, because Yarn is a better framework
>>> than
>>> >> >>> >>> MRv1.
>>> >> >>> >>> However, the test result shows some differences:
>>> >> >>> >>>
>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>> >> >>> >>>
>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>> >> >>> >>> - MRv1: 193 sec
>>> >> >>> >>> - Yarn: 69 sec
>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>> >> >>> >>>
>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>> >> >>> >>> - MRv1: 451 sec
>>> >> >>> >>> - Yarn: 1136 sec
>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>> >> >>> >>>
>>> >> >>> >>> After a fast analysis, I think the direct cause might be that
>>> Yarn
>>> >> >>> >>> is
>>> >> >>> >>> much faster than MRv1 on Map phase, but much worse on Reduce
>>> >> >>> >>> phase.
>>> >> >>> >>>
>>> >> >>> >>> Here I have two questions:
>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1 for terasort?
>>> >> >>> >>> - What's the strategy for tuning Yarn performance? Are there any
>>> >> >>> >>> materials?
>>> >> >>> >>>
>>> >> >>> >>> Thanks!
>>> >> >>> >>
>>> >> >>> >>
>>> >> >>> >>
>>> >> >>> >>
>>> >> >>> >> --
>>> >> >>> >> Marcos Ortiz Valmaseda
>>> >> >>> >> Product Manager at PDVSA
>>> >> >>> >> http://about.me/marcosortiz
>>> >> >>> >>
>>> >> >>> >
>>> >> >>>
>>> >> >>>
>>> >> >>>
>>> >> >>> --
>>> >> >>> Harsh J
>>> >> >>
>>> >> >>
>>> >> >
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Harsh J
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Harsh J
>>>
>>
>>
>

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

Posted by Jian Fang <ji...@gmail.com>.
The Terasort output for MR2 is as follows.

2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job (main):
Counters: 46
        File System Counters
                FILE: Number of bytes read=456102049355
                FILE: Number of bytes written=897246250517
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=1000000851200
                HDFS: Number of bytes written=1000000000000
                HDFS: Number of read operations=32131
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=224
        Job Counters
                Killed map tasks=1
                Killed reduce tasks=20
                Launched map tasks=7601
                Launched reduce tasks=132
                Data-local map tasks=7591
                Rack-local map tasks=10
                Total time spent by all maps in occupied slots
(ms)=1696141311
                Total time spent by all reduces in occupied slots
(ms)=2664045096
        Map-Reduce Framework
                Map input records=10000000000
                Map output records=10000000000
                Map output bytes=1020000000000
                Map output materialized bytes=440486356802
                Input split bytes=851200
                Combine input records=0
                Combine output records=0
                Reduce input groups=10000000000
                Reduce shuffle bytes=440486356802
                Reduce input records=10000000000
                Reduce output records=10000000000
                Spilled Records=20000000000
                Shuffled Maps =851200
                Failed Shuffles=61
                Merged Map outputs=851200
                GC time elapsed (ms)=4215666
                CPU time spent (ms)=192433000
                Physical memory (bytes) snapshot=3349356380160
                Virtual memory (bytes) snapshot=9665208745984
                Total committed heap usage (bytes)=3636854259712
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=4
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=1000000000000
        File Output Format Counters
                Bytes Written=1000000000000
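
A few quick sanity checks on these counters (derived arithmetic only, no
additional measurements):
- Map input records = 10,000,000,000 at 100 bytes per teragen row matches
  the expected 1 TB of input (HDFS bytes read is 1 TB plus split metadata).
- Shuffled Maps = 851,200 = 7,600 effective maps x 112 effective reduces
  (132 launched minus 20 killed), i.e. every reduce fetched from every map.
- Map output bytes = 1,020,000,000,000 versus materialized bytes of
  440,486,356,802 means Snappy shrank the intermediate data to roughly 43%
  of its raw size.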

Thanks,

John



On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <ji...@gmail.com>wrote:

> Hi,
>
> I have the same problem. I compared Hadoop 2.2.0 with Hadoop 1.0.3 and it
> turned out that the terasort for MR2 is 2 times slower than that in MR1. I
> cannot really believe it.
>
> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0 cluster
> configurations are as follows.
>
>         mapreduce.map.java.opts = "-Xmx512m";
>         mapreduce.reduce.java.opts = "-Xmx1536m";
>         mapreduce.map.memory.mb = "768";
>         mapreduce.reduce.memory.mb = "2048";
>
>         yarn.scheduler.minimum-allocation-mb = "256";
>         yarn.scheduler.maximum-allocation-mb = "8192";
>         yarn.nodemanager.resource.memory-mb = "12288";
>         yarn.nodemanager.resource.cpu-vcores = "16";
>
>         mapreduce.reduce.shuffle.parallelcopies = "20";
>         mapreduce.task.io.sort.factor = "48";
>         mapreduce.task.io.sort.mb = "200";
>         mapreduce.map.speculative = "true";
>         mapreduce.reduce.speculative = "true";
>         mapreduce.framework.name = "yarn";
>         yarn.app.mapreduce.am.job.task.listener.thread-count = "60";
>         mapreduce.map.cpu.vcores = "1";
>         mapreduce.reduce.cpu.vcores = "2";
>
>         mapreduce.job.jvm.numtasks = "20";
>         mapreduce.map.output.compress = "true";
>         mapreduce.map.output.compress.codec =
> "org.apache.hadoop.io.compress.SnappyCodec";
>
>         yarn.resourcemanager.client.thread-count = "64";
>         yarn.resourcemanager.scheduler.client.thread-count = "64";
>         yarn.resourcemanager.resource-tracker.client.thread-count = "64";
>         yarn.resourcemanager.scheduler.class =
> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>         yarn.nodemanager.aux-services.mapreduce.shuffle.class =
> "org.apache.hadoop.mapred.ShuffleHandler";
>         yarn.nodemanager.vmem-pmem-ratio = "5";
>         yarn.nodemanager.container-executor.class =
> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>         yarn.nodemanager.container-manager.thread-count = "64";
>         yarn.nodemanager.localizer.client.thread-count = "20";
>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>
> My Hadoop 1.0.3 has the same memory/disks/cores and almost the same other
> configurations. In MR1, the 1TB terasort took about 45 minutes, but it took
> around 90 minutes in MR2.
>
> Does anyone know what is wrong here? Or do I need some special
> configurations for terasort to work better in MR2?
>
> Thanks in advance,
>
> John
>
>
> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <mi...@hotmail.com>wrote:
>
>> Sam,
>> I think your cluster is too small for any meaningful conclusions to be
>> made.
>>
>>
>> Sent from a remote device. Please excuse any typos...
>>
>> Mike Segel
>>
>> On Jun 18, 2013, at 3:58 AM, sam liu <sa...@gmail.com> wrote:
>>
>> Hi Harsh,
>>
>> Thanks for your detailed response! Now, the efficiency of my Yarn cluster
>> improved a lot after increasing the reducer number(mapreduce.job.reduces)
>> in mapred-site.xml. But I still have some questions about the way of Yarn
>> to execute MRv1 job:
>>
>> 1.In Hadoop 1.x, a job will be executed by map task and reduce task
>> together, with a typical process(map > shuffle > reduce). In Yarn, as I
>> know, a MRv1 job will be executed only by ApplicationMaster.
>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1 job has
>> special execution process (map > shuffle > reduce) in Hadoop 1.x, so how does
>> Yarn execute an MRv1 job? Does it still include the special MR steps from
>> Hadoop 1.x, like map, sort, merge, combine and shuffle?
>> - Do the MRv1 parameters still work for Yarn? Like
>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>> - What's the general process for ApplicationMaster of Yarn to execute a
>> job?
>>
>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>> 'mapred.tasktracker.map.tasks.maximum' and
>> 'mapred.tasktracker.reduce.tasks.maximum'
>> - For Yarn, the above two parameters do not work any more, as YARN uses
>> containers instead, right?
>> - For Yarn, we can set the whole physical memory for a NodeManager using '
>> yarn.nodemanager.resource.memory-mb'. But how do we set the default size of
>> physical memory for a container?
>> - How do we set the maximum size of physical memory for a container? By the
>> parameter 'mapred.child.java.opts'?
>>
>> Thanks as always!
>>
>> 2013/6/9 Harsh J <ha...@cloudera.com>
>>
>>> Hi Sam,
>>>
>>> > - How do I know the container number? Why do you say it will be 22
>>> containers due to 22 GB of memory?
>>>
>>> The MR2's default configuration requests 1 GB resource each for Map
>>> and Reduce containers. It requests 1.5 GB for the AM container that
>>> runs the job, additionally. This is tunable using the properties
>>> Sandy's mentioned in his post.
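>>>
>>> A minimal mapred-site.xml sketch of those request-size knobs (the values
>>> below are only placeholders to show the shape, not recommendations):
>>>
>>>   <property>
>>>     <name>mapreduce.map.memory.mb</name>
>>>     <value>1024</value>
>>>   </property>
>>>
>>>   <property>
>>>     <name>mapreduce.reduce.memory.mb</name>
>>>     <value>2048</value>
>>>   </property>
>>>
>>>   <property>
>>>     <!-- the AM container request, 1.5 GB (1536 MB) by default -->
>>>     <name>yarn.app.mapreduce.am.resource.mb</name>
>>>     <value>1536</value>
>>>   </property>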
>>>
>>> > - My machine has 32 GB memory, how much memory is proper to be
>>> assigned to containers?
>>>
>>> This is a general question. You may use the same process you took to
>>> decide optimal number of slots in MR1 to decide this here. Every
>>> container is a new JVM, and you're limited by the CPUs you have there
>>> (if not the memory). Either increase memory requests from jobs, to
>>> lower # of concurrent containers at a given time (runtime change), or
>>> lower NM's published memory resources to control the same (config
>>> change).
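>>>
>>> As a rough illustration with the numbers from this thread: at 22000 MB
>>> published per NM and 1024 MB per task container, a node holds about 21
>>> task containers at once; shrinking the NM figure or doubling the per-task
>>> request roughly halves that. The NM side of the knob sits in
>>> yarn-site.xml, for example (placeholder value):
>>>
>>>   <property>
>>>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>     <value>12000</value>
>>>   </property>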
>>>
>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>> 'yarn', will other parameters for mapred-site.xml still work in yarn
>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>> 'mapreduce.map.sort.spill.percent'
>>>
>>> Yes, all of these properties will still work. Old properties specific
>>> to JobTracker or TaskTracker (usually found as a keyword in the config
>>> name) will not apply anymore.
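>>>
>>> Concretely, from the mapred-site.xml pasted earlier in this thread:
>>> mapreduce.tasktracker.map.tasks.maximum and
>>> mapreduce.tasktracker.reduce.tasks.maximum are ignored under YARN (they
>>> are TaskTracker settings), while mapreduce.task.io.sort.mb and
>>> mapreduce.map.sort.spill.percent keep their meaning for map-side sorting.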
>>>
>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <sa...@gmail.com> wrote:
>>> > Hi Harsh,
>>> >
>>> > According to the above suggestions, I removed the duplicated setting,
>>> and
>>> > reduced the value of 'yarn.nodemanager.resource.cpu-cores',
>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000. And then, the
>>> > efficiency improved by about 18%.  I have some questions:
>>> >
>>> > - How do I know the container number? Why do you say it will be 22
>>> containers due
>>> > to 22 GB of memory?
>>> > - My machine has 32 GB memory, how much memory is proper to be
>>> assigned to
>>> > containers?
>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>> 'yarn', will
>>> > other parameters for mapred-site.xml still work in yarn framework? Like
>>> > 'mapreduce.task.io.sort.mb' and 'mapreduce.map.sort.spill.percent'
>>> >
>>> > Thanks!
>>> >
>>> >
>>> >
>>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>>> >>
>>> >> Hey Sam,
>>> >>
>>> >> Did you get a chance to retry with Sandy's suggestions? The config
>>> >> appears to be asking NMs to use roughly 22 total containers (as
>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB memory
>>> >> resource. This could have a big impact, given the CPU is still the same for
>>> >> both test runs.
>>> >>
>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <sa...@cloudera.com>
>>> >> wrote:
>>> >> > Hey Sam,
>>> >> >
>>> >> > Thanks for sharing your results.  I'm definitely curious about
>>> what's
>>> >> > causing the difference.
>>> >> >
>>> >> > A couple observations:
>>> >> > It looks like you've got yarn.nodemanager.resource.memory-mb in
>>> there
>>> >> > twice
>>> >> > with two different values.
>>> >> >
>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close to the
>>> default
>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of your tasks
>>> getting
>>> >> > killed for running over resource limits?
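>>> >> >
>>> >> > As a sketch of one way to leave headroom (placeholder values): the
>>> >> > container limit covers the whole JVM process, not just the heap, so
>>> >> > either shrink -Xmx or grow the container, e.g.
>>> >> >
>>> >> >   <property>
>>> >> >     <name>mapreduce.child.java.opts</name>
>>> >> >     <value>-Xmx800m</value>
>>> >> >   </property>
>>> >> >
>>> >> >   <property>
>>> >> >     <name>mapreduce.reduce.memory.mb</name>
>>> >> >     <value>1536</value>
>>> >> >   </property>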
>>> >> >
>>> >> > -Sandy
>>> >> >
>>> >> >
>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <sa...@gmail.com>
>>> wrote:
>>> >> >>
>>> >> >> The terasort execution log shows that the reduce phase spent about 5.5 mins
>>> from
>>> >> >> 33%
>>> >> >> to 35%, as shown below.
>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce 31%
>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce 32%
>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce 33%
>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce 35%
>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce 40%
>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce 43%
>>> >> >>
>>> >> >> Any way, below are my configurations for your reference. Thanks!
>>> >> >> (A) core-site.xml
>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>> >> >>
>>> >> >> (B) hdfs-site.xml
>>> >> >>   <property>
>>> >> >>     <name>dfs.replication</name>
>>> >> >>     <value>1</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>dfs.name.dir</name>
>>> >> >>     <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>dfs.data.dir</name>
>>> >> >>     <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>dfs.block.size</name>
>>> >> >>     <value>134217728</value><!-- 128MB -->
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>dfs.namenode.handler.count</name>
>>> >> >>     <value>64</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>dfs.datanode.handler.count</name>
>>> >> >>     <value>10</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >> (C) mapred-site.xml
>>> >> >>   <property>
>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>> >> >>
>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>> >> >>     <description>No description</description>
>>> >> >>     <final>true</final>
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>> >> >>
>>> >> >>
>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>> >> >>     <description>No description</description>
>>> >> >>     <final>true</final>
>>> >> >>   </property>
>>> >> >>
>>> >> >> <property>
>>> >> >>   <name>mapreduce.child.java.opts</name>
>>> >> >>   <value>-Xmx1000m</value>
>>> >> >> </property>
>>> >> >>
>>> >> >> <property>
>>> >> >>     <name>mapreduce.framework.name</name>
>>> >> >>     <value>yarn</value>
>>> >> >>    </property>
>>> >> >>
>>> >> >>  <property>
>>> >> >>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>> >> >>     <value>8</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>> >> >>     <value>4</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>> >> >>     <value>true</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >> (D) yarn-site.xml
>>> >> >>  <property>
>>> >> >>     <name>yarn.resourcemanager.resource-tracker.address</name>
>>> >> >>     <value>node1:18025</value>
>>> >> >>     <description>host is the hostname of the resource manager and
>>> >> >>     port is the port on which the NodeManagers contact the Resource
>>> >> >> Manager.
>>> >> >>     </description>
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <description>The address of the RM web
>>> application.</description>
>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>> >> >>     <value>node1:18088</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>>> >> >>     <value>node1:18030</value>
>>> >> >>     <description>host is the hostname of the resourcemanager and
>>> port
>>> >> >> is
>>> >> >> the port
>>> >> >>     on which the Applications in the cluster talk to the Resource
>>> >> >> Manager.
>>> >> >>     </description>
>>> >> >>   </property>
>>> >> >>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>yarn.resourcemanager.address</name>
>>> >> >>     <value>node1:18040</value>
>>> >> >>     <description>the host is the hostname of the ResourceManager
>>> and
>>> >> >> the
>>> >> >> port is the port on
>>> >> >>     which the clients can talk to the Resource Manager.
>>> </description>
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>> >> >>
>>> >> >>
>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>> >> >>     <description>the local directories used by the
>>> >> >> nodemanager</description>
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>yarn.nodemanager.address</name>
>>> >> >>     <value>0.0.0.0:18050</value>
>>> >> >>     <description>the nodemanagers bind to this port</description>
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>> >> >>     <value>10240</value>
>>> >> >>     <description>the amount of memory on the NodeManager in
>>> >> >> MB</description>
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>> >> >>
>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>> >> >>     <description>directory on hdfs where the application logs are
>>> moved
>>> >> >> to
>>> >> >> </description>
>>> >> >>   </property>
>>> >> >>
>>> >> >>    <property>
>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>> >> >>     <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>> >> >>     <description>the directories used by Nodemanagers as log
>>> >> >> directories</description>
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>> >> >>     <value>mapreduce.shuffle</value>
>>> >> >>     <description>shuffle service that needs to be set for Map
>>> Reduce to
>>> >> >> run </description>
>>> >> >>   </property>
>>> >> >>
>>> >> >>   <property>
>>> >> >>     <name>yarn.resourcemanager.client.thread-count</name>
>>> >> >>     <value>64</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >>  <property>
>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>> >> >>     <value>24</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >> <property>
>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>> >> >>     <value>3</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >>  <property>
>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>> >> >>     <value>22000</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >>  <property>
>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>> >> >>     <value>2.1</value>
>>> >> >>   </property>
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>>> >> >>>
>>> >> >>> Not tuning configurations at all is wrong. YARN uses memory
>>> resource
>>> >> >>> based scheduling and hence MR2 would be requesting 1 GB minimum by
>>> >> >>> default, causing, on base configs, to max out at 8 (due to 8 GB NM
>>> >> >>> memory resource config) total containers. Do share your configs
>>> as at
>>> >> >>> this point none of us can tell what it is.
>>> >> >>>
>>> >> >>> Obviously, it isn't our goal to make MR2 slower for users and to
>>> not
>>> >> >>> care about such things :)
>>> >> >>>
>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <sa...@gmail.com>
>>> >> >>> wrote:
>>> >> >>> > At the beginning, I just wanted to do a fast comparison of MRv1
>>> and
>>> >> >>> > Yarn.
>>> >> >>> > But
>>> >> >>> > they have many differences, and to be fair for comparison I did
>>> not
>>> >> >>> > tune
>>> >> >>> > their configurations at all.  So I got above test results. After
>>> >> >>> > analyzing
>>> >> >>> > the test result, no doubt, I will configure them and do
>>> comparison
>>> >> >>> > again.
>>> >> >>> >
>>> >> >>> > Do you have any ideas about the current test results? I think, compared
>>> >> >>> > with
>>> >> >>> > MRv1,
>>> >> >>> > Yarn is better on Map phase(teragen test), but worse on Reduce
>>> >> >>> > phase(terasort test).
>>> >> >>> > Are there any detailed suggestions/comments/materials on Yarn
>>> performance
>>> >> >>> > tuning?
>>> >> >>> >
>>> >> >>> > Thanks!
>>> >> >>> >
>>> >> >>> >
>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <ma...@gmail.com>
>>> >> >>> >>
>>> >> >>> >> Why not tune the configurations?
>>> >> >>> >> Both frameworks have many areas to tune:
>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>>> >> >>> >>
>>> >> >>> >>
>>> >> >>> >>
>>> >> >>> >> 2013/6/6 sam liu <sa...@gmail.com>
>>> >> >>> >>>
>>> >> >>> >>> Hi Experts,
>>> >> >>> >>>
>>> >> >>> >>> We are thinking about whether to use Yarn or not in the near
>>> >> >>> >>> future,
>>> >> >>> >>> and
>>> >> >>> >>> I ran teragen/terasort on Yarn and MRv1 for comparison.
>>> >> >>> >>>
>>> >> >>> >>> My env is a three-node cluster, and each node has similar
>>> hardware:
>>> >> >>> >>> 2 CPUs (4 cores each) and 32 GB memory. Both the Yarn and MRv1
>>> >> >>> >>> clusters are set up on the same
>>> >> >>> >>> env. To
>>> >> >>> >>> be fair, I did not make any performance tuning on their
>>> >> >>> >>> configurations, but
>>> >> >>> >>> use the default configuration values.
>>> >> >>> >>>
>>> >> >>> >>> Before testing, I thought Yarn would be much better than MRv1, if
>>> >> >>> >>> they
>>> >> >>> >>> all
>>> >> >>> >>> use default configuration, because Yarn is a better framework
>>> than
>>> >> >>> >>> MRv1.
>>> >> >>> >>> However, the test result shows some differences:
>>> >> >>> >>>
>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>> >> >>> >>>
>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>> >> >>> >>> - MRv1: 193 sec
>>> >> >>> >>> - Yarn: 69 sec
>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>> >> >>> >>>
>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>> >> >>> >>> - MRv1: 451 sec
>>> >> >>> >>> - Yarn: 1136 sec
>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>> >> >>> >>>
>>> >> >>> >>> After a fast analysis, I think the direct cause might be that
>>> Yarn
>>> >> >>> >>> is
>>> >> >>> >>> much faster than MRv1 on Map phase, but much worse on Reduce
>>> >> >>> >>> phase.
>>> >> >>> >>>
>>> >> >>> >>> Here I have two questions:
>>> >> >>> >>> - Why my tests shows Yarn is worse than MRv1 for terasort?
>>> >> >>> >>> - What's the strategy for tuning Yarn performance? Are there any
>>> >> >>> >>> materials?
>>> >> >>> >>>
>>> >> >>> >>> Thanks!
>>> >> >>> >>
>>> >> >>> >>
>>> >> >>> >>
>>> >> >>> >>
>>> >> >>> >> --
>>> >> >>> >> Marcos Ortiz Valmaseda
>>> >> >>> >> Product Manager at PDVSA
>>> >> >>> >> http://about.me/marcosortiz
>>> >> >>> >>
>>> >> >>> >
>>> >> >>>
>>> >> >>>
>>> >> >>>
>>> >> >>> --
>>> >> >>> Harsh J
>>> >> >>
>>> >> >>
>>> >> >
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Harsh J
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Harsh J
>>>
>>
>>
>