Posted to hdfs-user@hadoop.apache.org by Gaurav Dasgupta <gd...@gmail.com> on 2012/08/28 12:32:10 UTC

MRBench Maps strange behaviour

Hi All,

I executed the "MRBench" program from "hadoop-test.jar" on my 12-node CDH3
cluster. After running it, I noticed some strange behaviour in the
number of maps it ran.

First I ran the command:
hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -numRuns 3 -maps 200 -reduces 200 -inputLines 1024 -inputType random
I could see that the actual number of maps it ran was 201 (for all 3 runs)
instead of 200, though the end report displays the number launched as
200. Here is the console report:

12/08/28 04:34:35 INFO mapred.JobClient: Job complete: job_201208230144_0035
12/08/28 04:34:35 INFO mapred.JobClient: Counters: 28
12/08/28 04:34:35 INFO mapred.JobClient:   Job Counters
12/08/28 04:34:35 INFO mapred.JobClient:     Launched reduce tasks=200
12/08/28 04:34:35 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=617209
12/08/28 04:34:35 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/08/28 04:34:35 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/08/28 04:34:35 INFO mapred.JobClient:     Rack-local map tasks=137
12/08/28 04:34:35 INFO mapred.JobClient:     Launched map tasks=201
12/08/28 04:34:35 INFO mapred.JobClient:     Data-local map tasks=64
12/08/28 04:34:35 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=1756882

Next, I ran MRBench with just 10 maps and 10 reduces:

hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -maps 10 -reduces 10

This time the actual number of maps was only 2, and again the end report
displays the number of maps launched as 10. The console output:

12/08/28 05:05:35 INFO mapred.JobClient: Job complete: job_201208230144_0040
12/08/28 05:05:35 INFO mapred.JobClient: Counters: 27
12/08/28 05:05:35 INFO mapred.JobClient:   Job Counters
12/08/28 05:05:35 INFO mapred.JobClient:     Launched reduce tasks=20
12/08/28 05:05:35 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=6648
12/08/28 05:05:35 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/08/28 05:05:35 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/08/28 05:05:35 INFO mapred.JobClient:     Launched map tasks=2
12/08/28 05:05:35 INFO mapred.JobClient:     Data-local map tasks=2
12/08/28 05:05:35 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=163257
12/08/28 05:05:35 INFO mapred.JobClient:   FileSystemCounters
12/08/28 05:05:35 INFO mapred.JobClient:     FILE_BYTES_READ=407
12/08/28 05:05:35 INFO mapred.JobClient:     HDFS_BYTES_READ=258
12/08/28 05:05:35 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1072596
12/08/28 05:05:35 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=3
12/08/28 05:05:35 INFO mapred.JobClient:   Map-Reduce Framework
12/08/28 05:05:35 INFO mapred.JobClient:     Map input records=1
12/08/28 05:05:35 INFO mapred.JobClient:     Reduce shuffle bytes=647
12/08/28 05:05:35 INFO mapred.JobClient:     Spilled Records=2
12/08/28 05:05:35 INFO mapred.JobClient:     Map output bytes=5
12/08/28 05:05:35 INFO mapred.JobClient:     CPU time spent (ms)=17070
12/08/28 05:05:35 INFO mapred.JobClient:     Total committed heap usage (bytes)=6218842112
12/08/28 05:05:35 INFO mapred.JobClient:     Map input bytes=2
12/08/28 05:05:35 INFO mapred.JobClient:     Combine input records=0
12/08/28 05:05:35 INFO mapred.JobClient:     SPLIT_RAW_BYTES=254
12/08/28 05:05:35 INFO mapred.JobClient:     Reduce input records=1
12/08/28 05:05:35 INFO mapred.JobClient:     Reduce input groups=1
12/08/28 05:05:35 INFO mapred.JobClient:     Combine output records=0
12/08/28 05:05:35 INFO mapred.JobClient:     Physical memory (bytes) snapshot=3348828160
12/08/28 05:05:35 INFO mapred.JobClient:     Reduce output records=1
12/08/28 05:05:35 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=22955810816
12/08/28 05:05:35 INFO mapred.JobClient:     Map output records=1
DataLines    Maps    Reduces    AvgTime (milliseconds)
1            20      20         17451

Can someone please help me understand this behaviour of Hadoop in this
case? My main purpose in running MRBench is to calculate the average time
for a certain number of maps, reduces, input lines, etc. If the number of
maps is not what I submitted, how can I judge my benchmark results?

Thanks,

Gaurav Dasgupta

Re: MRBench Maps strange behaviour

Posted by Bejoy KS <be...@gmail.com>.
Hi Gaurav

You can get the number of map tasks in the job from the JobTracker (JT) web UI itself.

Regards
Bejoy KS


-----Original Message-----
From: Gaurav Dasgupta <gd...@gmail.com>
Date: Wed, 29 Aug 2012 13:14:11 
To: <us...@hadoop.apache.org>
Reply-To: user@hadoop.apache.org
Subject: Re: MRBench Maps strange behaviour

Hi Hemanth,

Thanks for the reply.
Can you tell me how I can calculate or confirm the exact number of maps
from the counters?
Thanks,
Gaurav Dasgupta
On Wed, Aug 29, 2012 at 11:26 AM, Hemanth Yamijala <yh...@gmail.com> wrote:

> Hi,
>
> The number of maps specified to any MapReduce program (including
> those that are part of MRBench) is generally only a hint, and the actual
> number of maps will, in typical cases, be influenced by the amount of data
> being processed. You can take a look at this wiki link to understand
> more: http://wiki.apache.org/hadoop/HowManyMapsAndReduces
>
> In the examples below, since the data you've generated is different,
> the number of mappers is different. To be able to judge your
> benchmark results, you'd need to benchmark against the same data (or
> at least the same kind of data, i.e. the same size and type).
>
> The number of maps printed at the end is taken straight from the input
> parameters and doesn't reflect what the job actually ran with. The
> information from the counters is the right one.
>
> Thanks
> Hemanth
>


RE: MRBench Maps strange behaviour

Posted by Leo Leung <ll...@ddn.com>.
In mrbench, the actual number of launched map tasks depends on the number of input lines (-inputLines).

In your first case, you specified more input lines than maps, hence all maps were launched.

The default inputLines is 1, which is (cough cough) quite oblivious to the number of maps you specify.
(That was your second case)
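A minimal sketch of the arithmetic behind this, assuming MRBench's job goes through the old org.apache.hadoop.mapred.FileInputFormat split rules (Hadoop 0.20 / CDH3) and that an ascending control file holds one zero-padded number plus a newline per line. SplitEstimate and its methods are hypothetical helpers, not Hadoop classes. The -maps value only sets a goal split size (file length divided by the hint), so the 2-byte file implied by "Map input bytes=2" can only be cut into two 1-byte splits, while a file whose length is not an exact multiple of the hint can leave one extra remainder split, which would be consistent with 201 maps for a hint of 200. The 20,500-byte size below is made up for illustration: random input lines vary in width, and the real file size was not reported. The 64 MB block size and minimum split size of 1 are also assumptions.

```java
// Hypothetical helper, not a Hadoop class: re-implements, for a single input
// file, the split arithmetic of the old org.apache.hadoop.mapred.FileInputFormat.
public class SplitEstimate {
    // The last split may be up to 10% larger than splitSize before another split is cut.
    static final double SPLIT_SLOP = 1.1;

    // Assumed ascending control-file format: each line is the line number,
    // zero-padded to the width of numLines, followed by '\n'.
    static long ascendingFileSize(long numLines) {
        int padding = String.valueOf(numLines).length();
        return numLines * (padding + 1L);
    }

    // Number of splits (and therefore map tasks) for one non-empty file.
    static int countSplits(long fileLength, int mapsHint, long blockSize, long minSplitSize) {
        if (fileLength <= 0) {
            throw new IllegalArgumentException("fileLength must be positive");
        }
        // The -maps value is only used to derive a goal size per split.
        long goalSize = fileLength / (mapsHint == 0 ? 1 : mapsHint);
        long splitSize = Math.max(minSplitSize, Math.min(goalSize, blockSize));
        long remaining = fileLength;
        int splits = 0;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // whatever is left becomes the final split
        }
        return splits;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024; // assumed 64 MB HDFS block size
        long minSplitSize = 1;              // assumed default minimum split size

        // Second run: default inputLines=1 gives a 2-byte file ("0\n").
        long tiny = ascendingFileSize(1);
        System.out.println("inputLines=1, -maps 10 -> "
                + countSplits(tiny, 10, blockSize, minSplitSize) + " maps");

        // First run: hypothetical 20,500-byte input with -maps 200.
        System.out.println("20500 bytes, -maps 200 -> "
                + countSplits(20500, 200, blockSize, minSplitSize) + " maps");
    }
}
```

Under these assumptions the program prints "inputLines=1, -maps 10 -> 2 maps" and "20500 bytes, -maps 200 -> 201 maps". Because the block size and mapred.min.split.size also enter the formula, treat it as an estimate; the "Launched map tasks" counter is still the number to trust.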


From: praveenesh kumar [mailto:praveenesh@gmail.com]
Sent: Wednesday, August 29, 2012 1:45 AM
To: user@hadoop.apache.org
Subject: Re: MRBench Maps strange behaviour

Then the question is how MRBench uses its parameters.
According to the mail he sent, he is running MRBench with the following parameters:

hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -maps 10 -reduces 10

I guess he expects MRBench to launch 10 mappers and 10 reducers. But he is getting different results, which are visible in the counters, and we can use all our knowledge of maps and input splits to explain the counter outputs.

The question here is: how can we use MRBench, and what does it provide? How can we control it to run with different parameters to do some benchmarking? Can someone explain how to use MRBench and what exactly it does?

Regards,
Praveenesh
On Wed, Aug 29, 2012 at 3:31 AM, Hemanth Yamijala <yh...@gmail.com> wrote:
I assume you are asking about the exact number of maps launched.
If so, the output of the MRBench run prints the counter
"Launched map tasks". That is the exact number of maps launched.

Thanks
Hemanth
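A small sketch for pulling that counter out of a saved MRBench console log, for example one captured with "hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -maps 10 -reduces 10 2>&1 | tee mrbench.log" (assumption: the JobClient counter lines end up in the captured output). LaunchedMaps is a hypothetical helper, not a Hadoop API. It returns one value per completed job, so a run with -numRuns 3 should yield three numbers.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts the "Launched map tasks" job counter from
// JobClient console output, one value per completed job.
public class LaunchedMaps {
    // Matches counter lines such as
    // "12/08/28 04:34:35 INFO mapred.JobClient:     Launched map tasks=201"
    // (and does not match "Launched reduce tasks=...").
    private static final Pattern LAUNCHED_MAPS = Pattern.compile("Launched map tasks=(\\d+)");

    static List<Long> launchedMaps(String consoleOutput) {
        List<Long> counts = new ArrayList<>();
        Matcher m = LAUNCHED_MAPS.matcher(consoleOutput);
        while (m.find()) {
            counts.add(Long.parseLong(m.group(1)));
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java LaunchedMaps <mrbench-console-log>");
            System.exit(1);
        }
        String log = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        List<Long> counts = launchedMaps(log);
        for (int run = 0; run < counts.size(); run++) {
            System.out.println("run " + (run + 1) + ": launched map tasks = " + counts.get(run));
        }
    }
}
```

Matching on the counter text rather than a fixed line position keeps the helper working when the counter order or the number of counters (27 vs 28 above) differs between jobs.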



RE: MRBench Maps strange behaviour

Posted by Leo Leung <ll...@ddn.com>.
mrbench "actual lunched map task" depends on the number of inputLines.

So in your first case, you did specify more input that maps, hence all maps are lunched.

The default inputLines is 1,  which is (cough cough)  quite oblivious to the number of maps you specify.
(That was your second case)


From: praveenesh kumar [mailto:praveenesh@gmail.com]
Sent: Wednesday, August 29, 2012 1:45 AM
To: user@hadoop.apache.org
Subject: Re: MRBench Maps strange behaviour

Then the question arises how MRBench is using the parameters :
According to the mail he send... he is running MRBench with the following parameter:

hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -maps 10 -reduces 10

I guess he is assuming the MRbench to launch 10 mappers and 10 reducers. But he is getting some different results which are visible in the counters and we can use all our map and input-split logics to justify the counter outputs.

The question arises here -- how can we use MRBench -- what it provides you ? How can we control it to run with different parameters to do some benchmarking ? Can someone explain how to use MRBench and what it exactly does.

Regards,
Praveenesh
On Wed, Aug 29, 2012 at 3:31 AM, Hemanth Yamijala <yh...@gmail.com>> wrote:
Assume you are asking about what is the exact number of maps launched.
If yes, then the output of the MRBench run is printing the counter
"Launched map tasks". That is the exact value of maps launched.

Thanks
Hemanth

On Wed, Aug 29, 2012 at 1:14 PM, Gaurav Dasgupta <gd...@gmail.com>> wrote:
> Hi Hemanth,
>
> Thanks for the reply.
> Can you tell me how can I calculate or ensure from the counters what should
> be the exact number of Maps?
> Thanks,
> Gaurav Dasgupta
> On Wed, Aug 29, 2012 at 11:26 AM, Hemanth Yamijala <yh...@gmail.com>>
> wrote:
>>
>> Hi,
>>
>> The number of maps specified to any map reduce program (including
>> those part of MRBench) is generally only a hint, and the actual number
>> of maps will be influenced in typical cases by the amount of data
>> being processed. You can take a look at this wiki link to understand
>> more: http://wiki.apache.org/hadoop/HowManyMapsAndReduces
>>
>> In the examples below, since the data you've generated is different,
>> the number of mappers are different. To be able to judge your
>> benchmark results, you'd need to benchmark against the same data (or
>> at least same type of type - i.e. size and type).
>>
>> The number of maps printed at the end is straight from the input
>> specified and doesn't reflect what the job actually ran with. The
>> information from the counters is the right one.
>>
>> Thanks
>> Hemanth
>>
>> On Tue, Aug 28, 2012 at 4:02 PM, Gaurav Dasgupta <gd...@gmail.com>>
>> wrote:
>> > Hi All,
>> >
>> > I executed the "MRBench" program from "hadoop-test.jar" in my 12 node
>> > CDH3
>> > cluster. After executing, I had some strange observations regarding the
>> > number of Maps it ran.
>> >
>> > First I ran the command:
>> > hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -numRuns 3 -maps
>> > 200
>> > -reduces 200 -inputLines 1024 -inputType random
>> > And I could see that the actual number of Maps it ran was 201 (for all
>> > the 3
>> > runs) instead of 200 (Though the end report displays the launched to be
>> > 200). Here is the console report:
>> >
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient: Job complete:
>> > job_201208230144_0035
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient: Counters: 28
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient:   Job Counters
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient:     Launched reduce tasks=200
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=617209
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient:     Total time spent by all
>> > reduces
>> > waiting after reserving slots (ms)=0
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient:     Total time spent by all
>> > maps
>> > waiting after reserving slots (ms)=0
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient:     Rack-local map tasks=137
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient:     Launched map tasks=201
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient:     Data-local map tasks=64
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient:
>> > SLOTS_MILLIS_REDUCES=1756882
>> >
>> >
>> >
>> > Again, I ran the MRBench for just 10 Maps and 10 Reduces:
>> >
>> > hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -maps 10
>> > -reduces 10
>> >
>> >
>> >
>> > This time the actual number of Maps were only 2 and again the end report
>> > displays Maps Lauched to be 10. The console output:
>> >
>> >
>> >
>> > 12/08/28 05:05:35 INFO mapred.JobClient: Job complete:
>> > job_201208230144_0040
>> > 12/08/28 05:05:35 INFO mapred.JobClient: Counters: 27
>> > 12/08/28 05:05:35 INFO mapred.JobClient:   Job Counters
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Launched reduce tasks=20
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=6648
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Total time spent by all
>> > reduces
>> > waiting after reserving slots (ms)=0
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Total time spent by all
>> > maps
>> > waiting after reserving slots (ms)=0
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Launched map tasks=2
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Data-local map tasks=2
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=163257
>> > 12/08/28 05:05:35 INFO mapred.JobClient:   FileSystemCounters
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     FILE_BYTES_READ=407
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     HDFS_BYTES_READ=258
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1072596
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=3
>> > 12/08/28 05:05:35 INFO mapred.JobClient:   Map-Reduce Framework
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map input records=1
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce shuffle bytes=647
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Spilled Records=2
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map output bytes=5
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     CPU time spent (ms)=17070
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Total committed heap usage
>> > (bytes)=6218842112
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map input bytes=2
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Combine input records=0
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     SPLIT_RAW_BYTES=254
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce input records=1
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce input groups=1
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Combine output records=0
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Physical memory (bytes)
>> > snapshot=3348828160
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce output records=1
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Virtual memory (bytes)
>> > snapshot=22955810816
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map output records=1
>> > DataLines Maps Reduces AvgTime (milliseconds)
>> > 1                20     20           17451
>> >
>> > Can some one please help me understand this behaviour of Hadoop in this
>> > case. My main purpose of running a MRBench is to calculate the Average
>> > time
>> > for certain amount of Maps, Reduces, InputLines etc. If the number of
>> > Maps
>> > is not what I submitted, then how can I judge my benchmark results?
>> >
>> >
>> >
>> > Thanks,
>> >
>> > Gaurav Dasgupta
>
>


RE: MRBench Maps strange behaviour

Posted by Leo Leung <ll...@ddn.com>.
mrbench "actual lunched map task" depends on the number of inputLines.

So in your first case, you did specify more input that maps, hence all maps are lunched.

The default inputLines is 1,  which is (cough cough)  quite oblivious to the number of maps you specify.
(That was your second case)


From: praveenesh kumar [mailto:praveenesh@gmail.com]
Sent: Wednesday, August 29, 2012 1:45 AM
To: user@hadoop.apache.org
Subject: Re: MRBench Maps strange behaviour

Then the question arises how MRBench is using the parameters :
According to the mail he send... he is running MRBench with the following parameter:

hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -maps 10 -reduces 10

I guess he is assuming the MRbench to launch 10 mappers and 10 reducers. But he is getting some different results which are visible in the counters and we can use all our map and input-split logics to justify the counter outputs.

The question arises here -- how can we use MRBench -- what it provides you ? How can we control it to run with different parameters to do some benchmarking ? Can someone explain how to use MRBench and what it exactly does.

Regards,
Praveenesh
On Wed, Aug 29, 2012 at 3:31 AM, Hemanth Yamijala <yh...@gmail.com>> wrote:
Assume you are asking about what is the exact number of maps launched.
If yes, then the output of the MRBench run is printing the counter
"Launched map tasks". That is the exact value of maps launched.

Thanks
Hemanth

On Wed, Aug 29, 2012 at 1:14 PM, Gaurav Dasgupta <gd...@gmail.com>> wrote:
> Hi Hemanth,
>
> Thanks for the reply.
> Can you tell me how can I calculate or ensure from the counters what should
> be the exact number of Maps?
> Thanks,
> Gaurav Dasgupta
> On Wed, Aug 29, 2012 at 11:26 AM, Hemanth Yamijala <yh...@gmail.com>>
> wrote:
>>
>> Hi,
>>
>> The number of maps specified to any map reduce program (including
>> those part of MRBench) is generally only a hint, and the actual number
>> of maps will be influenced in typical cases by the amount of data
>> being processed. You can take a look at this wiki link to understand
>> more: http://wiki.apache.org/hadoop/HowManyMapsAndReduces
>>
>> In the examples below, since the data you've generated is different,
>> the number of mappers are different. To be able to judge your
>> benchmark results, you'd need to benchmark against the same data (or
>> at least same type of type - i.e. size and type).
>>
>> The number of maps printed at the end is straight from the input
>> specified and doesn't reflect what the job actually ran with. The
>> information from the counters is the right one.
>>
>> Thanks
>> Hemanth
>>
>> On Tue, Aug 28, 2012 at 4:02 PM, Gaurav Dasgupta <gd...@gmail.com>>
>> wrote:
>> > Hi All,
>> >
>> > I executed the "MRBench" program from "hadoop-test.jar" in my 12 node
>> > CDH3
>> > cluster. After executing, I had some strange observations regarding the
>> > number of Maps it ran.
>> >
>> > First I ran the command:
>> > hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -numRuns 3 -maps
>> > 200
>> > -reduces 200 -inputLines 1024 -inputType random
>> > And I could see that the actual number of Maps it ran was 201 (for all
>> > the 3
>> > runs) instead of 200 (Though the end report displays the launched to be
>> > 200). Here is the console report:
>> >
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient: Job complete:
>> > job_201208230144_0035
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient: Counters: 28
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient:   Job Counters
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient:     Launched reduce tasks=200
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=617209
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient:     Total time spent by all
>> > reduces
>> > waiting after reserving slots (ms)=0
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient:     Total time spent by all
>> > maps
>> > waiting after reserving slots (ms)=0
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient:     Rack-local map tasks=137
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient:     Launched map tasks=201
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient:     Data-local map tasks=64
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient:
>> > SLOTS_MILLIS_REDUCES=1756882
>> >
>> >
>> >
>> > Again, I ran the MRBench for just 10 Maps and 10 Reduces:
>> >
>> > hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -maps 10
>> > -reduces 10
>> >
>> >
>> >
>> > This time the actual number of Maps were only 2 and again the end report
>> > displays Maps Lauched to be 10. The console output:
>> >
>> >
>> >
>> > 12/08/28 05:05:35 INFO mapred.JobClient: Job complete:
>> > job_201208230144_0040
>> > 12/08/28 05:05:35 INFO mapred.JobClient: Counters: 27
>> > 12/08/28 05:05:35 INFO mapred.JobClient:   Job Counters
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Launched reduce tasks=20
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=6648
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Total time spent by all
>> > reduces
>> > waiting after reserving slots (ms)=0
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Total time spent by all
>> > maps
>> > waiting after reserving slots (ms)=0
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Launched map tasks=2
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Data-local map tasks=2
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=163257
>> > 12/08/28 05:05:35 INFO mapred.JobClient:   FileSystemCounters
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     FILE_BYTES_READ=407
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     HDFS_BYTES_READ=258
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1072596
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=3
>> > 12/08/28 05:05:35 INFO mapred.JobClient:   Map-Reduce Framework
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map input records=1
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce shuffle bytes=647
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Spilled Records=2
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map output bytes=5
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     CPU time spent (ms)=17070
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Total committed heap usage (bytes)=6218842112
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map input bytes=2
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Combine input records=0
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     SPLIT_RAW_BYTES=254
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce input records=1
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce input groups=1
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Combine output records=0
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Physical memory (bytes) snapshot=3348828160
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce output records=1
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=22955810816
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map output records=1
>> > DataLines Maps Reduces AvgTime (milliseconds)
>> > 1                20     20           17451
>> >
>> > Can someone please help me understand this behaviour of Hadoop in this
>> > case? My main purpose in running MRBench is to calculate the average
>> > time for a certain number of Maps, Reduces, InputLines, etc. If the number
>> > of Maps is not what I submitted, how can I judge my benchmark results?
>> >
>> >
>> >
>> > Thanks,
>> >
>> > Gaurav Dasgupta
>
>


RE: MRBench Maps strange behaviour

Posted by Leo Leung <ll...@ddn.com>.
mrbench's "actual launched map tasks" depends on the number of inputLines.

So in your first case, you did specify more input lines than maps, hence all maps were launched.

The default inputLines is 1, which is (cough cough) quite oblivious to the number of maps you specify.
(That was your second case.)
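To see why the launched count can differ from -maps, here is a minimal standalone sketch of the split arithmetic in the old-API (org.apache.hadoop.mapred) FileInputFormat.getSplits of Hadoop 0.20. It assumes MRBench reads a single input file through that input format, that mapred.min.split.size is at its default of 1, and a 64 MB block size; the class and method names are hypothetical, and the code re-implements the arithmetic rather than calling Hadoop.

```java
/**
 * Standalone sketch (not Hadoop code) of the split arithmetic used by the
 * old-API FileInputFormat.getSplits in Hadoop 0.20, for a single input file.
 */
public class SplitSketch {
    // The last split may be up to 10% larger than splitSize instead of
    // producing a tiny extra split.
    static final double SPLIT_SLOP = 1.1;

    // splitSize = max(minSize, min(goalSize, blockSize))
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    /** Number of splits (hence map tasks) for one non-empty file. */
    static int countSplits(long fileLen, int requestedMaps, long minSize, long blockSize) {
        if (fileLen <= 0) {
            throw new IllegalArgumentException("sketch only handles non-empty files");
        }
        // The -maps value is only a hint: it sets the goal size of each split.
        long goalSize = fileLen / (requestedMaps == 0 ? 1 : requestedMaps);
        long splitSize = computeSplitSize(goalSize, minSize, blockSize);
        int splits = 0;
        long bytesRemaining = fileLen;
        while ((double) bytesRemaining / splitSize > SPLIT_SLOP) {
            splits++;
            bytesRemaining -= splitSize;
        }
        if (bytesRemaining != 0) {
            splits++; // leftover bytes become one final split
        }
        return splits;
    }

    public static void main(String[] args) {
        long minSize = 1;                   // assumed mapred.min.split.size default
        long blockSize = 64L * 1024 * 1024; // assumed 64 MB block size
        System.out.println(countSplits(2, 10, minSize, blockSize));       // 2
        System.out.println(countSplits(10_000, 200, minSize, blockSize)); // 200
        System.out.println(countSplits(10_010, 200, minSize, blockSize)); // 201
    }
}
```

With a 2-byte input and -maps 10, the goal size rounds down to 0, the split size is clamped to 1 byte, and the sketch yields 2 splits, consistent with the "Map input bytes=2" and "Launched map tasks=2" counters quoted above. The 10,000 and 10,010 byte file sizes are hypothetical (the real input size of the 200-map run is not in the thread); they only show how a small remainder left by the integer division can turn 200 requested maps into 201.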


From: praveenesh kumar [mailto:praveenesh@gmail.com]
Sent: Wednesday, August 29, 2012 1:45 AM
To: user@hadoop.apache.org
Subject: Re: MRBench Maps strange behaviour

Then the question arises: how is MRBench using the parameters?
According to the mail he sent, he is running MRBench with the following parameters:

hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -maps 10 -reduces 10

I guess he is expecting MRBench to launch 10 mappers and 10 reducers. But he is getting different results, which are visible in the counters, and we can use our map and input-split logic to explain the counter outputs.

The question here is: how can we use MRBench, and what does it provide? How can we control it to run with different parameters to do some benchmarking? Can someone explain how to use MRBench and what exactly it does?

Regards,
Praveenesh
On Wed, Aug 29, 2012 at 3:31 AM, Hemanth Yamijala <yh...@gmail.com> wrote:
I assume you are asking what the exact number of maps launched is.
If so, the output of the MRBench run prints the counter
"Launched map tasks", which is the exact number of maps launched.

Thanks
Hemanth

On Wed, Aug 29, 2012 at 1:14 PM, Gaurav Dasgupta <gd...@gmail.com> wrote:
> Hi Hemanth,
>
> Thanks for the reply.
> Can you tell me how I can calculate, or confirm from the counters, the exact
> number of Maps?
> Thanks,
> Gaurav Dasgupta
> On Wed, Aug 29, 2012 at 11:26 AM, Hemanth Yamijala <yh...@gmail.com> wrote:
>>
>> Hi,
>>
>> The number of maps specified to any map reduce program (including
>> those that are part of MRBench) is generally only a hint, and the actual number
>> of maps will be influenced in typical cases by the amount of data
>> being processed. You can take a look at this wiki link to understand
>> more: http://wiki.apache.org/hadoop/HowManyMapsAndReduces
>>
>> In the examples below, since the data you've generated is different,
>> the number of mappers is different. To be able to judge your
>> benchmark results, you'd need to benchmark against the same data (or
>> at least the same kind of data, i.e. the same size and type).
>>
>> The number of maps printed at the end comes straight from the parameters
>> you specified and doesn't reflect what the job actually ran with. The
>> information from the counters is the right one.
>>
>> Thanks
>> Hemanth
>>
>> On Tue, Aug 28, 2012 at 4:02 PM, Gaurav Dasgupta <gd...@gmail.com> wrote:
>> > Hi All,
>> >
>> > I executed the "MRBench" program from "hadoop-test.jar" in my 12-node
>> > CDH3 cluster. After executing it, I made some strange observations
>> > regarding the number of Maps it ran.
>> >
>> > First I ran the command:
>> > hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -numRuns 3 -maps 200 -reduces 200 -inputLines 1024 -inputType random
>> > And I could see that the actual number of Maps it ran was 201 (for all
>> > 3 runs) instead of 200 (though the end report displays the number
>> > launched as 200). Here is the console report:
>> >
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient: Job complete: job_201208230144_0035
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient: Counters: 28
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient:   Job Counters
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient:     Launched reduce tasks=200
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=617209
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient:     Rack-local map tasks=137
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient:     Launched map tasks=201
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient:     Data-local map tasks=64
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=1756882
>> >
>> >
>> >
>> > Again, I ran the MRBench for just 10 Maps and 10 Reduces:
>> >
>> > hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -maps 10 -reduces 10
>> >
>> >
>> >
>> > This time the actual number of Maps was only 2, and again the end report
>> > displays Maps Launched as 10. The console output:
>> >
>> >
>> >
>> > 12/08/28 05:05:35 INFO mapred.JobClient: Job complete: job_201208230144_0040
>> > 12/08/28 05:05:35 INFO mapred.JobClient: Counters: 27
>> > 12/08/28 05:05:35 INFO mapred.JobClient:   Job Counters
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Launched reduce tasks=20
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=6648
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Launched map tasks=2
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Data-local map tasks=2
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=163257
>> > 12/08/28 05:05:35 INFO mapred.JobClient:   FileSystemCounters
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     FILE_BYTES_READ=407
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     HDFS_BYTES_READ=258
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1072596
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=3
>> > 12/08/28 05:05:35 INFO mapred.JobClient:   Map-Reduce Framework
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map input records=1
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce shuffle bytes=647
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Spilled Records=2
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map output bytes=5
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     CPU time spent (ms)=17070
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Total committed heap usage (bytes)=6218842112
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map input bytes=2
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Combine input records=0
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     SPLIT_RAW_BYTES=254
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce input records=1
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce input groups=1
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Combine output records=0
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Physical memory (bytes) snapshot=3348828160
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce output records=1
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=22955810816
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map output records=1
>> > DataLines Maps Reduces AvgTime (milliseconds)
>> > 1                20     20           17451
>> >
>> > Can someone please help me understand this behaviour of Hadoop in this
>> > case? My main purpose in running MRBench is to calculate the average
>> > time for a certain number of Maps, Reduces, InputLines, etc. If the number
>> > of Maps is not what I submitted, how can I judge my benchmark results?
>> >
>> >
>> >
>> > Thanks,
>> >
>> > Gaurav Dasgupta
>
>


Re: MRBench Maps strange behaviour

Posted by praveenesh kumar <pr...@gmail.com>.
Then the question arises: how is MRBench using the parameters?
According to the mail he sent, he is running MRBench with the following
parameters:

hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -maps 10 -reduces 10

I guess he is expecting MRBench to launch 10 mappers and 10 reducers.
But he is getting different results, which are visible in the counters,
and we can use our map and input-split logic to explain the counter
outputs.

The question here is: how can we use MRBench, and what does it provide?
How can we control it to run with different parameters to do some
benchmarking? Can someone explain how to use MRBench and what exactly it
does?

Regards,
Praveenesh

On Wed, Aug 29, 2012 at 3:31 AM, Hemanth Yamijala <yh...@gmail.com> wrote:

> I assume you are asking what the exact number of maps launched is.
> If so, the output of the MRBench run prints the counter
> "Launched map tasks", which is the exact number of maps launched.
>
> Thanks
> Hemanth
>
> On Wed, Aug 29, 2012 at 1:14 PM, Gaurav Dasgupta <gd...@gmail.com> wrote:
> > Hi Hemanth,
> >
> > Thanks for the reply.
> > Can you tell me how I can calculate, or confirm from the counters, the
> > exact number of Maps?
> > Thanks,
> > Gaurav Dasgupta
> > On Wed, Aug 29, 2012 at 11:26 AM, Hemanth Yamijala <yh...@gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> The number of maps specified to any map reduce program (including
> >> those that are part of MRBench) is generally only a hint, and the actual number
> >> of maps will be influenced in typical cases by the amount of data
> >> being processed. You can take a look at this wiki link to understand
> >> more: http://wiki.apache.org/hadoop/HowManyMapsAndReduces
> >>
> >> In the examples below, since the data you've generated is different,
> >> the number of mappers is different. To be able to judge your
> >> benchmark results, you'd need to benchmark against the same data (or
> >> at least the same kind of data, i.e. the same size and type).
> >>
> >> The number of maps printed at the end comes straight from the parameters
> >> you specified and doesn't reflect what the job actually ran with. The
> >> information from the counters is the right one.
> >>
> >> Thanks
> >> Hemanth
> >>
> >> On Tue, Aug 28, 2012 at 4:02 PM, Gaurav Dasgupta <gd...@gmail.com> wrote:
> >> > Hi All,
> >> >
> >> > I executed the "MRBench" program from "hadoop-test.jar" in my 12-node
> >> > CDH3 cluster. After executing it, I made some strange observations
> >> > regarding the number of Maps it ran.
> >> >
> >> > First I ran the command:
> >> > hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -numRuns 3 -maps 200 -reduces 200 -inputLines 1024 -inputType random
> >> > And I could see that the actual number of Maps it ran was 201 (for all
> >> > 3 runs) instead of 200 (though the end report displays the number
> >> > launched as 200). Here is the console report:
> >> >
> >> >
> >> > 12/08/28 04:34:35 INFO mapred.JobClient: Job complete: job_201208230144_0035
> >> >
> >> > 12/08/28 04:34:35 INFO mapred.JobClient: Counters: 28
> >> >
> >> > 12/08/28 04:34:35 INFO mapred.JobClient:   Job Counters
> >> >
> >> > 12/08/28 04:34:35 INFO mapred.JobClient:     Launched reduce tasks=200
> >> >
> >> > 12/08/28 04:34:35 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=617209
> >> >
> >> > 12/08/28 04:34:35 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
> >> >
> >> > 12/08/28 04:34:35 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
> >> >
> >> > 12/08/28 04:34:35 INFO mapred.JobClient:     Rack-local map tasks=137
> >> >
> >> > 12/08/28 04:34:35 INFO mapred.JobClient:     Launched map tasks=201
> >> >
> >> > 12/08/28 04:34:35 INFO mapred.JobClient:     Data-local map tasks=64
> >> >
> >> > 12/08/28 04:34:35 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=1756882
> >> >
> >> >
> >> >
> >> > Again, I ran the MRBench for just 10 Maps and 10 Reduces:
> >> >
> >> > hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -maps 10 -reduces 10
> >> >
> >> >
> >> >
> >> > This time the actual number of Maps was only 2, and again the end
> >> > report displays Maps Launched as 10. The console output:
> >> >
> >> >
> >> >
> >> > 12/08/28 05:05:35 INFO mapred.JobClient: Job complete: job_201208230144_0040
> >> > 12/08/28 05:05:35 INFO mapred.JobClient: Counters: 27
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:   Job Counters
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Launched reduce tasks=20
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=6648
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Launched map tasks=2
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Data-local map tasks=2
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=163257
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:   FileSystemCounters
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     FILE_BYTES_READ=407
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     HDFS_BYTES_READ=258
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1072596
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=3
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:   Map-Reduce Framework
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map input records=1
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce shuffle bytes=647
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Spilled Records=2
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map output bytes=5
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     CPU time spent (ms)=17070
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Total committed heap usage (bytes)=6218842112
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map input bytes=2
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Combine input records=0
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     SPLIT_RAW_BYTES=254
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce input records=1
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce input groups=1
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Combine output records=0
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Physical memory (bytes) snapshot=3348828160
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce output records=1
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=22955810816
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map output records=1
> >> > DataLines Maps Reduces AvgTime (milliseconds)
> >> > 1                20     20           17451
> >> >
> >> > Can someone please help me understand this behaviour of Hadoop in
> >> > this case? My main purpose in running MRBench is to calculate the
> >> > average time for a certain number of Maps, Reduces, InputLines, etc. If
> >> > the number of Maps is not what I submitted, how can I judge my
> >> > benchmark results?
> >> >
> >> >
> >> >
> >> > Thanks,
> >> >
> >> > Gaurav Dasgupta
> >
> >
>
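Following Hemanth's point that the "Launched map tasks" counter, not the summary table, reflects what actually ran, a benchmark script can record the counters rather than the -maps value it passed in. Below is a minimal Java sketch of that idea. It assumes the JobClient console output has already been captured into a string; the class and method names are hypothetical, and it uses only java.util.regex, not any Hadoop API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Sketch: pull counter values out of captured JobClient console output so a
 * benchmark records what actually ran, not what was requested.
 */
public class CounterScrape {
    // Matches "<timestamp> INFO mapred.JobClient:   <counter name>=<number>"
    private static final Pattern COUNTER =
            Pattern.compile("INFO mapred\\.JobClient:\\s+(.+?)=(\\d+)\\s*$");

    static Map<String, Long> parseCounters(String consoleOutput) {
        Map<String, Long> counters = new LinkedHashMap<>();
        for (String line : consoleOutput.split("\\R")) {
            Matcher m = COUNTER.matcher(line);
            if (m.find()) { // group headers such as "Job Counters" have no "=n"
                counters.put(m.group(1).trim(), Long.parseLong(m.group(2)));
            }
        }
        return counters;
    }

    public static void main(String[] args) {
        // Excerpt of the job_201208230144_0040 output quoted in this thread.
        String output = String.join("\n",
                "12/08/28 05:05:35 INFO mapred.JobClient:   Job Counters",
                "12/08/28 05:05:35 INFO mapred.JobClient:     Launched reduce tasks=20",
                "12/08/28 05:05:35 INFO mapred.JobClient:     Launched map tasks=2",
                "12/08/28 05:05:35 INFO mapred.JobClient:     Map input bytes=2");
        Map<String, Long> c = parseCounters(output);
        System.out.println("maps launched:    " + c.get("Launched map tasks"));
        System.out.println("reduces launched: " + c.get("Launched reduce tasks"));
    }
}
```

Scraping console text is fragile if the log format changes; the regex here only reflects the 0.20-era JobClient lines shown in this thread.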

Re: MRBench Maps strange behaviour

Posted by praveenesh kumar <pr...@gmail.com>.
Then the question arises how MRBench is using the parameters :
According to the mail he send... he is running MRBench with the following
parameter:
*
hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -maps 10 -reduces 10
*

I guess he is assuming the MRbench to launch 10 mappers and 10 reducers.
But he is getting some different results which are visible in the counters
and we can use all our map and input-split logics to justify the counter
outputs.

The question arises here -- how can we use MRBench -- what it provides you
? How can we control it to run with different parameters to do some
benchmarking ? Can someone explain how to use MRBench and what it exactly
does.

Regards,
Praveenesh

On Wed, Aug 29, 2012 at 3:31 AM, Hemanth Yamijala <yh...@gmail.com>wrote:

> Assume you are asking about what is the exact number of maps launched.
> If yes, then the output of the MRBench run is printing the counter
> "Launched map tasks". That is the exact value of maps launched.
>
> Thanks
> Hemanth
>
> On Wed, Aug 29, 2012 at 1:14 PM, Gaurav Dasgupta <gd...@gmail.com>
> wrote:
> > Hi Hemanth,
> >
> > Thanks for the reply.
> > Can you tell me how can I calculate or ensure from the counters what
> should
> > be the exact number of Maps?
> > Thanks,
> > Gaurav Dasgupta
> > On Wed, Aug 29, 2012 at 11:26 AM, Hemanth Yamijala <yh...@gmail.com>
> > wrote:
> >>
> >> Hi,
> >>
> >> The number of maps specified to any map reduce program (including
> >> those part of MRBench) is generally only a hint, and the actual number
> >> of maps will be influenced in typical cases by the amount of data
> >> being processed. You can take a look at this wiki link to understand
> >> more: http://wiki.apache.org/hadoop/HowManyMapsAndReduces
> >>
> >> In the examples below, since the data you've generated is different,
> >> the number of mappers are different. To be able to judge your
> >> benchmark results, you'd need to benchmark against the same data (or
> >> at least same type of type - i.e. size and type).
> >>
> >> The number of maps printed at the end is straight from the input
> >> specified and doesn't reflect what the job actually ran with. The
> >> information from the counters is the right one.
> >>
> >> Thanks
> >> Hemanth
> >>
> >> On Tue, Aug 28, 2012 at 4:02 PM, Gaurav Dasgupta <gd...@gmail.com>
> >> wrote:
> >> > Hi All,
> >> >
> >> > I executed the "MRBench" program from "hadoop-test.jar" in my 12 node
> >> > CDH3
> >> > cluster. After executing, I had some strange observations regarding
> the
> >> > number of Maps it ran.
> >> >
> >> > First I ran the command:
> >> > hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -numRuns 3
> -maps
> >> > 200
> >> > -reduces 200 -inputLines 1024 -inputType random
> >> > And I could see that the actual number of Maps it ran was 201 (for all
> >> > the 3
> >> > runs) instead of 200 (Though the end report displays the launched to
> be
> >> > 200). Here is the console report:
> >> >
> >> >
> >> > 12/08/28 04:34:35 INFO mapred.JobClient: Job complete:
> >> > job_201208230144_0035
> >> >
> >> > 12/08/28 04:34:35 INFO mapred.JobClient: Counters: 28
> >> >
> >> > 12/08/28 04:34:35 INFO mapred.JobClient:   Job Counters
> >> >
> >> > 12/08/28 04:34:35 INFO mapred.JobClient:     Launched reduce tasks=200
> >> >
> >> > 12/08/28 04:34:35 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=617209
> >> >
> >> > 12/08/28 04:34:35 INFO mapred.JobClient:     Total time spent by all
> >> > reduces
> >> > waiting after reserving slots (ms)=0
> >> >
> >> > 12/08/28 04:34:35 INFO mapred.JobClient:     Total time spent by all
> >> > maps
> >> > waiting after reserving slots (ms)=0
> >> >
> >> > 12/08/28 04:34:35 INFO mapred.JobClient:     Rack-local map tasks=137
> >> >
> >> > 12/08/28 04:34:35 INFO mapred.JobClient:     Launched map tasks=201
> >> >
> >> > 12/08/28 04:34:35 INFO mapred.JobClient:     Data-local map tasks=64
> >> >
> >> > 12/08/28 04:34:35 INFO mapred.JobClient:
> >> > SLOTS_MILLIS_REDUCES=1756882
> >> >
> >> >
> >> >
> >> > Again, I ran the MRBench for just 10 Maps and 10 Reduces:
> >> >
> >> > hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -maps 10
> >> > -reduces 10
> >> >
> >> >
> >> >
> >> > This time the actual number of Maps were only 2 and again the end
> report
> >> > displays Maps Lauched to be 10. The console output:
> >> >
> >> >
> >> >
> >> > 12/08/28 05:05:35 INFO mapred.JobClient: Job complete:
> >> > job_201208230144_0040
> >> > 12/08/28 05:05:35 INFO mapred.JobClient: Counters: 27
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:   Job Counters
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Launched reduce tasks=20
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=6648
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Total time spent by all
> >> > reduces
> >> > waiting after reserving slots (ms)=0
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Total time spent by all
> >> > maps
> >> > waiting after reserving slots (ms)=0
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Launched map tasks=2
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Data-local map tasks=2
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:
> SLOTS_MILLIS_REDUCES=163257
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:   FileSystemCounters
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     FILE_BYTES_READ=407
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     HDFS_BYTES_READ=258
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:
> FILE_BYTES_WRITTEN=1072596
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=3
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:   Map-Reduce Framework
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map input records=1
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce shuffle bytes=647
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Spilled Records=2
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map output bytes=5
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     CPU time spent (ms)=17070
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Total committed heap
> usage
> >> > (bytes)=6218842112
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map input bytes=2
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Combine input records=0
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     SPLIT_RAW_BYTES=254
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce input records=1
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce input groups=1
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Combine output records=0
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Physical memory (bytes)
> >> > snapshot=3348828160
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce output records=1
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Virtual memory (bytes)
> >> > snapshot=22955810816
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map output records=1
> >> > DataLines Maps Reduces AvgTime (milliseconds)
> >> > 1                20     20           17451
> >> >
> >> > Can some one please help me understand this behaviour of Hadoop in
> this
> >> > case. My main purpose of running a MRBench is to calculate the Average
> >> > time
> >> > for certain amount of Maps, Reduces, InputLines etc. If the number of
> >> > Maps
> >> > is not what I submitted, then how can I judge my benchmark results?
> >> >
> >> >
> >> >
> >> > Thanks,
> >> >
> >> > Gaurav Dasgupta
> >
> >
>

Re: MRBench Maps strange behaviour

Posted by praveenesh kumar <pr...@gmail.com>.
Then the question arises how MRBench is using the parameters :
According to the mail he send... he is running MRBench with the following
parameter:
*
hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -maps 10 -reduces 10
*

I guess he is assuming the MRbench to launch 10 mappers and 10 reducers.
But he is getting some different results which are visible in the counters
and we can use all our map and input-split logics to justify the counter
outputs.

The question arises here -- how can we use MRBench -- what it provides you
? How can we control it to run with different parameters to do some
benchmarking ? Can someone explain how to use MRBench and what it exactly
does.

Regards,
Praveenesh

On Wed, Aug 29, 2012 at 3:31 AM, Hemanth Yamijala <yh...@gmail.com>wrote:

> Assume you are asking about what is the exact number of maps launched.
> If yes, then the output of the MRBench run is printing the counter
> "Launched map tasks". That is the exact value of maps launched.
>
> Thanks
> Hemanth
>
> On Wed, Aug 29, 2012 at 1:14 PM, Gaurav Dasgupta <gd...@gmail.com>
> wrote:
> > Hi Hemanth,
> >
> > Thanks for the reply.
> > Can you tell me how can I calculate or ensure from the counters what
> should
> > be the exact number of Maps?
> > Thanks,
> > Gaurav Dasgupta
> > On Wed, Aug 29, 2012 at 11:26 AM, Hemanth Yamijala <yh...@gmail.com>
> > wrote:
> >>
> >> Hi,
> >>
> >> The number of maps specified to any map reduce program (including
> >> those part of MRBench) is generally only a hint, and the actual number
> >> of maps will be influenced in typical cases by the amount of data
> >> being processed. You can take a look at this wiki link to understand
> >> more: http://wiki.apache.org/hadoop/HowManyMapsAndReduces
> >>
> >> In the examples below, since the data you've generated is different,
> >> the number of mappers are different. To be able to judge your
> >> benchmark results, you'd need to benchmark against the same data (or
> >> at least same type of type - i.e. size and type).
> >>
> >> The number of maps printed at the end is straight from the input
> >> specified and doesn't reflect what the job actually ran with. The
> >> information from the counters is the right one.
> >>
> >> Thanks
> >> Hemanth
> >>
> >> On Tue, Aug 28, 2012 at 4:02 PM, Gaurav Dasgupta <gd...@gmail.com>
> >> wrote:
> >> > Hi All,
> >> >
> >> > I executed the "MRBench" program from "hadoop-test.jar" in my 12 node
> >> > CDH3
> >> > cluster. After executing, I had some strange observations regarding
> the
> >> > number of Maps it ran.
> >> >
> >> > First I ran the command:
> >> > hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -numRuns 3
> -maps
> >> > 200
> >> > -reduces 200 -inputLines 1024 -inputType random
> >> > And I could see that the actual number of Maps it ran was 201 (for all
> >> > the 3
> >> > runs) instead of 200 (Though the end report displays the launched to
> be
> >> > 200). Here is the console report:
> >> >
> >> >
> >> > 12/08/28 04:34:35 INFO mapred.JobClient: Job complete:
> >> > job_201208230144_0035
> >> >
> >> > 12/08/28 04:34:35 INFO mapred.JobClient: Counters: 28
> >> >
> >> > 12/08/28 04:34:35 INFO mapred.JobClient:   Job Counters
> >> >
> >> > 12/08/28 04:34:35 INFO mapred.JobClient:     Launched reduce tasks=200
> >> >
> >> > 12/08/28 04:34:35 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=617209
> >> >
> >> > 12/08/28 04:34:35 INFO mapred.JobClient:     Total time spent by all
> >> > reduces
> >> > waiting after reserving slots (ms)=0
> >> >
> >> > 12/08/28 04:34:35 INFO mapred.JobClient:     Total time spent by all
> >> > maps
> >> > waiting after reserving slots (ms)=0
> >> >
> >> > 12/08/28 04:34:35 INFO mapred.JobClient:     Rack-local map tasks=137
> >> >
> >> > 12/08/28 04:34:35 INFO mapred.JobClient:     Launched map tasks=201
> >> >
> >> > 12/08/28 04:34:35 INFO mapred.JobClient:     Data-local map tasks=64
> >> >
> >> > 12/08/28 04:34:35 INFO mapred.JobClient:
> >> > SLOTS_MILLIS_REDUCES=1756882
> >> >
> >> >
> >> >
> >> > Again, I ran the MRBench for just 10 Maps and 10 Reduces:
> >> >
> >> > hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -maps 10
> >> > -reduces 10
> >> >
> >> >
> >> >
> >> > This time the actual number of Maps were only 2 and again the end
> report
> >> > displays Maps Lauched to be 10. The console output:
> >> >
> >> >
> >> >
> >> > 12/08/28 05:05:35 INFO mapred.JobClient: Job complete:
> >> > job_201208230144_0040
> >> > 12/08/28 05:05:35 INFO mapred.JobClient: Counters: 27
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:   Job Counters
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Launched reduce tasks=20
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=6648
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Total time spent by all
> >> > reduces
> >> > waiting after reserving slots (ms)=0
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Total time spent by all
> >> > maps
> >> > waiting after reserving slots (ms)=0
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Launched map tasks=2
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Data-local map tasks=2
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:
> SLOTS_MILLIS_REDUCES=163257
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:   FileSystemCounters
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     FILE_BYTES_READ=407
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     HDFS_BYTES_READ=258
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:
> FILE_BYTES_WRITTEN=1072596
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=3
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:   Map-Reduce Framework
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map input records=1
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce shuffle bytes=647
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Spilled Records=2
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map output bytes=5
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     CPU time spent (ms)=17070
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Total committed heap
> usage
> >> > (bytes)=6218842112
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map input bytes=2
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Combine input records=0
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     SPLIT_RAW_BYTES=254
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce input records=1
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce input groups=1
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Combine output records=0
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Physical memory (bytes)
> >> > snapshot=3348828160
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce output records=1
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Virtual memory (bytes)
> >> > snapshot=22955810816
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map output records=1
> >> > DataLines Maps Reduces AvgTime (milliseconds)
> >> > 1                20     20           17451
> >> >
> >> > Can some one please help me understand this behaviour of Hadoop in
> this
> >> > case. My main purpose of running a MRBench is to calculate the Average
> >> > time
> >> > for certain amount of Maps, Reduces, InputLines etc. If the number of
> >> > Maps
> >> > is not what I submitted, then how can I judge my benchmark results?
> >> >
> >> >
> >> >
> >> > Thanks,
> >> >
> >> > Gaurav Dasgupta
> >
> >
>

Re: MRBench Maps strange behaviour

Posted by praveenesh kumar <pr...@gmail.com>.
Then the question is how MRBench uses its parameters.
According to the mail he sent, he is running MRBench with the following
parameters:

hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -maps 10 -reduces 10

I guess he expects MRBench to launch 10 mappers and 10 reducers.
But he is getting different results, which are visible in the counters,
and we can use all our map and input-split logic to justify the counter
outputs.
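The input-split logic mentioned above can be modelled in a few lines. This is a minimal Python sketch, assuming the split computation of the old org.apache.hadoop.mapred.FileInputFormat.getSplits in Hadoop 0.20 (goal size = total input bytes / requested maps, minimum split size 1 byte, split slop factor 1.1), a single splittable input file, and that -maps is passed through as the requested split count. count_splits is a hypothetical helper and the 64 MB block size is an assumed default, so please check the details against the source of your own Hadoop version.

```python
SPLIT_SLOP = 1.1  # assumed value of FileInputFormat.SPLIT_SLOP in Hadoop 0.20


def count_splits(file_len: int, requested_maps: int,
                 block_size: int = 64 * 1024 * 1024,  # assumed default block size
                 min_split_size: int = 1) -> int:
    """Estimate how many map tasks one splittable, non-empty file yields.

    requested_maps plays the role of -maps: it only sets the goal split size.
    """
    if file_len <= 0:
        raise ValueError("this sketch only models non-empty files")
    goal_size = file_len // max(requested_maps, 1)
    split_size = max(min_split_size, min(goal_size, block_size))

    splits = 0
    remaining = file_len
    # Carve full-size splits while more than SPLIT_SLOP split sizes remain.
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    # The leftover bytes (at most 1.1 split sizes) become one final split.
    if remaining > 0:
        splits += 1
    return splits
```

Under these assumptions, a 2-byte input (the second run reports "Map input bytes=2") gives 2 maps whether 10 or 20 maps are requested, because the goal size rounds down to 0 and the 1-byte minimum split size takes over. A hypothetical 20,500-byte file with 200 requested maps gives 201: the integer goal size of 102 bytes leaves a 100-byte remainder that becomes an extra split.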

So the questions are: how can we use MRBench, and what does it provide?
How can we control it to run with different parameters for benchmarking?
Can someone explain how to use MRBench and what exactly it does?

Regards,
Praveenesh
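A minimal sketch of one way to script such a sweep, following the suggestion quoted below to keep the data size and type fixed while varying only the requested task counts. It only builds argument lists from the flags that appear in this thread (-numRuns, -maps, -reduces, -inputLines, -inputType) and the jar path from the original post; the helper names are hypothetical, and each list could be handed to Python's subprocess.run on a cluster node.

```python
from typing import List

# Jar path from the original post; adjust for your installation.
HADOOP_TEST_JAR = "/usr/lib/hadoop-0.20/hadoop-test.jar"


def mrbench_command(maps: int, reduces: int, input_lines: int,
                    num_runs: int = 3, input_type: str = "random") -> List[str]:
    """Build one MRBench invocation from the flags used in this thread."""
    return [
        "hadoop", "jar", HADOOP_TEST_JAR, "mrbench",
        "-numRuns", str(num_runs),
        "-maps", str(maps),
        "-reduces", str(reduces),
        "-inputLines", str(input_lines),
        "-inputType", input_type,
    ]


def map_sweep(map_counts: List[int], reduces: int,
              input_lines: int) -> List[List[str]]:
    """Vary only the requested maps; input size and type stay fixed."""
    return [mrbench_command(m, reduces, input_lines) for m in map_counts]


if __name__ == "__main__":
    for cmd in map_sweep([10, 50, 200], reduces=10, input_lines=1024):
        print(" ".join(cmd))
```

Because -maps is only a hint, it is worth recording the "Launched map tasks" counter of every run next to these commands rather than the requested value.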

On Wed, Aug 29, 2012 at 3:31 AM, Hemanth Yamijala <yh...@gmail.com> wrote:

> Assume you are asking about what is the exact number of maps launched.
> If yes, then the output of the MRBench run is printing the counter
> "Launched map tasks". That is the exact value of maps launched.
>
> Thanks
> Hemanth
>
> On Wed, Aug 29, 2012 at 1:14 PM, Gaurav Dasgupta <gd...@gmail.com>
> wrote:
> > Hi Hemanth,
> >
> > Thanks for the reply.
> > Can you tell me how can I calculate or ensure from the counters what should
> > be the exact number of Maps?
> > Thanks,
> > Gaurav Dasgupta
> > On Wed, Aug 29, 2012 at 11:26 AM, Hemanth Yamijala <yh...@gmail.com>
> > wrote:
> >>
> >> Hi,
> >>
> >> The number of maps specified to any map reduce program (including
> >> those part of MRBench) is generally only a hint, and the actual number
> >> of maps will be influenced in typical cases by the amount of data
> >> being processed. You can take a look at this wiki link to understand
> >> more: http://wiki.apache.org/hadoop/HowManyMapsAndReduces
> >>
> >> In the examples below, since the data you've generated is different,
> >> the number of mappers are different. To be able to judge your
> >> benchmark results, you'd need to benchmark against the same data (or
> >> at least same type of type - i.e. size and type).
> >>
> >> The number of maps printed at the end is straight from the input
> >> specified and doesn't reflect what the job actually ran with. The
> >> information from the counters is the right one.
> >>
> >> Thanks
> >> Hemanth
> >>
> >> On Tue, Aug 28, 2012 at 4:02 PM, Gaurav Dasgupta <gd...@gmail.com>
> >> wrote:
> >> > Hi All,
> >> >
> >> > I executed the "MRBench" program from "hadoop-test.jar" in my 12 node CDH3
> >> > cluster. After executing, I had some strange observations regarding the
> >> > number of Maps it ran.
> >> >
> >> > First I ran the command:
> >> > hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -numRuns 3 -maps 200 -reduces 200 -inputLines 1024 -inputType random
> >> > And I could see that the actual number of Maps it ran was 201 (for all the 3
> >> > runs) instead of 200 (Though the end report displays the launched to be
> >> > 200). Here is the console report:
> >> >
> >> >
> >> > 12/08/28 04:34:35 INFO mapred.JobClient: Job complete: job_201208230144_0035
> >> >
> >> > 12/08/28 04:34:35 INFO mapred.JobClient: Counters: 28
> >> >
> >> > 12/08/28 04:34:35 INFO mapred.JobClient:   Job Counters
> >> >
> >> > 12/08/28 04:34:35 INFO mapred.JobClient:     Launched reduce tasks=200
> >> >
> >> > 12/08/28 04:34:35 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=617209
> >> >
> >> > 12/08/28 04:34:35 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
> >> >
> >> > 12/08/28 04:34:35 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
> >> >
> >> > 12/08/28 04:34:35 INFO mapred.JobClient:     Rack-local map tasks=137
> >> >
> >> > 12/08/28 04:34:35 INFO mapred.JobClient:     Launched map tasks=201
> >> >
> >> > 12/08/28 04:34:35 INFO mapred.JobClient:     Data-local map tasks=64
> >> >
> >> > 12/08/28 04:34:35 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=1756882
> >> >
> >> >
> >> >
> >> > Again, I ran the MRBench for just 10 Maps and 10 Reduces:
> >> >
> >> > hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -maps 10 -reduces 10
> >> >
> >> >
> >> >
> >> > This time the actual number of Maps were only 2 and again the end report
> >> > displays Maps Lauched to be 10. The console output:
> >> >
> >> >
> >> >
> >> > 12/08/28 05:05:35 INFO mapred.JobClient: Job complete: job_201208230144_0040
> >> > 12/08/28 05:05:35 INFO mapred.JobClient: Counters: 27
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:   Job Counters
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Launched reduce tasks=20
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=6648
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Launched map tasks=2
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Data-local map tasks=2
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=163257
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:   FileSystemCounters
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     FILE_BYTES_READ=407
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     HDFS_BYTES_READ=258
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1072596
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=3
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:   Map-Reduce Framework
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map input records=1
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce shuffle bytes=647
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Spilled Records=2
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map output bytes=5
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     CPU time spent (ms)=17070
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Total committed heap usage (bytes)=6218842112
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map input bytes=2
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Combine input records=0
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     SPLIT_RAW_BYTES=254
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce input records=1
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce input groups=1
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Combine output records=0
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Physical memory (bytes) snapshot=3348828160
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce output records=1
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=22955810816
> >> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map output records=1
> >> > DataLines Maps Reduces AvgTime (milliseconds)
> >> > 1                20     20           17451
> >> >
> >> > Can some one please help me understand this behaviour of Hadoop in this
> >> > case. My main purpose of running a MRBench is to calculate the Average time
> >> > for certain amount of Maps, Reduces, InputLines etc. If the number of Maps
> >> > is not what I submitted, then how can I judge my benchmark results?
> >> >
> >> >
> >> >
> >> > Thanks,
> >> >
> >> > Gaurav Dasgupta
> >
> >
>

Re: MRBench Maps strange behaviour

Posted by Hemanth Yamijala <yh...@gmail.com>.
I assume you are asking about the exact number of maps launched.
If so, the MRBench output prints the counter "Launched map tasks",
and that is the exact number of maps launched.

Thanks
Hemanth
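A minimal sketch of reading the actual map count from a saved console log, as suggested above. It assumes the layout shown in this thread: JobClient counter lines ending in "Name=value", and MRBench's "DataLines Maps Reduces AvgTime" header followed by one data row. parse_mrbench_log is a hypothetical helper; it returns every "Launched map tasks" value (one per job when -numRuns is greater than 1) alongside the map count(s) that MRBench prints in its summary.

```python
import re
from typing import Dict, List

# Matches JobClient counter lines such as
# "12/08/28 05:05:35 INFO mapred.JobClient:     Launched map tasks=2"
COUNTER_RE = re.compile(r"INFO mapred\.JobClient:\s+(.+?)=(\d+)\s*$")


def parse_mrbench_log(text: str) -> Dict[str, List[int]]:
    """Collect the actual launched maps and the map counts MRBench reports."""
    launched: List[int] = []
    reported: List[int] = []
    lines = text.splitlines()
    for i, line in enumerate(lines):
        match = COUNTER_RE.search(line)
        if match and match.group(1).strip() == "Launched map tasks":
            launched.append(int(match.group(2)))
        if line.split()[:3] == ["DataLines", "Maps", "Reduces"]:
            # The first non-empty line after the header is the data row:
            # DataLines, Maps, Reduces, AvgTime (ms).
            for row in lines[i + 1:]:
                if row.strip():
                    reported.append(int(row.split()[1]))
                    break
    return {"launched_maps": launched, "reported_maps": reported}
```

On the raw console output of the second run in this thread, it would report launched maps [2] against a summary value of [20], which is exactly the mismatch being discussed.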

On Wed, Aug 29, 2012 at 1:14 PM, Gaurav Dasgupta <gd...@gmail.com> wrote:
> Hi Hemanth,
>
> Thanks for the reply.
> Can you tell me how can I calculate or ensure from the counters what should
> be the exact number of Maps?
> Thanks,
> Gaurav Dasgupta
> On Wed, Aug 29, 2012 at 11:26 AM, Hemanth Yamijala <yh...@gmail.com>
> wrote:
>>
>> Hi,
>>
>> The number of maps specified to any map reduce program (including
>> those part of MRBench) is generally only a hint, and the actual number
>> of maps will be influenced in typical cases by the amount of data
>> being processed. You can take a look at this wiki link to understand
>> more: http://wiki.apache.org/hadoop/HowManyMapsAndReduces
>>
>> In the examples below, since the data you've generated is different,
>> the number of mappers are different. To be able to judge your
>> benchmark results, you'd need to benchmark against the same data (or
>> at least same type of type - i.e. size and type).
>>
>> The number of maps printed at the end is straight from the input
>> specified and doesn't reflect what the job actually ran with. The
>> information from the counters is the right one.
>>
>> Thanks
>> Hemanth
>>
>> On Tue, Aug 28, 2012 at 4:02 PM, Gaurav Dasgupta <gd...@gmail.com>
>> wrote:
>> > Hi All,
>> >
>> > I executed the "MRBench" program from "hadoop-test.jar" in my 12 node CDH3
>> > cluster. After executing, I had some strange observations regarding the
>> > number of Maps it ran.
>> >
>> > First I ran the command:
>> > hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -numRuns 3 -maps 200 -reduces 200 -inputLines 1024 -inputType random
>> > And I could see that the actual number of Maps it ran was 201 (for all the 3
>> > runs) instead of 200 (Though the end report displays the launched to be
>> > 200). Here is the console report:
>> >
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient: Job complete: job_201208230144_0035
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient: Counters: 28
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient:   Job Counters
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient:     Launched reduce tasks=200
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=617209
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient:     Rack-local map tasks=137
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient:     Launched map tasks=201
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient:     Data-local map tasks=64
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=1756882
>> >
>> >
>> >
>> > Again, I ran the MRBench for just 10 Maps and 10 Reduces:
>> >
>> > hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -maps 10 -reduces 10
>> >
>> >
>> >
>> > This time the actual number of Maps were only 2 and again the end report
>> > displays Maps Lauched to be 10. The console output:
>> >
>> >
>> >
>> > 12/08/28 05:05:35 INFO mapred.JobClient: Job complete: job_201208230144_0040
>> > 12/08/28 05:05:35 INFO mapred.JobClient: Counters: 27
>> > 12/08/28 05:05:35 INFO mapred.JobClient:   Job Counters
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Launched reduce tasks=20
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=6648
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Launched map tasks=2
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Data-local map tasks=2
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=163257
>> > 12/08/28 05:05:35 INFO mapred.JobClient:   FileSystemCounters
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     FILE_BYTES_READ=407
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     HDFS_BYTES_READ=258
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1072596
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=3
>> > 12/08/28 05:05:35 INFO mapred.JobClient:   Map-Reduce Framework
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map input records=1
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce shuffle bytes=647
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Spilled Records=2
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map output bytes=5
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     CPU time spent (ms)=17070
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Total committed heap usage (bytes)=6218842112
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map input bytes=2
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Combine input records=0
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     SPLIT_RAW_BYTES=254
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce input records=1
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce input groups=1
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Combine output records=0
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Physical memory (bytes) snapshot=3348828160
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce output records=1
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=22955810816
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map output records=1
>> > DataLines Maps Reduces AvgTime (milliseconds)
>> > 1                20     20           17451
>> >
>> > Can some one please help me understand this behaviour of Hadoop in this
>> > case. My main purpose of running a MRBench is to calculate the Average time
>> > for certain amount of Maps, Reduces, InputLines etc. If the number of Maps
>> > is not what I submitted, then how can I judge my benchmark results?
>> >
>> >
>> >
>> > Thanks,
>> >
>> > Gaurav Dasgupta
>
>

Re: MRBench Maps strange behaviour

Posted by Bejoy KS <be...@gmail.com>.
Hi Gaurav

You can get the number of map tasks in the job from the JobTracker (JT) web UI itself.

Regards
Bejoy KS

Sent from handheld, please excuse typos.

-----Original Message-----
From: Gaurav Dasgupta <gd...@gmail.com>
Date: Wed, 29 Aug 2012 13:14:11 
To: <us...@hadoop.apache.org>
Reply-To: user@hadoop.apache.org
Subject: Re: MRBench Maps strange behaviour

Hi Hemanth,

Thanks for the reply.
Can you tell me how can I calculate or ensure from the counters what should
be the exact number of Maps?
Thanks,
Gaurav Dasgupta
On Wed, Aug 29, 2012 at 11:26 AM, Hemanth Yamijala <yh...@gmail.com> wrote:

> Hi,
>
> The number of maps specified to any map reduce program (including
> those part of MRBench) is generally only a hint, and the actual number
> of maps will be influenced in typical cases by the amount of data
> being processed. You can take a look at this wiki link to understand
> more: http://wiki.apache.org/hadoop/HowManyMapsAndReduces
>
> In the examples below, since the data you've generated is different,
> the number of mappers are different. To be able to judge your
> benchmark results, you'd need to benchmark against the same data (or
> at least same type of type - i.e. size and type).
>
> The number of maps printed at the end is straight from the input
> specified and doesn't reflect what the job actually ran with. The
> information from the counters is the right one.
>
> Thanks
> Hemanth
>
> On Tue, Aug 28, 2012 at 4:02 PM, Gaurav Dasgupta <gd...@gmail.com>
> wrote:
> > Hi All,
> >
> > I executed the "MRBench" program from "hadoop-test.jar" in my 12 node CDH3
> > cluster. After executing, I had some strange observations regarding the
> > number of Maps it ran.
> >
> > First I ran the command:
> > hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -numRuns 3 -maps 200 -reduces 200 -inputLines 1024 -inputType random
> > And I could see that the actual number of Maps it ran was 201 (for all the 3
> > runs) instead of 200 (Though the end report displays the launched to be
> > 200). Here is the console report:
> >
> >
> > 12/08/28 04:34:35 INFO mapred.JobClient: Job complete: job_201208230144_0035
> >
> > 12/08/28 04:34:35 INFO mapred.JobClient: Counters: 28
> >
> > 12/08/28 04:34:35 INFO mapred.JobClient:   Job Counters
> >
> > 12/08/28 04:34:35 INFO mapred.JobClient:     Launched reduce tasks=200
> >
> > 12/08/28 04:34:35 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=617209
> >
> > 12/08/28 04:34:35 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
> >
> > 12/08/28 04:34:35 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
> >
> > 12/08/28 04:34:35 INFO mapred.JobClient:     Rack-local map tasks=137
> >
> > 12/08/28 04:34:35 INFO mapred.JobClient:     Launched map tasks=201
> >
> > 12/08/28 04:34:35 INFO mapred.JobClient:     Data-local map tasks=64
> >
> > 12/08/28 04:34:35 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=1756882
> >
> >
> >
> > Again, I ran the MRBench for just 10 Maps and 10 Reduces:
> >
> > hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -maps 10 -reduces 10
> >
> >
> >
> > This time the actual number of Maps were only 2 and again the end report
> > displays Maps Lauched to be 10. The console output:
> >
> >
> >
> > 12/08/28 05:05:35 INFO mapred.JobClient: Job complete: job_201208230144_0040
> > 12/08/28 05:05:35 INFO mapred.JobClient: Counters: 27
> > 12/08/28 05:05:35 INFO mapred.JobClient:   Job Counters
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Launched reduce tasks=20
> > 12/08/28 05:05:35 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=6648
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Launched map tasks=2
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Data-local map tasks=2
> > 12/08/28 05:05:35 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=163257
> > 12/08/28 05:05:35 INFO mapred.JobClient:   FileSystemCounters
> > 12/08/28 05:05:35 INFO mapred.JobClient:     FILE_BYTES_READ=407
> > 12/08/28 05:05:35 INFO mapred.JobClient:     HDFS_BYTES_READ=258
> > 12/08/28 05:05:35 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1072596
> > 12/08/28 05:05:35 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=3
> > 12/08/28 05:05:35 INFO mapred.JobClient:   Map-Reduce Framework
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map input records=1
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce shuffle bytes=647
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Spilled Records=2
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map output bytes=5
> > 12/08/28 05:05:35 INFO mapred.JobClient:     CPU time spent (ms)=17070
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Total committed heap usage (bytes)=6218842112
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map input bytes=2
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Combine input records=0
> > 12/08/28 05:05:35 INFO mapred.JobClient:     SPLIT_RAW_BYTES=254
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce input records=1
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce input groups=1
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Combine output records=0
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Physical memory (bytes) snapshot=3348828160
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce output records=1
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=22955810816
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map output records=1
> > DataLines Maps Reduces AvgTime (milliseconds)
> > 1                20     20           17451
> >
> > Can some one please help me understand this behaviour of Hadoop in this
> > case. My main purpose of running a MRBench is to calculate the Average time
> > for certain amount of Maps, Reduces, InputLines etc. If the number of Maps
> > is not what I submitted, then how can I judge my benchmark results?
> >
> >
> >
> > Thanks,
> >
> > Gaurav Dasgupta
>


Re: MRBench Maps strange behaviour

Posted by Bejoy KS <be...@gmail.com>.
Hi Gaurav

You can get the information on the num of map tasks in the job from the JT web UI itself.

Regards
Bejoy KS

Sent from handheld, please excuse typos.

-----Original Message-----
From: Gaurav Dasgupta <gd...@gmail.com>
Date: Wed, 29 Aug 2012 13:14:11 
To: <us...@hadoop.apache.org>
Reply-To: user@hadoop.apache.org
Subject: Re: MRBench Maps strange behaviour

Hi Hemanth,

Thanks for the reply.
Can you tell me how can I calculate or ensure from the counters what should
be the exact number of Maps?
Thanks,
Gaurav Dasgupta
On Wed, Aug 29, 2012 at 11:26 AM, Hemanth Yamijala <yh...@gmail.com>wrote:

> Hi,
>
> The number of maps specified to any map reduce program (including
> those part of MRBench) is generally only a hint, and the actual number
> of maps will be influenced in typical cases by the amount of data
> being processed. You can take a look at this wiki link to understand
> more: http://wiki.apache.org/hadoop/HowManyMapsAndReduces
>
> In the examples below, since the data you've generated is different,
> the number of mappers are different. To be able to judge your
> benchmark results, you'd need to benchmark against the same data (or
> at least same type of type - i.e. size and type).
>
> The number of maps printed at the end is straight from the input
> specified and doesn't reflect what the job actually ran with. The
> information from the counters is the right one.
>
> Thanks
> Hemanth
>
> On Tue, Aug 28, 2012 at 4:02 PM, Gaurav Dasgupta <gd...@gmail.com>
> wrote:
> > Hi All,
> >
> > I executed the "MRBench" program from "hadoop-test.jar" in my 12 node
> CDH3
> > cluster. After executing, I had some strange observations regarding the
> > number of Maps it ran.
> >
> > First I ran the command:
> > hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -numRuns 3 -maps
> 200
> > -reduces 200 -inputLines 1024 -inputType random
> > And I could see that the actual number of Maps it ran was 201 (for all
> the 3
> > runs) instead of 200 (Though the end report displays the launched to be
> > 200). Here is the console report:
> >
> >
> > 12/08/28 04:34:35 INFO mapred.JobClient: Job complete:
> job_201208230144_0035
> >
> > 12/08/28 04:34:35 INFO mapred.JobClient: Counters: 28
> >
> > 12/08/28 04:34:35 INFO mapred.JobClient:   Job Counters
> >
> > 12/08/28 04:34:35 INFO mapred.JobClient:     Launched reduce tasks=200
> >
> > 12/08/28 04:34:35 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=617209
> >
> > 12/08/28 04:34:35 INFO mapred.JobClient:     Total time spent by all
> reduces
> > waiting after reserving slots (ms)=0
> >
> > 12/08/28 04:34:35 INFO mapred.JobClient:     Total time spent by all maps
> > waiting after reserving slots (ms)=0
> >
> > 12/08/28 04:34:35 INFO mapred.JobClient:     Rack-local map tasks=137
> >
> > 12/08/28 04:34:35 INFO mapred.JobClient:     Launched map tasks=201
> >
> > 12/08/28 04:34:35 INFO mapred.JobClient:     Data-local map tasks=64
> >
> > 12/08/28 04:34:35 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=1756882
> >
> >
> >
> > Again, I ran the MRBench for just 10 Maps and 10 Reduces:
> >
> > hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -maps 10
> -reduces 10
> >
> >
> >
> > This time the actual number of Maps were only 2 and again the end report
> > displays Maps Lauched to be 10. The console output:
> >
> >
> >
> > 12/08/28 05:05:35 INFO mapred.JobClient: Job complete:
> job_201208230144_0040
> > 12/08/28 05:05:35 INFO mapred.JobClient: Counters: 27
> > 12/08/28 05:05:35 INFO mapred.JobClient:   Job Counters
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Launched reduce tasks=20
> > 12/08/28 05:05:35 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=6648
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Total time spent by all
> reduces
> > waiting after reserving slots (ms)=0
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Total time spent by all maps
> > waiting after reserving slots (ms)=0
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Launched map tasks=2
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Data-local map tasks=2
> > 12/08/28 05:05:35 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=163257
> > 12/08/28 05:05:35 INFO mapred.JobClient:   FileSystemCounters
> > 12/08/28 05:05:35 INFO mapred.JobClient:     FILE_BYTES_READ=407
> > 12/08/28 05:05:35 INFO mapred.JobClient:     HDFS_BYTES_READ=258
> > 12/08/28 05:05:35 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1072596
> > 12/08/28 05:05:35 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=3
> > 12/08/28 05:05:35 INFO mapred.JobClient:   Map-Reduce Framework
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map input records=1
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce shuffle bytes=647
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Spilled Records=2
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map output bytes=5
> > 12/08/28 05:05:35 INFO mapred.JobClient:     CPU time spent (ms)=17070
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Total committed heap usage
> > (bytes)=6218842112
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map input bytes=2
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Combine input records=0
> > 12/08/28 05:05:35 INFO mapred.JobClient:     SPLIT_RAW_BYTES=254
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce input records=1
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce input groups=1
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Combine output records=0
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Physical memory (bytes) snapshot=3348828160
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce output records=1
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=22955810816
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map output records=1
> > DataLines Maps Reduces AvgTime (milliseconds)
> > 1                20     20           17451
> >
> > Can someone please help me understand this behaviour of Hadoop in this
> > case? My main purpose in running MRBench is to calculate the average time
> > for a given number of maps, reduces, input lines, etc. If the number of
> > maps is not what I submitted, how can I judge my benchmark results?
> >
> >
> >
> > Thanks,
> >
> > Gaurav Dasgupta
>


Re: MRBench Maps strange behaviour

Posted by Hemanth Yamijala <yh...@gmail.com>.
I assume you are asking about the exact number of maps launched.
If so, the MRBench output already prints it as the counter
"Launched map tasks"; that is the exact number of maps launched.

Thanks
Hemanth

On Wed, Aug 29, 2012 at 1:14 PM, Gaurav Dasgupta <gd...@gmail.com> wrote:
> Hi Hemanth,
>
> Thanks for the reply.
> Can you tell me how I can calculate or verify from the counters what the
> exact number of maps should be?
> Thanks,
> Gaurav Dasgupta
> On Wed, Aug 29, 2012 at 11:26 AM, Hemanth Yamijala <yh...@gmail.com>
> wrote:
>>
>> Hi,
>>
>> The number of maps specified to any map reduce program (including
>> those part of MRBench) is generally only a hint, and the actual number
>> of maps will be influenced in typical cases by the amount of data
>> being processed. You can take a look at this wiki link to understand
>> more: http://wiki.apache.org/hadoop/HowManyMapsAndReduces
>>
>> In the examples below, since the data you've generated is different,
>> the number of mappers is different. To be able to judge your
>> benchmark results, you'd need to benchmark against the same data (or
>> at least the same kind of data, i.e. the same size and type).
>>
>> The number of maps printed at the end is taken straight from the
>> arguments you specified and doesn't reflect what the job actually ran
>> with. The information from the counters is the right one.
>>
>> Thanks
>> Hemanth
>>
>> On Tue, Aug 28, 2012 at 4:02 PM, Gaurav Dasgupta <gd...@gmail.com>
>> wrote:
>> > Hi All,
>> >
>> > I executed the "MRBench" program from "hadoop-test.jar" in my 12 node CDH3
>> > cluster. After executing, I had some strange observations regarding the
>> > number of Maps it ran.
>> >
>> > First I ran the command:
>> > hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -numRuns 3 -maps 200 -reduces 200 -inputLines 1024 -inputType random
>> > And I could see that the actual number of Maps it ran was 201 (for all 3
>> > runs) instead of 200 (though the end report displays the number launched
>> > as 200). Here is the console report:
>> >
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient: Job complete: job_201208230144_0035
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient: Counters: 28
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient:   Job Counters
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient:     Launched reduce tasks=200
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=617209
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient:     Rack-local map tasks=137
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient:     Launched map tasks=201
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient:     Data-local map tasks=64
>> >
>> > 12/08/28 04:34:35 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=1756882
>> >
>> >
>> >
>> > Again, I ran the MRBench for just 10 Maps and 10 Reduces:
>> >
>> > hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -maps 10 -reduces 10
>> >
>> >
>> >
>> > This time the actual number of Maps was only 2, and again the end report
>> > displays Maps Launched as 10. The console output:
>> >
>> >
>> >
>> > 12/08/28 05:05:35 INFO mapred.JobClient: Job complete: job_201208230144_0040
>> > 12/08/28 05:05:35 INFO mapred.JobClient: Counters: 27
>> > 12/08/28 05:05:35 INFO mapred.JobClient:   Job Counters
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Launched reduce tasks=20
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=6648
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Launched map tasks=2
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Data-local map tasks=2
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=163257
>> > 12/08/28 05:05:35 INFO mapred.JobClient:   FileSystemCounters
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     FILE_BYTES_READ=407
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     HDFS_BYTES_READ=258
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1072596
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=3
>> > 12/08/28 05:05:35 INFO mapred.JobClient:   Map-Reduce Framework
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map input records=1
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce shuffle bytes=647
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Spilled Records=2
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map output bytes=5
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     CPU time spent (ms)=17070
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Total committed heap usage (bytes)=6218842112
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map input bytes=2
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Combine input records=0
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     SPLIT_RAW_BYTES=254
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce input records=1
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce input groups=1
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Combine output records=0
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Physical memory (bytes) snapshot=3348828160
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce output records=1
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=22955810816
>> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map output records=1
>> > DataLines Maps Reduces AvgTime (milliseconds)
>> > 1                20     20           17451
>> >
>> > Can someone please help me understand this behaviour of Hadoop in this
>> > case? My main purpose in running MRBench is to calculate the average time
>> > for a given number of maps, reduces, input lines, etc. If the number of
>> > maps is not what I submitted, how can I judge my benchmark results?
>> >
>> >
>> >
>> > Thanks,
>> >
>> > Gaurav Dasgupta
>
>
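Hemanth's point is that the "Launched map tasks" counter, not MRBench's closing summary table, is the number of maps that actually ran. When collecting results over several runs, it can help to pull that counter out of the console output automatically. The class below is a minimal, hypothetical sketch (the name `LaunchedTaskCounters` is made up and is not a Hadoop API); it assumes, as in the logs quoted in this thread, that each job's counters are printed after a `Job complete:` line, and it simply matches the counter text with a regular expression.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of Hadoop): extracts the "Launched map tasks"
 * and "Launched reduce tasks" counters from JobClient console output so the
 * task counts that actually ran can be recorded next to MRBench's summary.
 */
public class LaunchedTaskCounters {

    // Matches counter lines such as
    // "12/08/28 05:05:35 INFO mapred.JobClient:     Launched map tasks=2"
    private static final Pattern LAUNCHED =
            Pattern.compile("Launched (map|reduce) tasks=(\\d+)");

    /** Counter values for one job; -1 means the counter was not seen. */
    public static final class JobCounts {
        public int launchedMaps = -1;
        public int launchedReduces = -1;
    }

    /** Returns one entry per "Job complete:" line, in log order. */
    public static List<JobCounts> parse(List<String> logLines) {
        List<JobCounts> jobs = new ArrayList<>();
        JobCounts current = null; // lines before the first "Job complete:" are ignored
        for (String line : logLines) {
            if (line.contains("Job complete:")) {
                current = new JobCounts();
                jobs.add(current);
                continue;
            }
            Matcher m = LAUNCHED.matcher(line);
            if (current != null && m.find()) {
                int value = Integer.parseInt(m.group(2));
                if (m.group(1).equals("map")) {
                    current.launchedMaps = value;
                } else {
                    current.launchedReduces = value;
                }
            }
        }
        return jobs;
    }

    public static void main(String[] args) {
        List<String> log = List.of(
                "12/08/28 05:05:35 INFO mapred.JobClient: Job complete: job_201208230144_0040",
                "12/08/28 05:05:35 INFO mapred.JobClient:     Launched reduce tasks=20",
                "12/08/28 05:05:35 INFO mapred.JobClient:     Launched map tasks=2");
        for (JobCounts c : parse(log)) {
            System.out.println("maps=" + c.launchedMaps + " reduces=" + c.launchedReduces);
        }
    }
}
```

Run against the second log in this thread, it prints `maps=2 reduces=20`. Pairing these values with MRBench's `AvgTime` column records what each timed run really executed, rather than what was requested on the command line.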

Re: MRBench Maps strange behaviour

Posted by Bejoy KS <be...@gmail.com>.
Hi Gaurav

You can also see the number of map tasks the job actually ran in the JobTracker (JT) web UI.

Regards
Bejoy KS

-----Original Message-----
From: Gaurav Dasgupta <gd...@gmail.com>
Date: Wed, 29 Aug 2012 13:14:11 
To: <us...@hadoop.apache.org>
Reply-To: user@hadoop.apache.org
Subject: Re: MRBench Maps strange behaviour

Hi Hemanth,

Thanks for the reply.
Can you tell me how I can calculate or verify from the counters what the
exact number of maps should be?
Thanks,
Gaurav Dasgupta
On Wed, Aug 29, 2012 at 11:26 AM, Hemanth Yamijala <yh...@gmail.com> wrote:

> Hi,
>
> The number of maps specified to any map reduce program (including
> those part of MRBench) is generally only a hint, and the actual number
> of maps will be influenced in typical cases by the amount of data
> being processed. You can take a look at this wiki link to understand
> more: http://wiki.apache.org/hadoop/HowManyMapsAndReduces
>
> In the examples below, since the data you've generated is different,
> the number of mappers is different. To be able to judge your
> benchmark results, you'd need to benchmark against the same data (or
> at least the same kind of data, i.e. the same size and type).
>
> The number of maps printed at the end is taken straight from the
> arguments you specified and doesn't reflect what the job actually ran
> with. The information from the counters is the right one.
>
> Thanks
> Hemanth
>
> On Tue, Aug 28, 2012 at 4:02 PM, Gaurav Dasgupta <gd...@gmail.com>
> wrote:
> > Hi All,
> >
> > I executed the "MRBench" program from "hadoop-test.jar" in my 12 node CDH3
> > cluster. After executing, I had some strange observations regarding the
> > number of Maps it ran.
> >
> > First I ran the command:
> > hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -numRuns 3 -maps 200 -reduces 200 -inputLines 1024 -inputType random
> > And I could see that the actual number of Maps it ran was 201 (for all 3
> > runs) instead of 200 (though the end report displays the number launched
> > as 200). Here is the console report:
> >
> >
> > 12/08/28 04:34:35 INFO mapred.JobClient: Job complete: job_201208230144_0035
> >
> > 12/08/28 04:34:35 INFO mapred.JobClient: Counters: 28
> >
> > 12/08/28 04:34:35 INFO mapred.JobClient:   Job Counters
> >
> > 12/08/28 04:34:35 INFO mapred.JobClient:     Launched reduce tasks=200
> >
> > 12/08/28 04:34:35 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=617209
> >
> > 12/08/28 04:34:35 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
> >
> > 12/08/28 04:34:35 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
> >
> > 12/08/28 04:34:35 INFO mapred.JobClient:     Rack-local map tasks=137
> >
> > 12/08/28 04:34:35 INFO mapred.JobClient:     Launched map tasks=201
> >
> > 12/08/28 04:34:35 INFO mapred.JobClient:     Data-local map tasks=64
> >
> > 12/08/28 04:34:35 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=1756882
> >
> >
> >
> > Again, I ran the MRBench for just 10 Maps and 10 Reduces:
> >
> > hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -maps 10 -reduces 10
> >
> >
> >
> > This time the actual number of Maps was only 2, and again the end report
> > displays Maps Launched as 10. The console output:
> >
> >
> >
> > 12/08/28 05:05:35 INFO mapred.JobClient: Job complete: job_201208230144_0040
> > 12/08/28 05:05:35 INFO mapred.JobClient: Counters: 27
> > 12/08/28 05:05:35 INFO mapred.JobClient:   Job Counters
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Launched reduce tasks=20
> > 12/08/28 05:05:35 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=6648
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Launched map tasks=2
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Data-local map tasks=2
> > 12/08/28 05:05:35 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=163257
> > 12/08/28 05:05:35 INFO mapred.JobClient:   FileSystemCounters
> > 12/08/28 05:05:35 INFO mapred.JobClient:     FILE_BYTES_READ=407
> > 12/08/28 05:05:35 INFO mapred.JobClient:     HDFS_BYTES_READ=258
> > 12/08/28 05:05:35 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1072596
> > 12/08/28 05:05:35 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=3
> > 12/08/28 05:05:35 INFO mapred.JobClient:   Map-Reduce Framework
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map input records=1
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce shuffle bytes=647
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Spilled Records=2
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map output bytes=5
> > 12/08/28 05:05:35 INFO mapred.JobClient:     CPU time spent (ms)=17070
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Total committed heap usage (bytes)=6218842112
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map input bytes=2
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Combine input records=0
> > 12/08/28 05:05:35 INFO mapred.JobClient:     SPLIT_RAW_BYTES=254
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce input records=1
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce input groups=1
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Combine output records=0
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Physical memory (bytes) snapshot=3348828160
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce output records=1
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=22955810816
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map output records=1
> > DataLines Maps Reduces AvgTime (milliseconds)
> > 1                20     20           17451
> >
> > Can someone please help me understand this behaviour of Hadoop in this
> > case? My main purpose in running MRBench is to calculate the average time
> > for a given number of maps, reduces, input lines, etc. If the number of
> > maps is not what I submitted, how can I judge my benchmark results?
> >
> >
> >
> > Thanks,
> >
> > Gaurav Dasgupta
>
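To see why the `-maps` value is only a hint, it helps to look at how the classic `org.apache.hadoop.mapred.FileInputFormat` turns a file into input splits: it computes `goalSize = totalSize / numSplits`, clamps it to `splitSize = max(minSize, min(goalSize, blockSize))`, and lets the last split be up to 10% larger (`SPLIT_SLOP = 1.1`). The sketch below re-implements only that arithmetic for a single uncompressed file, assuming MRBench reads its generated input through a `FileInputFormat`-based format such as `TextInputFormat`. It ignores block locations, multiple input files, and compression, and `SplitCountSketch` is a hypothetical name, not a Hadoop class.

```java
/**
 * Hypothetical sketch of the split-count arithmetic of the old-API
 * FileInputFormat for one splittable, uncompressed file. Illustration only.
 */
public class SplitCountSketch {

    // The last split may be up to 10% larger than splitSize.
    private static final double SPLIT_SLOP = 1.1;

    /** splitSize = max(minSize, min(goalSize, blockSize)) */
    public static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    /**
     * @param fileBytes     length of the input file in bytes
     * @param requestedMaps the "-maps" hint, passed to getSplits() as numSplits
     * @param minSize       effective minimum split size (must be >= 1; default 1)
     * @param blockSize     block size of the file
     */
    public static int countSplits(long fileBytes, int requestedMaps, long minSize, long blockSize) {
        if (fileBytes == 0) {
            return 1; // a zero-length file still gets one (empty) split
        }
        long goalSize = fileBytes / (requestedMaps == 0 ? 1 : requestedMaps);
        long splitSize = computeSplitSize(goalSize, minSize, blockSize);
        long remaining = fileBytes;
        int splits = 0;
        while (((double) remaining) / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // leftover bytes become one final split
        }
        return splits;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024; // assumed 64 MB block size
        // 2-byte input file, "-maps 10"
        System.out.println(countSplits(2, 10, 1, blockSize));
        // hypothetical 100,150-byte input file, "-maps 200"
        System.out.println(countSplits(100_150, 200, 1, blockSize));
    }
}
```

With a 2-byte file (which is what the `Map input bytes=2` counter in the second run suggests) and `-maps 10`, `goalSize` rounds down to 0, the split size is clamped up to 1 byte, and the sketch returns 2, in line with `Launched map tasks=2`. The second call uses a made-up 100,150-byte file with `-maps 200`: the bytes left over from the integer division produce one extra split, so the sketch returns 201. The thread does not show the size of the random input from the first run, so this only illustrates how a job can end up one map over the hint; it does not prove that was the cause there.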


Re: MRBench Maps strange behaviour

Posted by Gaurav Dasgupta <gd...@gmail.com>.
Hi Hemanth,

Thanks for the reply.
Can you tell me how I can calculate or verify from the counters what the
exact number of maps should be?
Thanks,
Gaurav Dasgupta
On Wed, Aug 29, 2012 at 11:26 AM, Hemanth Yamijala <yh...@gmail.com> wrote:

> Hi,
>
> The number of maps specified to any map reduce program (including
> those part of MRBench) is generally only a hint, and the actual number
> of maps will be influenced in typical cases by the amount of data
> being processed. You can take a look at this wiki link to understand
> more: http://wiki.apache.org/hadoop/HowManyMapsAndReduces
>
> In the examples below, since the data you've generated is different,
> the number of mappers is different. To be able to judge your
> benchmark results, you'd need to benchmark against the same data (or
> at least the same kind of data, i.e. the same size and type).
>
> The number of maps printed at the end is taken straight from the
> arguments you specified and doesn't reflect what the job actually ran
> with. The information from the counters is the right one.
>
> Thanks
> Hemanth
>
> On Tue, Aug 28, 2012 at 4:02 PM, Gaurav Dasgupta <gd...@gmail.com>
> wrote:
> > Hi All,
> >
> > I executed the "MRBench" program from "hadoop-test.jar" in my 12 node CDH3
> > cluster. After executing, I had some strange observations regarding the
> > number of Maps it ran.
> >
> > First I ran the command:
> > hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -numRuns 3 -maps 200 -reduces 200 -inputLines 1024 -inputType random
> > And I could see that the actual number of Maps it ran was 201 (for all 3
> > runs) instead of 200 (though the end report displays the number launched
> > as 200). Here is the console report:
> >
> >
> > 12/08/28 04:34:35 INFO mapred.JobClient: Job complete: job_201208230144_0035
> >
> > 12/08/28 04:34:35 INFO mapred.JobClient: Counters: 28
> >
> > 12/08/28 04:34:35 INFO mapred.JobClient:   Job Counters
> >
> > 12/08/28 04:34:35 INFO mapred.JobClient:     Launched reduce tasks=200
> >
> > 12/08/28 04:34:35 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=617209
> >
> > 12/08/28 04:34:35 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
> >
> > 12/08/28 04:34:35 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
> >
> > 12/08/28 04:34:35 INFO mapred.JobClient:     Rack-local map tasks=137
> >
> > 12/08/28 04:34:35 INFO mapred.JobClient:     Launched map tasks=201
> >
> > 12/08/28 04:34:35 INFO mapred.JobClient:     Data-local map tasks=64
> >
> > 12/08/28 04:34:35 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=1756882
> >
> >
> >
> > Again, I ran the MRBench for just 10 Maps and 10 Reduces:
> >
> > hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -maps 10 -reduces 10
> >
> >
> >
> > This time the actual number of Maps was only 2, and again the end report
> > displays Maps Launched as 10. The console output:
> >
> >
> >
> > 12/08/28 05:05:35 INFO mapred.JobClient: Job complete: job_201208230144_0040
> > 12/08/28 05:05:35 INFO mapred.JobClient: Counters: 27
> > 12/08/28 05:05:35 INFO mapred.JobClient:   Job Counters
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Launched reduce tasks=20
> > 12/08/28 05:05:35 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=6648
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Launched map tasks=2
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Data-local map tasks=2
> > 12/08/28 05:05:35 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=163257
> > 12/08/28 05:05:35 INFO mapred.JobClient:   FileSystemCounters
> > 12/08/28 05:05:35 INFO mapred.JobClient:     FILE_BYTES_READ=407
> > 12/08/28 05:05:35 INFO mapred.JobClient:     HDFS_BYTES_READ=258
> > 12/08/28 05:05:35 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1072596
> > 12/08/28 05:05:35 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=3
> > 12/08/28 05:05:35 INFO mapred.JobClient:   Map-Reduce Framework
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map input records=1
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce shuffle bytes=647
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Spilled Records=2
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map output bytes=5
> > 12/08/28 05:05:35 INFO mapred.JobClient:     CPU time spent (ms)=17070
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Total committed heap usage (bytes)=6218842112
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map input bytes=2
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Combine input records=0
> > 12/08/28 05:05:35 INFO mapred.JobClient:     SPLIT_RAW_BYTES=254
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce input records=1
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce input groups=1
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Combine output records=0
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Physical memory (bytes) snapshot=3348828160
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce output records=1
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=22955810816
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map output records=1
> > DataLines Maps Reduces AvgTime (milliseconds)
> > 1                20     20           17451
> >
> > Can someone please help me understand this behaviour of Hadoop in this
> > case? My main purpose in running MRBench is to calculate the average time
> > for a given number of maps, reduces, input lines, etc. If the number of
> > maps is not what I submitted, how can I judge my benchmark results?
> >
> >
> >
> > Thanks,
> >
> > Gaurav Dasgupta
>

Re: MRBench Maps strange behaviour

Posted by Gaurav Dasgupta <gd...@gmail.com>.
Hi Hemanth,

Thanks for the reply.
Can you tell me how I can calculate, or verify from the counters, what the
exact number of Maps should be?
Thanks,
Gaurav Dasgupta
On Wed, Aug 29, 2012 at 11:26 AM, Hemanth Yamijala <yh...@gmail.com> wrote:

> Hi,
>
> The number of maps specified to any MapReduce program (including
> those that are part of MRBench) is generally only a hint, and the actual
> number of maps will, in typical cases, be influenced by the amount of data
> being processed. You can take a look at this wiki link to understand
> more: http://wiki.apache.org/hadoop/HowManyMapsAndReduces
>
> In the examples below, since the data you've generated is different,
> the number of mappers is different. To be able to judge your
> benchmark results, you'd need to benchmark against the same data (or
> at least the same kind of data, i.e. the same size and type).
>
> The number of maps printed at the end is taken straight from the input
> you specified and doesn't reflect what the job actually ran with. The
> information from the counters is the correct one.
>
> Thanks
> Hemanth
>
> On Tue, Aug 28, 2012 at 4:02 PM, Gaurav Dasgupta <gd...@gmail.com>
> wrote:
> > Hi All,
> >
> > I executed the "MRBench" program from "hadoop-test.jar" on my 12-node CDH3
> > cluster. After running it, I made some strange observations regarding the
> > number of Maps it ran.
> >
> > First I ran the command:
> > hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -numRuns 3 -maps 200 -reduces 200 -inputLines 1024 -inputType random
> > I could see that the actual number of Maps it ran was 201 (for all 3
> > runs) instead of 200 (though the end report displays the number launched as
> > 200). Here is the console report:
> >
> >
> > 12/08/28 04:34:35 INFO mapred.JobClient: Job complete: job_201208230144_0035
> > 12/08/28 04:34:35 INFO mapred.JobClient: Counters: 28
> > 12/08/28 04:34:35 INFO mapred.JobClient:   Job Counters
> > 12/08/28 04:34:35 INFO mapred.JobClient:     Launched reduce tasks=200
> > 12/08/28 04:34:35 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=617209
> > 12/08/28 04:34:35 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
> > 12/08/28 04:34:35 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
> > 12/08/28 04:34:35 INFO mapred.JobClient:     Rack-local map tasks=137
> > 12/08/28 04:34:35 INFO mapred.JobClient:     Launched map tasks=201
> > 12/08/28 04:34:35 INFO mapred.JobClient:     Data-local map tasks=64
> > 12/08/28 04:34:35 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=1756882
> >
> >
> >
> > Next, I ran MRBench with just 10 Maps and 10 Reduces:
> >
> > hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -maps 10 -reduces 10
> >
> >
> >
> > This time the actual number of Maps was only 2, and again the end report
> > displays Maps Launched as 10. The console output:
> >
> >
> >
> > 12/08/28 05:05:35 INFO mapred.JobClient: Job complete: job_201208230144_0040
> > 12/08/28 05:05:35 INFO mapred.JobClient: Counters: 27
> > 12/08/28 05:05:35 INFO mapred.JobClient:   Job Counters
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Launched reduce tasks=20
> > 12/08/28 05:05:35 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=6648
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Launched map tasks=2
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Data-local map tasks=2
> > 12/08/28 05:05:35 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=163257
> > 12/08/28 05:05:35 INFO mapred.JobClient:   FileSystemCounters
> > 12/08/28 05:05:35 INFO mapred.JobClient:     FILE_BYTES_READ=407
> > 12/08/28 05:05:35 INFO mapred.JobClient:     HDFS_BYTES_READ=258
> > 12/08/28 05:05:35 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1072596
> > 12/08/28 05:05:35 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=3
> > 12/08/28 05:05:35 INFO mapred.JobClient:   Map-Reduce Framework
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map input records=1
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce shuffle bytes=647
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Spilled Records=2
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map output bytes=5
> > 12/08/28 05:05:35 INFO mapred.JobClient:     CPU time spent (ms)=17070
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Total committed heap usage (bytes)=6218842112
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map input bytes=2
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Combine input records=0
> > 12/08/28 05:05:35 INFO mapred.JobClient:     SPLIT_RAW_BYTES=254
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce input records=1
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce input groups=1
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Combine output records=0
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Physical memory (bytes) snapshot=3348828160
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce output records=1
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=22955810816
> > 12/08/28 05:05:35 INFO mapred.JobClient:     Map output records=1
> > DataLines Maps Reduces AvgTime (milliseconds)
> > 1                20     20           17451
> >
> > Can someone please help me understand this behaviour of Hadoop in this
> > case? My main purpose in running MRBench is to calculate the average time
> > for a given number of Maps, Reduces, InputLines, etc. If the number of Maps
> > is not what I submitted, how can I judge my benchmark results?
> >
> >
> >
> > Thanks,
> >
> > Gaurav Dasgupta
>

Re: MRBench Maps strange behaviour

Posted by Hemanth Yamijala <yh...@gmail.com>.
Hi,

The number of maps specified to any MapReduce program (including
those that are part of MRBench) is generally only a hint, and the actual
number of maps will, in typical cases, be influenced by the amount of data
being processed. You can take a look at this wiki link to understand
more: http://wiki.apache.org/hadoop/HowManyMapsAndReduces
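
To make the "hint" concrete, here is a rough Python sketch of the split arithmetic, based on my reading of the old-API FileInputFormat.getSplits() in Hadoop 0.20: the goal size is the total input bytes divided by the requested number of maps, the split size is clamped between a minimum split size and the HDFS block size, and the last split may be up to 10% larger than the others. It assumes a single splittable input file, a 1-byte minimum split size and a 64 MB block size. estimate_num_splits is a hypothetical helper, not a Hadoop API, and real jobs (several files, compressed input, other input formats) can behave differently.

```python
SPLIT_SLOP = 1.1  # the final split may be up to 10% larger than split_size


def estimate_num_splits(total_bytes: int, requested_maps: int,
                        min_split_size: int = 1,
                        block_size: int = 64 * 1024 * 1024) -> int:
    """Estimate map tasks for ONE splittable input file (hypothetical helper).

    Modeled on the old mapred FileInputFormat split computation; treat the
    result as an approximation, not as what a given Hadoop build will do.
    """
    if total_bytes <= 0:
        raise ValueError("empty input is not modeled by this sketch")
    # The requested map count only feeds into the goal size (integer division).
    goal_size = total_bytes // max(requested_maps, 1)
    # Clamp: never below the minimum split size (at least 1 byte),
    # never above one block.
    min_size = max(min_split_size, 1)
    split_size = max(min_size, min(goal_size, block_size))

    splits = 0
    remaining = total_bytes
    # Cut full-size splits while more than 1.1 split sizes remain.
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    # The leftover (at most 1.1 * split_size) becomes the last split.
    if remaining > 0:
        splits += 1
    return splits


if __name__ == "__main__":
    print(estimate_num_splits(2, 10))                 # 2
    print(estimate_num_splits(10_010, 200))           # 201
    print(estimate_num_splits(10_002, 200))           # 200
    print(estimate_num_splits(200 * 1024 * 1024, 2))  # 4
```

Under these assumptions the second run lines up: it reported "Map input bytes=2", so with -maps 10 the goal size rounds down to 0, the split size is clamped to 1 byte, and the estimate is 2 splits, matching "Launched map tasks=2". A count of 201 for -maps 200 is consistent with the integer division leaving a remainder larger than 10% of the goal size, though the logs don't show that run's exact input size. The last call also shows the hint being exceeded: asking for 2 maps over 200 MB still gives 4 splits, because each split is capped at one block.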

In the examples below, since the data you've generated is different,
the number of mappers is different. To be able to judge your
benchmark results, you'd need to benchmark against the same data (or
at least the same kind of data, i.e. the same size and type).
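
One hypothetical way to follow that advice when collecting numbers: record each run's input size and actual map count (both taken from the job counters) next to MRBench's AvgTime, and only average timings within groups where those two values match. BenchRun and average_by_workload are illustrative names, not part of MRBench or Hadoop.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class BenchRun:
    """One MRBench run (hypothetical record type, filled in by hand or a script)."""
    input_bytes: int     # e.g. the "Map input bytes" counter
    launched_maps: int   # the "Launched map tasks" counter, not the -maps hint
    avg_time_ms: float   # the AvgTime column printed by MRBench


def average_by_workload(runs: Iterable[BenchRun]) -> Dict[Tuple[int, int], float]:
    """Average AvgTime only across runs with the same input size and map count."""
    groups: Dict[Tuple[int, int], List[float]] = defaultdict(list)
    for run in runs:
        groups[(run.input_bytes, run.launched_maps)].append(run.avg_time_ms)
    return {key: mean(times) for key, times in groups.items()}
```

Keying on the counter values rather than on the command-line flags keeps runs that received the same -maps hint but ran a different number of maps from being averaged together.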

The number of maps printed at the end is taken straight from the input
you specified and doesn't reflect what the job actually ran with. The
information from the counters is the correct one.
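
A minimal Python sketch of reading the actual map count from the JobClient console output instead of from MRBench's summary table. It assumes the counter lines look like the ones quoted in this thread ("INFO mapred.JobClient:     Launched map tasks=201") and that lines wrapped by the mail client have been rejoined. parse_counters and map_task_summary are hypothetical helpers. As far as I know, "Launched map tasks" counts launched task attempts, so speculative execution or retried tasks could push it above the number of input splits.

```python
import re
from typing import Dict

# Matches counter lines such as:
#   12/08/28 04:34:35 INFO mapred.JobClient:     Launched map tasks=201
# Lines without "name=number" (e.g. "Job complete: ...", "Counters: 28") are skipped.
COUNTER_RE = re.compile(
    r"INFO mapred\.JobClient:\s+(?P<name>[^=]+?)\s*=\s*(?P<value>\d+)\s*$"
)


def parse_counters(console_output: str) -> Dict[str, int]:
    """Return {counter name: value} for every counter line in the output."""
    counters: Dict[str, int] = {}
    for line in console_output.splitlines():
        match = COUNTER_RE.search(line)
        if match:
            counters[match.group("name").strip()] = int(match.group("value"))
    return counters


def map_task_summary(counters: Dict[str, int]) -> Dict[str, int]:
    """Pull out the map-related counters; missing ones default to 0."""
    return {
        "launched": counters.get("Launched map tasks", 0),
        "data_local": counters.get("Data-local map tasks", 0),
        "rack_local": counters.get("Rack-local map tasks", 0),
    }
```

Applied to the first job's output quoted below, this should give launched=201, and the locality counters agree (64 data-local + 137 rack-local = 201). That is the figure to record next to the timing, not the 200 printed in the summary table.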

Thanks
Hemanth

On Tue, Aug 28, 2012 at 4:02 PM, Gaurav Dasgupta <gd...@gmail.com> wrote:
> Hi All,
>
> I executed the "MRBench" program from "hadoop-test.jar" on my 12-node CDH3
> cluster. After running it, I made some strange observations regarding the
> number of Maps it ran.
>
> First I ran the command:
> hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -numRuns 3 -maps 200 -reduces 200 -inputLines 1024 -inputType random
> I could see that the actual number of Maps it ran was 201 (for all 3
> runs) instead of 200 (though the end report displays the number launched as
> 200). Here is the console report:
>
>
> 12/08/28 04:34:35 INFO mapred.JobClient: Job complete: job_201208230144_0035
> 12/08/28 04:34:35 INFO mapred.JobClient: Counters: 28
> 12/08/28 04:34:35 INFO mapred.JobClient:   Job Counters
> 12/08/28 04:34:35 INFO mapred.JobClient:     Launched reduce tasks=200
> 12/08/28 04:34:35 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=617209
> 12/08/28 04:34:35 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
> 12/08/28 04:34:35 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
> 12/08/28 04:34:35 INFO mapred.JobClient:     Rack-local map tasks=137
> 12/08/28 04:34:35 INFO mapred.JobClient:     Launched map tasks=201
> 12/08/28 04:34:35 INFO mapred.JobClient:     Data-local map tasks=64
> 12/08/28 04:34:35 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=1756882
>
>
>
> Next, I ran MRBench with just 10 Maps and 10 Reduces:
>
> hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar mrbench -maps 10 -reduces 10
>
>
>
> This time the actual number of Maps was only 2, and again the end report
> displays Maps Launched as 10. The console output:
>
>
>
> 12/08/28 05:05:35 INFO mapred.JobClient: Job complete: job_201208230144_0040
> 12/08/28 05:05:35 INFO mapred.JobClient: Counters: 27
> 12/08/28 05:05:35 INFO mapred.JobClient:   Job Counters
> 12/08/28 05:05:35 INFO mapred.JobClient:     Launched reduce tasks=20
> 12/08/28 05:05:35 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=6648
> 12/08/28 05:05:35 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
> 12/08/28 05:05:35 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
> 12/08/28 05:05:35 INFO mapred.JobClient:     Launched map tasks=2
> 12/08/28 05:05:35 INFO mapred.JobClient:     Data-local map tasks=2
> 12/08/28 05:05:35 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=163257
> 12/08/28 05:05:35 INFO mapred.JobClient:   FileSystemCounters
> 12/08/28 05:05:35 INFO mapred.JobClient:     FILE_BYTES_READ=407
> 12/08/28 05:05:35 INFO mapred.JobClient:     HDFS_BYTES_READ=258
> 12/08/28 05:05:35 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1072596
> 12/08/28 05:05:35 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=3
> 12/08/28 05:05:35 INFO mapred.JobClient:   Map-Reduce Framework
> 12/08/28 05:05:35 INFO mapred.JobClient:     Map input records=1
> 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce shuffle bytes=647
> 12/08/28 05:05:35 INFO mapred.JobClient:     Spilled Records=2
> 12/08/28 05:05:35 INFO mapred.JobClient:     Map output bytes=5
> 12/08/28 05:05:35 INFO mapred.JobClient:     CPU time spent (ms)=17070
> 12/08/28 05:05:35 INFO mapred.JobClient:     Total committed heap usage (bytes)=6218842112
> 12/08/28 05:05:35 INFO mapred.JobClient:     Map input bytes=2
> 12/08/28 05:05:35 INFO mapred.JobClient:     Combine input records=0
> 12/08/28 05:05:35 INFO mapred.JobClient:     SPLIT_RAW_BYTES=254
> 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce input records=1
> 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce input groups=1
> 12/08/28 05:05:35 INFO mapred.JobClient:     Combine output records=0
> 12/08/28 05:05:35 INFO mapred.JobClient:     Physical memory (bytes) snapshot=3348828160
> 12/08/28 05:05:35 INFO mapred.JobClient:     Reduce output records=1
> 12/08/28 05:05:35 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=22955810816
> 12/08/28 05:05:35 INFO mapred.JobClient:     Map output records=1
> DataLines Maps Reduces AvgTime (milliseconds)
> 1                20     20           17451
>
> Can someone please help me understand this behaviour of Hadoop in this
> case? My main purpose in running MRBench is to calculate the average time
> for a given number of Maps, Reduces, InputLines, etc. If the number of Maps
> is not what I submitted, how can I judge my benchmark results?
>
>
>
> Thanks,
>
> Gaurav Dasgupta
