Posted to common-user@hadoop.apache.org by Raymond Jennings III <ra...@yahoo.com> on 2009/11/13 18:05:00 UTC

Is the job tracker a master node?

I am running with the NameNode and JobTracker on separate machines.  Does the JobTracker node need to be specified in the conf/masters file?  I am not running it as a slave node so I do not have it in the conf/slaves file.  Thanks!

Re: I thought map and reduce could not overlap?

Posted by David Howell <de...@gmail.com>.

The first 2/3 of the reduce phase (as reported by the progress meters)
are all about getting the map results from the map tasktracker to the
reduce tasktracker and sorting them. The real reduce happens in the
last third, and that part won't start until all of the maps are done.
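
To make the arithmetic behind the progress meter concrete, here is a minimal sketch in Python. It assumes each of the three reduce-side phases (copy, sort, reduce) contributes exactly one third of the reported figure, and that the copy phase can only be as far along as the fraction of finished maps. The function names are made up for illustration and are not part of Hadoop.

```python
def reduce_progress(copy_frac, sort_frac, reduce_frac):
    """Combine the three reduce-side phases into one figure between 0.0 and 1.0.

    Assumption: each phase is weighted equally at one third.
    """
    for frac in (copy_frac, sort_frac, reduce_frac):
        if not 0.0 <= frac <= 1.0:
            raise ValueError("phase fractions must be between 0.0 and 1.0")
    return (copy_frac + sort_frac + reduce_frac) / 3.0


def reduce_ceiling_while_maps_run(maps_done, maps_total):
    """Upper bound on reported reduce progress while maps are still running.

    Assumption: a reducer can only copy output from maps that have finished,
    and sort/reduce have not started yet.
    """
    copy_frac = maps_done / maps_total
    return reduce_progress(copy_frac, 0.0, 0.0)


if __name__ == "__main__":
    # With 82 of 100 maps finished, the meter cannot yet exceed roughly 27%.
    print(round(reduce_ceiling_while_maps_run(82, 100) * 100, 1))
```

Under these assumptions the 22% reduce figure at 82% map completion in the quoted log is unsurprising: it sits below the ceiling for the copy phase alone.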

On Sat, Nov 14, 2009 at 10:05 AM, Raymond Jennings III
<ra...@yahoo.com> wrote:
> I thought there was a barrier that ensured the map phase would finish before the reduce phase started but I see on the sample hadoop word count app:
>
> 09/11/14 10:58:50 INFO mapred.JobClient:  map 79% reduce 18%
> 09/11/14 10:58:54 INFO mapred.JobClient:  map 79% reduce 19%
> 09/11/14 10:58:55 INFO mapred.JobClient:  map 80% reduce 19%
> 09/11/14 10:58:58 INFO mapred.JobClient:  map 80% reduce 20%
> 09/11/14 10:59:00 INFO mapred.JobClient:  map 81% reduce 20%
> 09/11/14 10:59:04 INFO mapred.JobClient:  map 82% reduce 20%
> 09/11/14 10:59:05 INFO mapred.JobClient:  map 82% reduce 21%
> 09/11/14 10:59:08 INFO mapred.JobClient:  map 82% reduce 22%
>
> That looks like they are overlapping?

Re: I thought map and reduce could not overlap?

Posted by Tim Robertson <ti...@gmail.com>.

My understanding is the following:
As map tasks finish, Hadoop starts to copy their output to the
reducer machines, but it does not run the reduce yet.  During this
stage, if you look at the running reducers, you will see them report
something like "copying 4 of 45".  The copying can only finish after
all the maps have finished; at that point you will see Reduce at 33%.
Then comes the sorting, and only then does the reduce itself start.

Basically, the overlap is just Hadoop beginning to copy the map output
that is ready onto the reducer machines.

Cheers

Tim

On Sat, Nov 14, 2009 at 5:05 PM, Raymond Jennings III
<ra...@yahoo.com> wrote:
> I thought there was a barrier that ensured the map phase would finish before the reduce phase started but I see on the sample hadoop word count app:
>
> 09/11/14 10:58:50 INFO mapred.JobClient:  map 79% reduce 18%
> 09/11/14 10:58:54 INFO mapred.JobClient:  map 79% reduce 19%
> 09/11/14 10:58:55 INFO mapred.JobClient:  map 80% reduce 19%
> 09/11/14 10:58:58 INFO mapred.JobClient:  map 80% reduce 20%
> 09/11/14 10:59:00 INFO mapred.JobClient:  map 81% reduce 20%
> 09/11/14 10:59:04 INFO mapred.JobClient:  map 82% reduce 20%
> 09/11/14 10:59:05 INFO mapred.JobClient:  map 82% reduce 21%
> 09/11/14 10:59:08 INFO mapred.JobClient:  map 82% reduce 22%
>
> That looks like they are overlapping?

Re: I thought map and reduce could not overlap?

Posted by Kevin Weil <ke...@gmail.com>.

The first third of the reduce phase is really the shuffle, where map
outputs get sent to and collected at their respective reducers. You'll
see this transfer happening, and the "reduce" creeping up towards 33%,
towards the end of your map phase.  The 33% mark is where the real
barrier is.

Kevin

On Nov 14, 2009, at 8:05 AM, Raymond Jennings III  
<ra...@yahoo.com> wrote:

> I thought there was a barrier that ensured the map phase would  
> finish before the reduce phase started but I see on the sample  
> hadoop word count app:
>
> 09/11/14 10:58:50 INFO mapred.JobClient:  map 79% reduce 18%
> 09/11/14 10:58:54 INFO mapred.JobClient:  map 79% reduce 19%
> 09/11/14 10:58:55 INFO mapred.JobClient:  map 80% reduce 19%
> 09/11/14 10:58:58 INFO mapred.JobClient:  map 80% reduce 20%
> 09/11/14 10:59:00 INFO mapred.JobClient:  map 81% reduce 20%
> 09/11/14 10:59:04 INFO mapred.JobClient:  map 82% reduce 20%
> 09/11/14 10:59:05 INFO mapred.JobClient:  map 82% reduce 21%
> 09/11/14 10:59:08 INFO mapred.JobClient:  map 82% reduce 22%
>
> That looks like they are overlapping?

I thought map and reduce could not overlap?

Posted by Raymond Jennings III <ra...@yahoo.com>.

I thought there was a barrier that ensured the map phase would finish before the reduce phase started but I see on the sample hadoop word count app:

09/11/14 10:58:50 INFO mapred.JobClient:  map 79% reduce 18%
09/11/14 10:58:54 INFO mapred.JobClient:  map 79% reduce 19%
09/11/14 10:58:55 INFO mapred.JobClient:  map 80% reduce 19%
09/11/14 10:58:58 INFO mapred.JobClient:  map 80% reduce 20%
09/11/14 10:59:00 INFO mapred.JobClient:  map 81% reduce 20%
09/11/14 10:59:04 INFO mapred.JobClient:  map 82% reduce 20%
09/11/14 10:59:05 INFO mapred.JobClient:  map 82% reduce 21%
09/11/14 10:59:08 INFO mapred.JobClient:  map 82% reduce 22%

That looks like they are overlapping?

RE: Is the job tracker a master node?

Posted by zjffdu <zj...@gmail.com>.

The conf/masters file lists the secondary NameNode host, not the master
node (the file name is a bit confusing).

You can configure your NameNode in core-site.xml and your JobTracker in
mapred-site.xml.
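
As a rough illustration of what those two files might contain, the Python sketch below renders a minimal Hadoop-style configuration document for each. The property names fs.default.name and mapred.job.tracker are the ones I believe the 0.20-era releases use, so please check them against your version's documentation. The host names and ports are placeholders, and site_config is a made-up helper, not a Hadoop tool.

```python
import xml.etree.ElementTree as ET


def site_config(properties):
    """Render a Hadoop-style *-site.xml document from a dict of name -> value."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    # encoding="unicode" makes tostring return a str instead of bytes
    return ET.tostring(root, encoding="unicode")


# Hypothetical hosts and ports; substitute the machines in your own cluster.
core_site = site_config({"fs.default.name": "hdfs://namenode.example.com:9000"})
mapred_site = site_config({"mapred.job.tracker": "jobtracker.example.com:9001"})

if __name__ == "__main__":
    print(core_site)    # would go in conf/core-site.xml
    print(mapred_site)  # would go in conf/mapred-site.xml
```

The point of the sketch is only that the NameNode and JobTracker addresses live in these two site files, so neither machine has to appear in conf/masters for that purpose.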


Jeff Zhang



-----Original Message-----
From: Raymond Jennings III [mailto:raymondjiii@yahoo.com] 
Sent: November 13, 2009 9:05
To: common-user@hadoop.apache.org
Subject: Is the job tracker a master node?

I am running with the NameNode and JobTracker on separate machines.  Does
the JobTracker node need to be specified in the conf/masters file?  I am not
running it as a slave node so I do not have it in the conf/slaves file.
Thanks!