Posted to common-user@hadoop.apache.org by Jeyendran Balakrishnan <jb...@docomolabs-usa.com> on 2010/02/01 23:59:16 UTC

RE: do all mappers finish before reducer starts

Correct me if I'm wrong, but this:

>> Yes, any reduce function call should be after all the mappers have done
>> their work.

is strictly true only if speculative execution is explicitly turned off. Otherwise there is a chance that some reduce tasks can actually start before all the maps are complete. In case it turns out that some map output key used by one speculative reduce task is output by some other map after this reduce task has started, I think the JT then kills this speculative task.
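
For anyone who wants to take speculation out of the picture, here is a minimal sketch of turning it off for a single job (assuming the 0.20-era org.apache.hadoop.mapred.JobConf API; the class name is just an example):

    import org.apache.hadoop.mapred.JobConf;

    public class NoSpeculationExample {
        public static void main(String[] args) {
            JobConf conf = new JobConf();
            // Disable speculative (duplicate) attempts for both map and reduce tasks.
            conf.setMapSpeculativeExecution(false);
            conf.setReduceSpeculativeExecution(false);
            // ... set mapper, reducer, input/output paths and submit as usual ...
        }
    }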



-----Original Message-----
From: Gang Luo [mailto:lgpublic@yahoo.com.cn] 
Sent: Friday, January 29, 2010 2:27 PM
To: common-user@hadoop.apache.org
Subject: Re: do all mappers finish before reducer starts

It seems this is a hot issue.

When any mapper finishes (its sorted intermediate result is on the local disk), the shuffle starts to transfer that result to the corresponding reducers, even while other mappers are still working. Because the shuffle is part of the reduce phase, the map phase and the reduce phase can be seen as overlapping to some extent. That is why you see such a progress report.

What you are actually asking about is the reduce function. Yes, any reduce function call should be after all the mappers have done their work.
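
To make the distinction concrete, here is a minimal reducer sketch (using the org.apache.hadoop.mapreduce API; the class name is just an example). The reduce() method below is only invoked once the shuffle has gathered every value for that key from all finished map tasks:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // reduce() runs once per key, after the framework has copied and merged
    // that key's values from the output of every completed map task.
    public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();   // every value for this key is available here
            }
            context.write(key, new IntWritable(sum));
        }
    }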

 -Gang


----- Original Message ----
From: adeelmahmood <ad...@gmail.com>
To: core-user@hadoop.apache.org
Sent: 2010/1/29 (Fri) 4:10:50 PM
Subject: do all mappers finish before reducer starts


I just have a conceptual question. My understanding is that all the mappers
have to complete their job before the reducers can start working, because mappers
don't know about each other, so we need the values for a given key from all the
different mappers; we have to wait until all mappers have collectively
given the system all possible values for a key, so that they can then be
passed on to the reducer.
But when I ran these jobs, almost every time the reducers start working before
the mappers are all done, so it would say map 60% reduce 30%. How
does this work?
Does it find all possible values for a single key from all mappers, pass
them on to the reducer, and then work on other keys?
Any help is appreciated.
-- 
View this message in context: http://old.nabble.com/do-all-mappers-finish-before-reducer-starts-tp27330927p27330927.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.



Re: do all mappers finish before reducer starts

Posted by Ken Goodhope <ke...@gmail.com>.
The reduce function is always called after all map tasks are complete.  This
is not to be confused with the reduce "task".  The reduce task can be
launched and begin copying data as soon as the first mapper completes.  By
default though, reduce tasks are not launched until 5% of the mappers are
completed.
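
That 5% threshold is a job setting; here is a minimal sketch of raising it (assuming the 0.20-era property name mapred.reduce.slowstart.completed.maps, whose default is 0.05, and an example job name):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class SlowstartExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Do not launch reduce tasks until 80% of the map tasks have finished
            // (the default of 0.05 lets reducers start copying after 5% of maps).
            conf.setFloat("mapred.reduce.slowstart.completed.maps", 0.80f);
            Job job = new Job(conf, "slowstart-example");
            // ... configure mapper, reducer, input/output paths and submit as usual ...
        }
    }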

2010/2/1 Jeyendran Balakrishnan <jb...@docomolabs-usa.com>

> Correct me if I'm wrong, but this:
>
> >> Yes, any reduce function call should be after all the mappers have done
> >> their work.
>
> is strictly true only if speculative execution is explicitly turned off.
> Otherwise there is a chance that some reduce tasks can actually start before
> all the maps are complete. In case it turns out that some map output key
> used by one speculative reduce task is output by some other map after this
> reduce task has started, I think the JT then kills this speculative task.