Posted to user@hive.apache.org by Kishore kumar <ki...@techdigita.in> on 2014/04/23 13:18:40 UTC

Reduce Job Start

Hi All,

I heard that the reduce job starts only after all map tasks have finished
(100%), but in my Hive query the reduce job started at the stage below.
Please explain why. (I copied the line below while the job was running.)

2014-04-22 21:15:12,803 Stage-1 map = 83%, reduce = 1%, Cumulative CPU 4194.4 sec

-- 


Kishore

Re: Reduce Job Start

Posted by unmesha sreeveni <un...@gmail.com>.

The reduce phase starts only after all map tasks finish. Reducers pull data
from the mappers early, but the processing is done only after all maps have
finished. It is better to look at the JobTracker UI instead of the console;
there you can see that the reducers start processing only after the maps
reach 100%.
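To see why the reduce processing has to wait, here is a minimal Python sketch (not Hadoop code) of a word-count reducer. The map outputs and the helper names shuffle, reduce_counts and run_reducer are hypothetical. Copying finished map outputs early is harmless, but reducing early would give incomplete totals, because a map that has not finished may still emit values for the same key.

```python
from collections import defaultdict

# Hypothetical map outputs: one list of (word, 1) pairs per map task.
MAP_OUTPUTS = [
    [("hive", 1), ("hadoop", 1)],
    [("hive", 1)],
    [("hadoop", 1), ("hive", 1)],
]


def shuffle(fetched_outputs):
    """Group fetched (key, value) pairs by key, like the shuffle/sort step."""
    grouped = defaultdict(list)
    for output in fetched_outputs:
        for key, value in output:
            grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """The reduce function of a word count: sum the values for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def run_reducer(map_outputs, finished_maps):
    """Reduce only once every map task has finished.

    Returns None while maps are still running: copying the finished outputs
    early is fine, but an unfinished map may still emit values for any key.
    """
    if finished_maps < len(map_outputs):
        return None
    return reduce_counts(shuffle(map_outputs))
```

Reducing early would be wrong: reduce_counts(shuffle(MAP_OUTPUTS[:2])) gives hive = 2 and hadoop = 1 instead of the final 3 and 2.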

-- 
Thanks & Regards


Unmesha Sreeveni U.B
Hadoop, Bigdata Developer
Center for Cyber Security | Amrita Vishwa Vidyapeetham
http://www.unmeshasreeveni.blogspot.in/

Re: Reduce Job Start

Posted by Rakesh Davanum <ra...@gmail.com>.

You can control when the reduce tasks start in the first place. The
parameter mapred.reduce.slowstart.completed.maps specifies the fraction of
the job's map tasks that must be complete before reduces are scheduled for
the job. For example, if you set it to 0.70, reduce tasks will start after
70% of the mappers have completed. Remember that the actual 'reduce' phase
of the reducer will not start until all the mappers have completed. When
this value is set too low, reduce tasks start even though many mappers have
yet to complete. This can result in a lot of killed reduce task attempts,
as they sit waiting for the outputs from all the mappers to become
available.
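A minimal Python sketch of the scheduling rule described above. It assumes, as in the MRv1 JobTracker, that the threshold is the configured fraction multiplied by the number of map tasks and rounded up; the default of 0.05 is the value listed in mapred-default.xml. The function names are hypothetical, not Hadoop APIs. In Hadoop 2 (YARN) the same setting is named mapreduce.job.reduce.slowstart.completedmaps.

```python
import math

# Default of mapred.reduce.slowstart.completed.maps in mapred-default.xml.
DEFAULT_SLOWSTART = 0.05


def maps_needed_before_reduces(total_maps, slowstart=DEFAULT_SLOWSTART):
    """Number of completed maps required before reduces are scheduled.

    Assumes the threshold is ceil(slowstart * total_maps), as in the MRv1
    JobTracker; other schedulers may compute it differently.
    """
    # This range check is the sketch's own, not necessarily Hadoop's.
    if not 0.0 <= slowstart <= 1.0:
        raise ValueError("slowstart must be a fraction between 0.0 and 1.0")
    return math.ceil(slowstart * total_maps)


def can_schedule_reduces(completed_maps, total_maps, slowstart=DEFAULT_SLOWSTART):
    """True once enough maps have finished for reduce tasks to be launched."""
    return completed_maps >= maps_needed_before_reduces(total_maps, slowstart)
```

Under these assumptions, a job with 100 map tasks and the default setting can schedule reducers after only 5 maps complete, which would explain reducers already running (and shuffling) while the console still shows map = 83%. With 0.70, the threshold moves to 70 completed maps.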

Thanks & Regards,
Rakesh

On Wed, Apr 23, 2014 at 7:43 AM, Chi Huynh <hu...@initions.com> wrote:

> The MapReduce job contains a shuffle phase, in which the intermediate map
> outputs are copied to the reducer nodes. This phase is counted as part of
> the reduce phase, so the reduce counter already starts before the map
> phase has finished. The actual reduce processing will start, just as you
> have heard, when all the map tasks are finished.

Re: Reduce Job Start

Posted by Chi Huynh <hu...@initions.com>.

The MapReduce job contains a shuffle phase, in which the intermediate map
outputs are copied to the reducer nodes. This phase is counted as part of
the reduce phase, so the reduce counter already starts before the map
phase has finished. The actual reduce processing will start, just as you
have heard, when all the map tasks are finished.
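As a rough illustration of why the console can show reduce = 1% while maps are still running, here is a Python sketch. It assumes, as in Hadoop's ReduceTask, that a reduce task's progress is split into three equally weighted phases (copy, sort, reduce), and that the job-level reduce figure is the average over the reduce tasks. The function names are hypothetical, not Hadoop APIs.

```python
# Assumption: as in Hadoop's ReduceTask, a reduce task's progress is split
# into three equally weighted phases: copy (shuffle), sort, and reduce.
PHASES = ("copy", "sort", "reduce")


def reduce_task_progress(copied_map_outputs, total_maps, sort_done=0.0, reduce_done=0.0):
    """Progress of one reduce task as a fraction between 0.0 and 1.0.

    copied_map_outputs / total_maps is how far the copy (shuffle) phase has
    got; sort_done and reduce_done are the fractions of the later phases.
    """
    copy_done = copied_map_outputs / total_maps
    return (copy_done + sort_done + reduce_done) / len(PHASES)


def job_reduce_percent(task_progresses):
    """Average progress over all reduce tasks, as a percentage."""
    return 100.0 * sum(task_progresses) / len(task_progresses)
```

Under these assumptions, a reducer that has copied only 3 of 100 map outputs already reports about 1%, even though its reduce function has not run yet; copying everything without sorting or reducing would still only reach about 33%.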
