Posted to common-user@hadoop.apache.org by Andrés Durán <du...@tadium.es> on 2012/05/22 14:06:33 UTC

Question about reducers

Hello,

        I'm working with Hadoop 1.0.3, configured in pseudo-distributed mode.

        I have 128 reduce tasks, and the job runs on a local machine with 32 cores. The job works fine and fast: it takes 1 hour and 30 minutes to finish. But when the job starts, the reducers move from the task queue into the running phase very slowly; it takes 7 minutes to get 32 tasks into the running phase. Why is the allocation of running tasks so slow? Is it possible to adjust any variable in the JobTracker setup to reduce this allocation time?

 Thanks to all!

 Best regards,
        Andrés Durán

Re: Question about reducers

Posted by Andrés Durán <du...@tadium.es>.
Many thanks, Harsh. I will try it. :D

Best regards,
	Andrés Durán


On 22/05/2012, at 14:25, Harsh J wrote:

> A minor correction: CapacityScheduler doesn't seem to do multi-reducer
> assignments (or at least not in 1.x), but it does do multi-map
> assignments. This is for the same reason as
> http://search-hadoop.com/m/KYv8JhkOHc1. FairScheduler in 1.x supports
> multi-map and multi-reducer assignments over single heartbeats, which
> should do well on your single 32-slot machine.
> 
> Do give it a try and let us know!
> 
> On Tue, May 22, 2012 at 5:51 PM, Harsh J <ha...@cloudera.com> wrote:
>> Hi,
>> 
>> This may be because, depending on your scheduler, only one reducer may
>> be allocated per TT heartbeat. The reasoning for why this is the case
>> is explained here: http://search-hadoop.com/m/KYv8JhkOHc1
>> 
>> You may have better results in 1.0.3 using an alternative scheduler,
>> such as FairScheduler with multiple-assignments-per-heartbeat turned
>> on (see http://hadoop.apache.org/common/docs/current/fair_scheduler.html
>> and set the boolean property "mapred.fairscheduler.assignmultiple" to
>> enable it), or CapacityScheduler (see
>> http://hadoop.apache.org/common/docs/current/capacity_scheduler.html),
>> which does this as well, out of the box.
>> 
>> On Tue, May 22, 2012 at 5:36 PM, Andrés Durán <du...@tadium.es> wrote:
>>> Hello,
>>> 
>>>        I'm working with Hadoop 1.0.3, configured in pseudo-distributed mode.
>>>
>>>        I have 128 reduce tasks, and the job runs on a local machine with 32 cores. The job works fine and fast: it takes 1 hour and 30 minutes to finish. But when the job starts, the reducers move from the task queue into the running phase very slowly; it takes 7 minutes to get 32 tasks into the running phase. Why is the allocation of running tasks so slow? Is it possible to adjust any variable in the JobTracker setup to reduce this allocation time?
>>> 
>>>  Thanks to all!
>>> 
>>>  Best regards,
>>>        Andrés Durán
>> 
>> 
>> 
>> --
>> Harsh J
> 
> 
> 
> -- 
> Harsh J


Re: Question about reducers

Posted by Harsh J <ha...@cloudera.com>.
A minor correction: CapacityScheduler doesn't seem to do multi-reducer
assignments (or at least not in 1.x), but it does do multi-map
assignments. This is for the same reason as
http://search-hadoop.com/m/KYv8JhkOHc1. FairScheduler in 1.x supports
multi-map and multi-reducer assignments over single heartbeats, which
should do well on your single 32-slot machine.

Do give it a try and let us know!
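
For reference, here is a minimal sketch of the mapred-site.xml entries
this suggestion amounts to (the two property names are from the 1.x
fair scheduler docs; the contrib jar location is an assumption based on
the 1.0.x release layout, so adjust it to your install):

  <!-- mapred-site.xml on the JobTracker. The scheduler is a
       JobTracker-side setting, so restart the JobTracker after the
       change. The FairScheduler contrib jar (in 1.0.x typically under
       contrib/fairscheduler/) must be on the JobTracker classpath. -->
  <property>
    <name>mapred.jobtracker.taskScheduler</name>
    <value>org.apache.hadoop.mapred.FairScheduler</value>
  </property>
  <property>
    <!-- Hand out multiple map and reduce tasks per TT heartbeat. -->
    <name>mapred.fairscheduler.assignmultiple</name>
    <value>true</value>
  </property>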

On Tue, May 22, 2012 at 5:51 PM, Harsh J <ha...@cloudera.com> wrote:
> Hi,
>
> This may be because, depending on your scheduler, only one reducer may
> be allocated per TT heartbeat. The reasoning for why this is the case
> is explained here: http://search-hadoop.com/m/KYv8JhkOHc1
>
> You may have better results in 1.0.3 using an alternative scheduler,
> such as FairScheduler with multiple-assignments-per-heartbeat turned
> on (see http://hadoop.apache.org/common/docs/current/fair_scheduler.html
> and set the boolean property "mapred.fairscheduler.assignmultiple" to
> enable it), or CapacityScheduler (see
> http://hadoop.apache.org/common/docs/current/capacity_scheduler.html),
> which does this as well, out of the box.
>
> On Tue, May 22, 2012 at 5:36 PM, Andrés Durán <du...@tadium.es> wrote:
>> Hello,
>>
>>        I'm working with Hadoop 1.0.3, configured in pseudo-distributed mode.
>>
>>        I have 128 reduce tasks, and the job runs on a local machine with 32 cores. The job works fine and fast: it takes 1 hour and 30 minutes to finish. But when the job starts, the reducers move from the task queue into the running phase very slowly; it takes 7 minutes to get 32 tasks into the running phase. Why is the allocation of running tasks so slow? Is it possible to adjust any variable in the JobTracker setup to reduce this allocation time?
>>
>>  Thanks to all!
>>
>>  Best regards,
>>        Andrés Durán
>
>
>
> --
> Harsh J



-- 
Harsh J

Re: Question about reducers

Posted by Harsh J <ha...@cloudera.com>.
Hi,

This may be because, depending on your scheduler, only one reducer may
be allocated per TT heartbeat. The reasoning for why this is the case
is explained here: http://search-hadoop.com/m/KYv8JhkOHc1
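
To put rough numbers on it (assuming the stock minimum TT heartbeat
interval of ~3 seconds on a small 1.x cluster): at one reducer per
heartbeat, getting 32 reducers running takes at least 32 x 3s, i.e.
about 96 seconds, and since map assignments compete for the same
heartbeats the ramp-up can easily stretch to several minutes.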

You may have better results in 1.0.3 using an alternative scheduler,
such as FairScheduler with multiple-assignments-per-heartbeat turned
on (see http://hadoop.apache.org/common/docs/current/fair_scheduler.html
and set the boolean property "mapred.fairscheduler.assignmultiple" to
enable it), or CapacityScheduler (see
http://hadoop.apache.org/common/docs/current/capacity_scheduler.html),
which does this as well, out of the box.

On Tue, May 22, 2012 at 5:36 PM, Andrés Durán <du...@tadium.es> wrote:
> Hello,
>
>        I'm working with Hadoop 1.0.3, configured in pseudo-distributed mode.
>
>        I have 128 reduce tasks, and the job runs on a local machine with 32 cores. The job works fine and fast: it takes 1 hour and 30 minutes to finish. But when the job starts, the reducers move from the task queue into the running phase very slowly; it takes 7 minutes to get 32 tasks into the running phase. Why is the allocation of running tasks so slow? Is it possible to adjust any variable in the JobTracker setup to reduce this allocation time?
>
>  Thanks to all!
>
>  Best regards,
>        Andrés Durán



-- 
Harsh J