Posted to common-user@hadoop.apache.org by Ferdinand Neman <ne...@gmail.com> on 2010/05/19 08:06:41 UTC

Calculating the slot

Hi All,

I'm new to Hadoop and have successfully run many MapReduce jobs on my
small cluster (6 machines).
Now I realize that by default only 1 reducer is assigned to a job, and
with only 1 reducer things go slowly.
I've read some documents and am about to increase the number of reducers.

The book Hadoop: The Definitive Guide says that "The optimal number of
reducers is related to the total number of available reducer slots in your
cluster. The total number of slots is given by the product of the number of
nodes in the cluster and the value of the
mapred.tasktracker.reduce.tasks.maximum property."

I don't understand the "product of the number of nodes in the cluster"
part. Can someone help me with this?
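
(A minimal sketch of the change I am about to make, assuming the 0.20-era
mapred API; the driver class, job name, paths, and the value 12 are all
just illustrative, not something from the thread:)

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class MyJobDriver {                      // hypothetical driver class
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(MyJobDriver.class);
            conf.setJobName("my-job");              // illustrative job name
            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));
            // The default is a single reducer; raise it explicitly per job.
            // Equivalent to setting mapred.reduce.tasks in the job configuration.
            conf.setNumReduceTasks(12);             // example value, not a recommendation
            JobClient.runJob(conf);
        }
    }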

Regards,

-- 
Ferdinand Neman
----
Developer Team Lead, System Analyst,
System Designer and Solution Architect

http://www.linkedin.com/in/fneman

Re: Calculating the slot

Posted by Buyung Bahari <bu...@detik.com>.
Ferdinand Neman wrote:
> On Thu, May 20, 2010 at 2:35 AM, Allen Wittenauer
> <aw...@linkedin.com> wrote:
>   
>> On May 18, 2010, at 11:06 PM, Ferdinand Neman wrote:
>>     
>>> I don't understand the "product of the number of nodes in the cluster"
>>> part. Can someone help me with this?
>>>       
>> A simple Google search (define:product) would have led you to this definition:
>>
>> a quantity obtained by multiplication; "the product of 2 and 3 is 6"
>>     
>
> Pardon my English, I'm Indonesian.
>
> However, I've read in the Map/Reduce tutorial:
> "How Many Reduces?
> The right number of reduces seems to be 0.95 or 1.75 multiplied by
> (<no. of nodes> * mapred.tasktracker.reduce.tasks.maximum).
> With 0.95 all of the reduces can launch immediately and start
> transferring map outputs as the maps finish. With 1.75 the faster nodes
> will finish their first round of reduces and launch a second wave of
> reduces doing a much better job of load balancing."
>
> I have a small cluster with 4 tasktrackers, and
> mapred.tasktracker.reduce.tasks.maximum is set to 7, so the maximum total
> number of reducer slots would be 28. I tried setting the number of
> reducers to 20.
>
> Before setting the number of reducers (when the default of 1 reducer was
> used), a job that I run could finish in 40 minutes. Now with 20 reducers
> it takes longer, 44 minutes. Is this normal?
>
>   
In my experience, more reducers means the map output is split more ways,
and because of that some reducers end up with less data while others get
more (the data is unevenly spread). The reducers with less data finish
early and then sit waiting for the reducers with more data, so the job as
a whole takes longer.

So in my configuration, I set the number of reducer slots per server to
match the CPU cores. If you have 8 cores, you can set the map slots to 4
and the reducer slots to 4, because in reality the map and reduce tasks do
run in parallel.
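
(For reference, a sketch of the two per-tasktracker properties this refers
to, assuming the 0.20-era names; they live in mapred-site.xml on each node,
and the 4 + 4 split is just the 8-core example above:)

    mapred.tasktracker.map.tasks.maximum    = 4
    mapred.tasktracker.reduce.tasks.maximum = 4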

Sorry, I'm Indonesian too :), so my English is not good enough to explain
this well.

Re: Calculating the slot

Posted by Ferdinand Neman <ne...@gmail.com>.
On Thu, May 20, 2010 at 2:35 AM, Allen Wittenauer
<aw...@linkedin.com> wrote:
>
> On May 18, 2010, at 11:06 PM, Ferdinand Neman wrote:
>> I don't understand the "product of the number of nodes in the cluster"
>> part. Can someone help me with this?
>
>
> A simple Google search (define:product) would have led you to this definition:
>
> a quantity obtained by multiplication; "the product of 2 and 3 is 6"

Pardon my English, I'm Indonesian.

However, I've read in the Map/Reduce tutorial:
"How Many Reduces?
The right number of reduces seems to be 0.95 or 1.75 multiplied by
(<no. of nodes> * mapred.tasktracker.reduce.tasks.maximum).
With 0.95 all of the reduces can launch immediately and start
transferring map outputs as the maps finish. With 1.75 the faster nodes
will finish their first round of reduces and launch a second wave of
reduces doing a much better job of load balancing."

I have a small cluster with 4 tasktrackers, and
mapred.tasktracker.reduce.tasks.maximum is set to 7, so the maximum total
number of reducer slots would be 28. I tried setting the number of
reducers to 20.

Before setting the number of reducers (when the default of 1 reducer was
used), a job that I run could finish in 40 minutes. Now with 20 reducers
it takes longer, 44 minutes. Is this normal?
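
(Working the tutorial's rule of thumb through with those numbers: 4
tasktrackers * 7 reducer slots each = 28 slots; 0.95 * 28 = 26.6, so about
26 reducers for the single-wave setting, and 1.75 * 28 = 49 for the
two-wave setting. 20 reducers is below both, so all of them can run in the
first wave.)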

-- 
Ferdinand Neman
----
Developer Team Lead, System Analyst,
System Designer and Solution Architect

http://www.linkedin.com/in/fneman

Re: Calculating the slot

Posted by Allen Wittenauer <aw...@linkedin.com>.
On May 18, 2010, at 11:06 PM, Ferdinand Neman wrote:
> I don't understand the "product of the number of nodes in the cluster"
> part. Can someone help me with this?


A simple Google search (define:product) would have led you to this definition:

a quantity obtained by multiplication; "the product of 2 and 3 is 6"
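
(Applied to the formula being asked about: total reducer slots = number of
tasktracker nodes multiplied by mapred.tasktracker.reduce.tasks.maximum.
For example, with the 6 machines mentioned in the original post and an
assumed per-node value of 2, the usual 0.20-era default, the product is
6 * 2 = 12 reducer slots.)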