Posted to hdfs-user@hadoop.apache.org by Roy Smith <ro...@panix.com> on 2013/01/11 23:59:29 UTC

How does hadoop decide how many reducers to run?

I ran a big job the other day on a cluster of 4 m2.4xlarge EC2 instances.  Each instance is 8 cores, so 32 cores total.  Hadoop ran 16 reducers, followed by a second wave of 12.  It seems to me it was only using half the available cores.  Is this normal?  Is there some way to force it to use all the cores?

---
Roy Smith
roy@panix.com




Re: How does hadoop decide how many reducers to run?

Posted by Michael Segel <mi...@hotmail.com>.
Since you are using EMR, AWS pre-configures the number of slots per node. 
So you are already getting the optimum number of slots that their 'machines' can handle. 

So when you ran your job, you said that you saw 16 reducers and then 12 reducers running. 

This could imply that your job required 28 reducers and that it was using the full resources of the machines. 
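[The arithmetic here is easy to check: 28 reduce tasks scheduled onto 16 cluster-wide reduce slots run as a first wave of 16 followed by a second wave of 12, which is exactly what was observed. A quick stdlib sketch of that wave scheduling (the 16-slot figure is the assumption from this thread, not a confirmed EMR setting):]

```python
def reduce_waves(num_reduce_tasks, total_reduce_slots):
    """Split reduce tasks into successive waves of at most total_reduce_slots."""
    waves = []
    remaining = num_reduce_tasks
    while remaining > 0:
        wave = min(remaining, total_reduce_slots)
        waves.append(wave)
        remaining -= wave
    return waves

# 28 reduce tasks over 16 reduce slots, as discussed in this thread
print(reduce_waves(28, 16))  # -> [16, 12]
```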

On Jan 11, 2013, at 5:53 PM, Roy Smith <ro...@panix.com> wrote:

> On Jan 11, 2013, at 6:20 PM, Michael Segel wrote:
> 
>> Hi, 
>> 
>> First, not enough information. 
>> 
>> 1) EC2 got it. 
>> 2) Which flavor of Hadoop? Is this EMR as well? 
> 
> Yes, EMR.  We're running AMI version 2.3.1, which includes hadoop 1.0.3.
> 
> 
>> 3) How many slots did you configure in your mapred-site.xml?
> 
> Hmmm, no clue.  I've never even heard of that file.  We're using mrjob.  It may be that mrjob is building a mapred-site.xml file for me and I never even see it?
> 
> ---
> Roy Smith
> roy@panix.com
> 
> 
> 
> 



Re: How does hadoop decide how many reducers to run?

Posted by Roy Smith <ro...@panix.com>.
On Jan 11, 2013, at 6:20 PM, Michael Segel wrote:

> Hi, 
> 
> First, not enough information. 
> 
> 1) EC2 got it. 
> 2) Which flavor of Hadoop? Is this EMR as well? 

Yes, EMR.  We're running AMI version 2.3.1, which includes hadoop 1.0.3.


> 3) How many slots did you configure in your mapred-site.xml?

Hmmm, no clue.  I've never even heard of that file.  We're using mrjob.  It may be that mrjob is building a mapred-site.xml file for me and I never even see it?

---
Roy Smith
roy@panix.com
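[For reference, the slot counts discussed in this thread live in mapred-site.xml on each TaskTracker in Hadoop 1.x. A fragment would look something like this; the values shown are illustrative, not EMR's actual defaults, and on EMR (or via mrjob) this file is generated for you:]

```xml
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>
</property>
```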




Re: How does hadoop decide how many reducers to run?

Posted by Michael Segel <mi...@hotmail.com>.
Hi, 

First, not enough information. 

1) EC2 got it. 
2) Which flavor of Hadoop? Is this EMR as well? 
3) How many slots did you configure in your mapred-site.xml? 

AWS EC2 cores aren't going to be hyperthreaded cores, so 8 cores means you will probably have 6 cores available for slots. 
With 16 reducers it sounds like you have 4 mapper slots and 4 reducer slots per node, or 8 slots set up. (Oversubscription is OK if you're not running HBase.) 

So what are you missing? 
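[A related rule of thumb from the Hadoop MapReduce tutorial: set the reducer count to roughly 0.95x or 1.75x of (nodes x reduce slots per node). The 0.95 factor launches all reduces in a single wave with slack for failures; 1.75 deliberately produces a second wave so faster nodes pick up extra work. A sketch, with the slot counts assumed as above; note that 1.75 x 16 = 28 matches the 16 + 12 pattern seen in this thread:]

```python
def suggested_reducers(nodes, reduce_slots_per_node, factor=0.95):
    """Rule-of-thumb reducer count from the Hadoop MapReduce tutorial."""
    return int(factor * nodes * reduce_slots_per_node)

# 4 nodes x 4 reduce slots each (assumed split, not a confirmed EMR setting)
print(suggested_reducers(4, 4, 0.95))  # -> 15: one wave, slack for failures
print(suggested_reducers(4, 4, 1.75))  # -> 28: two waves for load balancing
```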


On Jan 11, 2013, at 4:59 PM, Roy Smith <ro...@panix.com> wrote:

> I ran a big job the other day on a cluster of 4 m2.4xlarge EC2 instances.  Each instance is 8 cores, so 32 cores total.  Hadoop ran 16 reducers, followed by a second wave of 12.  It seems to me it was only using half the available cores.  Is this normal?  Is there some way to force it to use all the cores?
> 
> ---
> Roy Smith
> roy@panix.com
> 
> 
> 
> 

