Posted to mapreduce-user@hadoop.apache.org by Adam Phelps <am...@opendns.com> on 2011/01/07 02:51:12 UTC

Choosing number of map/reduce slots (with hyperthreading)

By scouring various web pages and lists via Google I've found some 
general recommendations when it comes to setting the number of map and 
reduce slots for a cluster.  It seems to come down to setting them to 
roughly the number of cores on the machine, minus some if there will be 
other processes active (such as HBase region servers), and to set the 
per-task memory usage so that the total will stay below that of the 
system.  Is this a reasonably general heuristic?
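
A minimal Python sketch of the core-count part of this heuristic. The function name, the 2:1 map-to-reduce split, and the example node are illustrative assumptions that do not come from this thread. The dictionary keys are the per-TaskTracker slot properties used by Hadoop releases of this era (0.20/1.x).

```python
def suggest_slots(cores, reserved_cores=0, map_parts=2, reduce_parts=1):
    """Suggest per-node map and reduce slot counts from the core count.

    reserved_cores: cores set aside for other daemons (e.g. an HBase
    region server). The map:reduce split is an illustrative assumption.
    """
    usable = max(cores - reserved_cores, 1)
    # Integer arithmetic keeps the split exact. Each kind gets at least one
    # slot, so on very small nodes the total can exceed `usable`.
    map_slots = max(usable * map_parts // (map_parts + reduce_parts), 1)
    reduce_slots = max(usable - map_slots, 1)
    return {
        "mapred.tasktracker.map.tasks.maximum": map_slots,
        "mapred.tasktracker.reduce.tasks.maximum": reduce_slots,
    }


# Hypothetical 8-core node with 2 cores reserved for other daemons.
print(suggest_slots(8, reserved_cores=2))
```

This only covers the CPU side. The result still has to be checked against the memory available on the node.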

One thing I haven't been able to find advice on is whether this 
heuristic should be adjusted for machines that have hyperthreading 
enabled.  My thought is that it wouldn't be beneficial to increase the 
number of slots (especially in a CPU-bound application), as slots equal 
to the number of physical cores would already be fully utilizing the 
CPU.  Are there alternative thoughts regarding that?
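
Whichever way that question is answered, it helps to know how many logical CPUs and how many physical cores a node really has. The sketch below is one way to count both on Linux, assuming the x86 /proc/cpuinfo layout with "processor", "physical id" and "core id" fields. The function name and the sample text are made up for illustration.

```python
def count_cpus(cpuinfo_text):
    """Return (logical_cpus, physical_cores) from /proc/cpuinfo-style text."""
    logical = 0
    cores = set()
    physical_id = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between processor blocks
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical += 1
        elif key == "physical id":
            physical_id = value
        elif key == "core id":
            # A physical core is identified by its (socket, core) pair.
            cores.add((physical_id, value))
    return logical, len(cores)


# Made-up sample: one socket, two cores, hyperthreading enabled.
SAMPLE = """\
processor   : 0
physical id : 0
core id     : 0

processor   : 1
physical id : 0
core id     : 1

processor   : 2
physical id : 0
core id     : 0

processor   : 3
physical id : 0
core id     : 1
"""

logical, physical = count_cpus(SAMPLE)
print(logical, physical)
```

On a real node the input would be the contents of /proc/cpuinfo. If the "core id" field is absent (as on some non-x86 platforms), the physical count comes back as 0 and should not be trusted.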

- Adam

Re: Choosing number of map/reduce slots (with hyperthreading)

Posted by Adam Phelps <am...@opendns.com>.
I'll attempt some tests on this later this week and report back once 
I've done so.

- Adam

On 1/10/11 12:06 AM, Eric wrote:
> With hyperthreading, the CPU tries to avoid sitting idle by running that
> extra thread when it has some cycles left. It can do so cheaply, since
> switching between hardware threads is much faster than an OS context
> switch. So as Arun suggests, it probably won't hurt as long as you have
> enough memory in your nodes. Your CPU will be able to use all its power,
> and jobs might take a bit longer to finish, but more jobs will be running
> at the same time. It will probably be faster than wasting CPU cycles
> while processes are waiting on I/O and your CPU could be running the
> other thread. The best way would be to test this. If you do, please
> report back to us. I'm very curious about the results!

Re: Choosing number of map/reduce slots (with hyperthreading)

Posted by Eric <er...@gmail.com>.
With hyperthreading, the CPU tries to avoid sitting idle by running that
extra thread when it has some cycles left. It can do so cheaply, since
switching between hardware threads is much faster than an OS context
switch. So as Arun suggests, it probably won't hurt as long as you have
enough memory in your nodes. Your CPU will be able to use all its power,
and jobs might take a bit longer to finish, but more jobs will be running
at the same time. It will probably be faster than wasting CPU cycles while
processes are waiting on I/O and your CPU could be running the other
thread. The best way would be to test this. If you do, please report back
to us. I'm very curious about the results!

2011/1/9 Arun C Murthy <ac...@yahoo-inc.com>

> Hyperthreading is interesting, but I'd put more emphasis on the amount of
> RAM you have on your boxes.
>
> The Java VM allocates all its heap size up front, which means your node
> will start thrashing on RAM if you put too many tasks per node.
>
> Arun

Re: Choosing number of map/reduce slots (with hyperthreading)

Posted by Arun C Murthy <ac...@yahoo-inc.com>.
Hyperthreading is interesting, but I'd put more emphasis on the amount  
of RAM you have on your boxes.

The Java VM allocates all its heap size up front, which means your node  
will start thrashing on RAM if you put too many tasks per node.

Arun
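
The memory constraint above can be sketched as a cap on the slot count. This is a rough Python illustration under stated assumptions: the function names and the example figures are hypothetical, and the per-task heap is whatever -Xmx value the task JVMs are started with (typically set through mapred.child.java.opts in Hadoop releases of this era).

```python
def max_tasks_by_memory(node_ram_mb, heap_per_task_mb, reserved_mb=0):
    """Upper bound on concurrent task JVMs so that their heaps fit in RAM.

    reserved_mb: memory kept back for the OS and other daemons.
    """
    if heap_per_task_mb <= 0:
        raise ValueError("heap_per_task_mb must be positive")
    available = node_ram_mb - reserved_mb
    return max(available // heap_per_task_mb, 0)


def capped_slots(core_based_slots, node_ram_mb, heap_per_task_mb, reserved_mb=0):
    """Take the smaller of the CPU-based and memory-based slot counts."""
    by_memory = max_tasks_by_memory(node_ram_mb, heap_per_task_mb, reserved_mb)
    return min(core_based_slots, by_memory)


# Hypothetical node: 8 GB RAM, 2 GB reserved, 1 GB heap per task,
# and 8 slots suggested by the core count alone.
print(capped_slots(8, node_ram_mb=8192, heap_per_task_mb=1024, reserved_mb=2048))
```

A JVM also uses memory outside its heap (thread stacks, class metadata, native buffers), so the real per-task footprint is somewhat larger than the heap size and the bound computed here is on the optimistic side.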
