Posted to common-user@hadoop.apache.org by Marc Sturlese <ma...@gmail.com> on 2010/11/16 01:29:07 UTC

Dealing with Jobs with different memory and slots requirements

I have a Hadoop test cluster (12 nodes) and I am running different MapReduce
jobs. These jobs are executed sequentially, as the input of one needs the
output of the other.
I am wondering if there is a way to manage the memory of the nodes per job.
I mean, there are jobs that use all the reduce slots of my cluster and don't
use much memory; these scale very well. But there are others that don't use
all the reduce slots (and can't be parallelized any further) and would be much
faster if I were able to assign more memory to them. I don't see a way to do
anything like that without shutting the cluster down, changing the node
configuration and starting it up again, which is pretty dirty...
It would be good if, in the same cluster, I could have some nodes with fewer
reducers and more memory for them, and I could tell a job to use those
nodes... but I don't think that's possible.
Maybe I am not approaching the problem in the right way... Any suggestion
or advice?
Thanks in advance

Re: Dealing with Jobs with different memory and slots requirements

Posted by Steve Loughran <st...@apache.org>.
On 16/11/10 00:29, Marc Sturlese wrote:
>
> I have a Hadoop test cluster (12 nodes) and I am running different MapReduce
> jobs. These jobs are executed sequentially, as the input of one needs the
> output of the other.
> I am wondering if there is a way to manage the memory of the nodes per job.
> I mean, there are jobs that use all the reduce slots of my cluster and don't
> use much memory; these scale very well. But there are others that don't use
> all the reduce slots (and can't be parallelized any further) and would be much
> faster if I were able to assign more memory to them. I don't see a way to do
> anything like that without shutting the cluster down, changing the node
> configuration and starting it up again, which is pretty dirty...

> It would be good if, in the same cluster, I could have some nodes with fewer
> reducers and more memory for them, and I could tell a job to use those
> nodes... but I don't think that's possible.

You can't say where reducers will run, though you can give nodes 
different numbers of map or reduce slots. If your reducers are all 
memory hungry, give the machines fewer reduce slots than map slots.
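
For example, a node set aside for the memory-hungry stage could carry 
something like this in its mapred-site.xml (the slot counts here are 
only illustrative; pick values that fit your hardware):

   <property>
     <name>mapred.tasktracker.map.tasks.maximum</name>
     <value>8</value>
   </property>
   <property>
     <name>mapred.tasktracker.reduce.tasks.maximum</name>
     <value>2</value>
   </property>

These are per-TaskTracker settings, so each node can have its own 
values, but they are only read when the TaskTracker starts, so they 
can't be changed per job.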

There's work underway to make scheduling more aware of system load: 
rather than relying on a fairly simplistic "slot" model, it would look 
at system and memory load as a way of measuring how idle machines are. 
If you were to be really devious, you'd look at IO load, network, 
machine temperature, etc. If you find this an interesting problem to get 
involved in, the mapreduce-dev mailing list is the place to be.

Be advised, scheduling and placement are CS-hard problems: fun to work 
on if you enjoy the issues, but there is no perfect solution.

steve