Posted to common-user@hadoop.apache.org by Rakhi Khatwani <ra...@gmail.com> on 2009/06/04 08:34:46 UTC

Customizing machines to use for different jobs

Hi,

Can we specify which subset of machines to use for different jobs? E.g., we
set machine A as the namenode and B, C, D as datanodes. Then for job 1 the
MapReduce runs on B and C, and for job 2 it runs on C and D.

Regards,
Raakhi

Re: Customizing machines to use for different jobs

Posted by Alex Loddengaard <al...@cloudera.com>.
Hi Raakhi,

Unfortunately, there is no built-in way of doing this. You'd have to stand
up two entirely separate Hadoop clusters to get exactly the behavior you
describe, which isn't an uncommon setup.

I'm not sure why you want this behavior, but the fair scheduler might be
helpful to you.  It lets you divvy up your cluster into pools (queues),
where each pool is guaranteed its own "chunk" of the cluster's task slots.
When resources are free outside of a pool's "chunk," its jobs can expand
into other pools' space.
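
For illustration, here is a minimal sketch of opting a job into a pool. It
assumes the fair scheduler is enabled on the jobtracker (in the 0.20-era
contrib module: the scheduler jar on the classpath and
mapred.jobtracker.taskScheduler set to org.apache.hadoop.mapred.FairScheduler
in mapred-site.xml) and that mapred.fairscheduler.poolnameproperty is set to
pool.name. The pool name "pool-bc" is hypothetical and would have to be
defined in the scheduler's allocation file; property names can vary between
versions.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class PoolExample {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(PoolExample.class);
    conf.setJobName("job-1");

    // Hypothetical pool; must match a pool in the fair scheduler's
    // allocation file, and assumes poolnameproperty = pool.name.
    // Note: pools divide up map/reduce task slots -- they do not pin
    // tasks to particular machines such as B and C.
    conf.set("pool.name", "pool-bc");

    // A trivial identity job, just to make the sketch runnable.
    conf.setMapperClass(IdentityMapper.class);
    conf.setReducerClass(IdentityReducer.class);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    JobClient.runJob(conf);
  }
}

Keep in mind that a pool reserves a share of the cluster's slots rather
than a fixed set of nodes, so this approximates the B/C vs. C/D split you
asked about rather than implementing it literally.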

Cloudera's Distribution for Hadoop (<http://www.cloudera.com/hadoop>)
includes the fair scheduler.  I recommend using our distribution;
otherwise, here is the fair scheduler JIRA:

<http://issues.apache.org/jira/browse/HADOOP-3746>

Hope this helps,

Alex
