Posted to hdfs-user@hadoop.apache.org by jeremy p <at...@gmail.com> on 2014/06/11 23:37:35 UTC

Re: How to set the max mappers per node on a per-job basis?

Okay, that might be what I need.  Let's say I have 10 nodes in my cluster,
and they all have the same specs.  For Job A (the one that isn't CPU
intensive) I want it to run with 50 mappers per node.  For Job B (the one
that is CPU intensive) I want it to run with 25 mappers per node.  Let's
assume that when each job runs, there are no other jobs running on the
cluster.  Can I just tell Hadoop to run 500 simultaneous mappers for Job A,
and when Job A is done, can I tell Hadoop to run 250 simultaneous mappers
for Job B?  How do I go about doing this?
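[Editor's sketch, not from the thread: later Hadoop releases added a per-job cap on concurrently running map tasks (MAPREDUCE-5583, property mapreduce.job.running.map.limit). Assuming a version that supports it, the cap could be set in the job configuration like this:]

```xml
<!-- Sketch only: mapreduce.job.running.map.limit (MAPREDUCE-5583) caps a
     single job's simultaneously running map tasks cluster-wide. It did
     not exist at the time of this thread; availability depends on your
     Hadoop version. Example for Job A on the 10-node cluster above: -->
<property>
  <name>mapreduce.job.running.map.limit</name>
  <value>500</value>
</property>
```

[For Job B, the same property would be set to 250 in that job's configuration. Note this is a cluster-wide total, not a per-node limit.]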

I've read that mapred.tasktracker.map.tasks.maximum and
mapred.tasktracker.reduce.tasks.maximum cannot be overridden from the
client.  Will I run into problems because of that?

Thanks for the help.

--Jeremy



On Fri, May 30, 2014 at 8:49 PM, Harsh J <ha...@cloudera.com> wrote:

> This has been discussed in the past. There is currently no dynamic way
> to control parallel execution on a per-node basis.
>
> Scheduler configurations will let you control overall parallelism (#
> of simultaneous tasks) of specific jobs on a cluster-level basis, but
> not on a per-node level.
>
> On Sat, May 31, 2014 at 4:08 AM, jeremy p
> <at...@gmail.com> wrote:
> > Hello all,
> >
> > I have two jobs, Job A and Job B.  Job A is not very CPU-intensive, and
> so
> > we would like to run it with 50 mappers per node.  Job B is very
> > CPU-intensive, and so we would like to run it with 25 mappers per node.
>  How
> > can we request a different number of mappers per node for each job?  From
> > what I've read, mapred.tasktracker.map.tasks.maximum and
> > mapred.tasktracker.reduce.tasks.maximum cannot be overridden from the
> > client.
> >
> > --Jeremy
>
>
>
> --
> Harsh J
>
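[Editor's sketch of the scheduler-level approach Harsh describes, using an MRv1 Fair Scheduler allocation file. The pool names are illustrative; maxMaps caps a pool's simultaneous map tasks across the whole cluster, not per node:]

```xml
<?xml version="1.0"?>
<!-- fair-scheduler.xml sketch: submit each job into its pool, e.g. by
     setting mapred.fairscheduler.pool=job-a-pool on the job. -->
<allocations>
  <pool name="job-a-pool">
    <maxMaps>500</maxMaps>
  </pool>
  <pool name="job-b-pool">
    <maxMaps>250</maxMaps>
  </pool>
</allocations>
```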

Re: Re: How to set the max mappers per node on a per-job basis?

Posted by pengwenwu <pe...@163.com>.
There is a JIRA, dynamic resource configuration (YARN-291), that can help in this case: it allows dynamically adjusting slots (MRv1) or resources (CPU and memory, MRv2).

Regards,
Wenwu,Peng
VMware
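
[Editor's note: on MRv2/YARN there are no per-node slots at all, so a rough way to approximate a per-node mapper count is to size map containers against the memory each NodeManager advertises. A sketch, assuming every node offers 100 GB to YARN:]

```xml
<!-- With yarn.nodemanager.resource.memory-mb = 102400 on each node,
     a 2048 MB map container allows roughly 50 concurrent maps per node;
     raising it to 4096 MB allows roughly 25. The node totals here are
     illustrative assumptions, not values from this thread. -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value>
</property>
```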

At 2014-06-12 05:37:35, "jeremy p" <at...@gmail.com> wrote:

Okay, that might be what I need.  Let's say I have 10 nodes in my cluster, and they all have the same specs.  For Job A (the one that isn't CPU intensive) I want it to run with 50 mappers per node.  For Job B (the one that is CPU intensive) I want it to run with 25 mappers per node.  Let's assume that when each job runs, there are no other jobs running on the cluster.  Can I just tell Hadoop to run 500 simultaneous mappers for Job A, and when Job A is done, can I tell Hadoop to run 250 simultaneous mappers for Job B?  How do I go about doing this?


I've read that mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum cannot be overridden from the client.  Will I run into problems because of that?


Thanks for the help.


--Jeremy





On Fri, May 30, 2014 at 8:49 PM, Harsh J <ha...@cloudera.com> wrote:
This has been discussed in the past. There is currently no dynamic way to
control parallel execution on a per-node basis.

Scheduler configurations will let you control overall parallelism (#
of simultaneous tasks) of specific jobs on a cluster-level basis, but
not on a per-node level.


On Sat, May 31, 2014 at 4:08 AM, jeremy p
<at...@gmail.com> wrote:
> Hello all,
>
> I have two jobs, Job A and Job B.  Job A is not very CPU-intensive, and so
> we would like to run it with 50 mappers per node.  Job B is very
> CPU-intensive, and so we would like to run it with 25 mappers per node.  How
> can we request a different number of mappers per node for each job?  From
> what I've read, mapred.tasktracker.map.tasks.maximum and
> mapred.tasktracker.reduce.tasks.maximum cannot be overridden from the
> client.
>
> --Jeremy




--
Harsh J


