Posted to hdfs-user@hadoop.apache.org by Harsh J <ha...@cloudera.com> on 2012/09/29 21:23:18 UTC
Re: Pseudo distributed mode : How to increase no of concurrent map task
Did you restart your TaskTrackers after increasing the
mapred.tasktracker.map.tasks.maximum value in mapred-site.xml? It is a
TaskTracker property, not a per-job one.
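The distinction above can be pictured with a toy model in plain Java. This is only a sketch: `SlotDemo` and `ToyTaskTracker` are hypothetical names, not Hadoop classes, and the model assumes the daemon reads its slot limit once at startup from its own site configuration (with 2 as the Hadoop 1.x default), so a value set only on the job side never reaches it.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: these classes are hypothetical and are not Hadoop's.
final class SlotDemo {
    static final String KEY = "mapred.tasktracker.map.tasks.maximum";

    // Stands in for a daemon that reads its slot limit once, at startup.
    static final class ToyTaskTracker {
        private final int maxMapSlots;

        ToyTaskTracker(Map<String, String> siteConfig) {
            // 2 is the Hadoop 1.x default for this property.
            this.maxMapSlots = Integer.parseInt(siteConfig.getOrDefault(KEY, "2"));
        }

        // The job's configuration is accepted but never consulted for slots.
        int concurrentMaps(Map<String, String> jobConfig, int pendingMaps) {
            return Math.min(maxMapSlots, pendingMaps);
        }
    }

    public static void main(String[] args) {
        Map<String, String> jobConfig = new HashMap<>();
        jobConfig.put(KEY, "6"); // client-side override, as with conf.setInt(...)

        // Daemon started without the property in its site config.
        ToyTaskTracker defaults = new ToyTaskTracker(new HashMap<>());
        System.out.println(defaults.concurrentMaps(jobConfig, 25)); // 2

        // Daemon restarted after the property was added to its site config.
        Map<String, String> site = new HashMap<>();
        site.put(KEY, "6");
        ToyTaskTracker restarted = new ToyTaskTracker(site);
        System.out.println(restarted.concurrentMaps(jobConfig, 25)); // 6
    }
}
```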
On Sun, Sep 30, 2012 at 12:36 AM, Shing Hing Man <ma...@yahoo.com> wrote:
>
> Hi,
> I am running Hadoop 1.03 in pseudo-distributed mode, on a quad-core Xeon processor with
> hyper-threading enabled.
> When I submit a job to process a file of size about 1.6 GB, only two concurrent map tasks
> are running.
> I have set
> mapred.tasktracker.map.tasks.maximum to 6.
> In job.xml,
> mapred.map.tasks = 25
> mapred.min.split.size = 0
> dfs.block.size = 64MB
> How can I increase the number of concurrent map tasks?
>
> Thanks in advance for any assistance !
>
> Shing
>
--
Harsh J
Re: Pseudo distributed mode : How to increase no of concurrent map task
Posted by Harsh J <ha...@cloudera.com>.
Jay,
The right answer would be: do not use config strings; use the API points
available for each config tweak allowed at the client/job level.
However, the API coverage may not be complete, and there may be advanced
tuning that we didn't want to provide/support an API for. So…
yeah.
With 2.x+, which renamed many of the config properties for consistency,
you can hopefully rely on the names. If a config name has the form
<project>.<daemonname>.configname, such as "yarn.nodemanager.foo",
then it is daemon-specific. Otherwise, it is client-overridable in one way
or another. This hint also applies, to a degree, to
1.x/0.20.x, but some older property names flout
that standard format.
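The naming hint above could be turned into a rough check like the following. This is a minimal sketch, not anything Hadoop provides: the `ConfigScope` class and its `looksDaemonSpecific` method are hypothetical, and the set of daemon name segments is an assumed, incomplete list. As noted, older property names may not follow the pattern, so treat the result as a hint only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: applies the <project>.<daemonname>.configname hint.
final class ConfigScope {
    // Assumed, incomplete list of daemon name segments; extend as needed.
    private static final Set<String> DAEMONS = new HashSet<>(Arrays.asList(
        "namenode", "datanode", "resourcemanager", "nodemanager",
        "jobtracker", "tasktracker"));

    // True when the second dot-separated segment names a known daemon.
    static boolean looksDaemonSpecific(String key) {
        String[] parts = key.split("\\.");
        return parts.length >= 3 && DAEMONS.contains(parts[1]);
    }

    public static void main(String[] args) {
        System.out.println(looksDaemonSpecific("yarn.nodemanager.foo"));                 // true
        System.out.println(looksDaemonSpecific("mapred.tasktracker.map.tasks.maximum")); // true
        System.out.println(looksDaemonSpecific("mapred.min.split.size"));                // false
    }
}
```

A `false` result only means the name does not match the pattern; it does not guarantee that a client-side override takes effect.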
On Sun, Sep 30, 2012 at 2:34 AM, Jay Vyas <ja...@gmail.com> wrote:
> Hmmm... I always make this mistake on my hadoop vm -- trying to set
> parameters which require xml settings in the conf.setInt(...) API at
> runtime, which sometimes has no effect.
>
> How can we know, (without having to individually troubleshoot a parameter)
> which parameters CAN versus CANNOT be set programmatically during a m/r job?
--
Harsh J
Re: Pseudo distributed mode : How to increase no of concurrent map task
Posted by Jay Vyas <ja...@gmail.com>.
Hmmm... I always make this mistake on my hadoop vm -- trying to set
parameters which require xml settings in the conf.setInt(...) API at
runtime, which sometimes has no effect.
How can we know, (without having to individually troubleshoot a parameter)
which parameters CAN versus CANNOT be set programmatically during a m/r job?
Re: Pseudo distributed mode : How to increase no of concurrent map task
Posted by Shing Hing Man <ma...@yahoo.com>.
I did restart the TaskTracker after setting
mapred.tasktracker.map.tasks.maximum.
But I had been setting it with Configuration.setInt("mapred.tasktracker.map.tasks.maximum", 6).
Now that I have set mapred.tasktracker.map.tasks.maximum to 6 in
mapred-site.xml, I see 6 concurrent map tasks running.
That solves my problem!
Thanks!
Shing
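For a sense of what the change buys, here is a small arithmetic sketch using the numbers from this thread: 25 map tasks, first with 2 concurrent slots and then with 6. The `MapWaves` class is hypothetical, and the calculation assumes all map tasks take roughly the same time, so it gives only a rough count of scheduling "waves".

```java
// Hypothetical helper: how many rounds of map tasks a single node needs.
final class MapWaves {
    // Ceiling of mapTasks / slots, using integer arithmetic.
    static int waves(int mapTasks, int slots) {
        if (slots <= 0) {
            throw new IllegalArgumentException("slots must be positive");
        }
        return (mapTasks + slots - 1) / slots;
    }

    public static void main(String[] args) {
        System.out.println(waves(25, 2)); // 13 waves with 2 slots
        System.out.println(waves(25, 6)); // 5 waves with 6 slots
    }
}
```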
----- Original Message -----
From: Harsh J <ha...@cloudera.com>
To: user@hadoop.apache.org; Shing Hing Man <ma...@yahoo.com>
Cc:
Sent: Saturday, September 29, 2012 8:23 PM
Subject: Re: Pseudo distributed mode : How to increase no of concurrent map task
Did you restart your TaskTrackers after increasing the
mapred.tasktracker.map.tasks.maximum value in mapred-site.xml? It is a
TaskTracker property, not a per-job one.
On Sun, Sep 30, 2012 at 12:36 AM, Shing Hing Man <ma...@yahoo.com> wrote:
>
> Hi,
> I am running Hadoop 1.03 in pseudo-distributed mode, on a quad-core Xeon processor with
> hyper-threading enabled.
> When I submit a job to process a file of size about 1.6 GB, only two concurrent map tasks
> are running.
> I have set
> mapred.tasktracker.map.tasks.maximum to 6.
> In job.xml,
> mapred.map.tasks = 25
> mapred.min.split.size = 0
> dfs.block.size = 64MB
> How can I increase the number of concurrent map tasks?
>
> Thanks in advance for any assistance !
>
> Shing
>
--
Harsh J