Posted to hdfs-user@hadoop.apache.org by Harsh J <ha...@cloudera.com> on 2012/09/29 21:23:18 UTC

Re: Pseudo distributed mode : How to increase no of concurrent map task

Did you restart your TaskTrackers after increasing the
mapred.tasktracker.map.tasks.maximum value in mapred-site.xml? It is a
TaskTracker property, not a per-job one.
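
For illustration, a minimal sketch of the corresponding mapred-site.xml
entry, using the value 6 mentioned in the question. It assumes a stock
Hadoop 1.x install where the TaskTracker reads conf/mapred-site.xml at
startup, so the daemon has to be restarted (for example with
bin/hadoop-daemon.sh stop tasktracker followed by
bin/hadoop-daemon.sh start tasktracker) before the new value applies.

```xml
<!-- conf/mapred-site.xml on the TaskTracker node (path assumed) -->
<configuration>
  <property>
    <!-- Maximum number of map tasks this TaskTracker runs at once -->
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>6</value>
  </property>
</configuration>
```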

On Sun, Sep 30, 2012 at 12:36 AM, Shing Hing Man <ma...@yahoo.com> wrote:
>
> Hi,
> I am running Hadoop 1.0.3 in pseudo-distributed mode, on a quad-core Xeon processor with
> hyper-threading enabled.
> When I submit a job to process a file of about 1.6 GB, only two concurrent map tasks
> are running.
> I have set
> mapred.tasktracker.map.tasks.maximum to 6.
> In job.xml,
> mapred.map.tasks = 25
> mapred.min.split.size = 0
> dfs.block.size = 64MB
> How can I increase the number of concurrent map tasks?
>
> Thanks in advance for any assistance !
>
> Shing
>



-- 
Harsh J

Re: Pseudo distributed mode : How to increase no of concurrent map task

Posted by Harsh J <ha...@cloudera.com>.

Jay,

The right answer would be: do not use config strings; use the API points
available for each config tweak allowed at the client/job level.
However, the APIs may not be complete, and there may be advanced
tuning that we didn't want to provide/support an API for. So…
yeah.

With 2.x+, which renamed many config properties for consistency,
you can hopefully rely on the names. If a config name has the form
<project>.<daemonname>.configname, such as "yarn.nodemanager.foo",
then it is daemon-specific. Otherwise, it is client-overridable in one
way or another. This hint applies to some extent to
1.x/0.20.x as well, but certain older property names flout
that standard format.
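
As a rough illustration of the naming hint above, here is a minimal Java
sketch that flags a property name as daemon-specific when it starts with
a <project>.<daemonname>. prefix. The class name, method name and prefix
list are assumptions made for this example, not Hadoop API, and the list
is not exhaustive. It is only a heuristic and will misjudge the older
names that do not follow the format.

```java
import java.util.Arrays;
import java.util.List;

public class ConfigScopeHint {
    // Assumed, non-exhaustive list of <project>.<daemonname>. prefixes.
    private static final List<String> DAEMON_PREFIXES = Arrays.asList(
        "yarn.nodemanager.", "yarn.resourcemanager.",
        "dfs.namenode.", "dfs.datanode.",
        "mapred.tasktracker.", "mapred.jobtracker.");

    /** Returns true if the name starts with one of the assumed daemon prefixes. */
    public static boolean isDaemonSpecific(String name) {
        for (String prefix : DAEMON_PREFIXES) {
            if (name.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] names = {
            "mapred.tasktracker.map.tasks.maximum",
            "mapred.map.tasks",
            "yarn.nodemanager.foo"
        };
        for (String name : names) {
            // A "true" here suggests the value belongs in the daemon's XML
            // config and needs a daemon restart, not a per-job override.
            System.out.println(name + " -> daemon-specific: " + isDaemonSpecific(name));
        }
    }
}
```

With this prefix list, mapred.tasktracker.map.tasks.maximum is flagged as
daemon-specific, while mapred.map.tasks is not.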

On Sun, Sep 30, 2012 at 2:34 AM, Jay Vyas <ja...@gmail.com> wrote:
> Hmmm... I always make this mistake on my Hadoop VM -- trying to set
> parameters that require XML settings via the conf.setInt(...) API at
> runtime, which sometimes has no effect.
>
> How can we know (without having to troubleshoot each parameter individually)
> which parameters CAN versus CANNOT be set programmatically during an M/R job?

-- 
Harsh J

Re: Pseudo distributed mode : How to increase no of concurrent map task

Posted by Jay Vyas <ja...@gmail.com>.

Hmmm... I always make this mistake on my Hadoop VM -- trying to set
parameters that require XML settings via the conf.setInt(...) API at
runtime, which sometimes has no effect.

How can we know (without having to troubleshoot each parameter individually)
which parameters CAN versus CANNOT be set programmatically during an M/R job?

Re: Pseudo distributed mode : How to increase no of concurrent map task

Posted by Shing Hing Man <ma...@yahoo.com>.

I did restart the TaskTracker after setting
mapred.tasktracker.map.tasks.maximum.

But I had been setting it with Configuration.setInt("mapred.tasktracker.map.tasks.maximum", 6).

When I set mapred.tasktracker.map.tasks.maximum to 6 in
mapred-site.xml, I see 6 concurrent map tasks running.

That solves my problem!

Thanks! 

Shing 

----- Original Message -----
From: Harsh J <ha...@cloudera.com>
To: user@hadoop.apache.org; Shing Hing Man <ma...@yahoo.com>
Cc: 
Sent: Saturday, September 29, 2012 8:23 PM
Subject: Re: Pseudo distributed mode : How to increase no of concurrent map task

Did you restart your TaskTrackers after increasing the
mapred.tasktracker.map.tasks.maximum value in mapred-site.xml? It is a
TaskTracker property, not a per-job one.

On Sun, Sep 30, 2012 at 12:36 AM, Shing Hing Man <ma...@yahoo.com> wrote:
>
> Hi,
> I am running Hadoop 1.0.3 in pseudo-distributed mode, on a quad-core Xeon processor with
> hyper-threading enabled.
> When I submit a job to process a file of about 1.6 GB, only two concurrent map tasks
> are running.
> I have set
> mapred.tasktracker.map.tasks.maximum to 6.
> In job.xml,
> mapred.map.tasks = 25
> mapred.min.split.size = 0
> dfs.block.size = 64MB
> How can I increase the number of concurrent map tasks?
>
> Thanks in advance for any assistance !
>
> Shing
>

-- 
Harsh J
