Posted to hdfs-user@hadoop.apache.org by Chris Nauroth <cn...@hortonworks.com> on 2012/10/02 19:00:26 UTC
Re: How to lower the total number of map tasks
Those numbers make sense, considering 1 map task per block. 16 GB file /
64 MB block size = ~242 map tasks.
When you doubled dfs.block.size, how did you accomplish that? Typically,
the block size is selected at file write time, with a default value from
system configuration used if not specified. Did you "hadoop fs -put" the
file with the new block size, or was it something else?
Thank you,
--Chris
On Tue, Oct 2, 2012 at 9:34 AM, Shing Hing Man <ma...@yahoo.com> wrote:
>
>
>
> I am running Hadoop 1.0.3 in pseudo-distributed mode.
> When I submit a map/reduce job to process a file about 16 GB in size,
> job.xml contains the following:
>
>
> mapred.map.tasks = 242
> mapred.min.split.size = 0
> dfs.block.size = 67108864
>
>
> I would like to reduce mapred.map.tasks to see if it improves
> performance.
> I have tried doubling the size of dfs.block.size. But
> the mapred.map.tasks remains unchanged.
> Is there a way to reduce mapred.map.tasks ?
>
>
> Thanks in advance for any assistance !
> Shing
>
>
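Chris's one-map-per-block arithmetic can be sketched as follows. This is a rough Python model of how the old mapred API's FileInputFormat chooses a split size, not the actual Hadoop code; the constants assume the "about 16 GB" file is stored as exactly 242 full 64 MB blocks.

```python
# Rough model of the old mapred API split-size rule:
#   goal_size  = total_size / requested_maps   (mapred.map.tasks is only a hint)
#   split_size = max(min_split_size, min(goal_size, block_size))
# One map task then runs per split.

def num_map_tasks(total_size, requested_maps, min_split_size, block_size):
    goal_size = total_size // max(1, requested_maps)
    split_size = max(min_split_size, min(goal_size, block_size))
    return -(-total_size // split_size)  # ceiling division

BLOCK = 64 * 1024 * 1024    # dfs.block.size = 67108864
FILE_SIZE = 242 * BLOCK     # "about 16 GB", stored as 242 blocks

# With mapred.min.split.size = 0 the split size collapses to the block
# size, so the job gets one map task per block:
print(num_map_tasks(FILE_SIZE, 242, 0, BLOCK))  # -> 242
```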
Re: How to lower the total number of map tasks
Posted by Shing Hing Man <ma...@yahoo.com>.
I have done the following.
1) stop-all.sh
2) In mapred-site.xml, added
<property>
<name>mapred.max.split.size</name>
<value>134217728</value>
</property>
(dfs.block.size remains unchanged at 67108864)
3) start-all.sh
4) Used hadoop fs -cp src destn to copy my original file to another HDFS directory.
5) Ran my MapReduce program against the new copy of the input file.
However, in job.xml I still get mapred.map.tasks = 242, the same as before.
I have also tried deleting my input file in HDFS and importing it again from my local drive.
Any more ideas ?
Shing
________________________________
From: Bejoy KS <be...@gmail.com>
To: user@hadoop.apache.org; Shing Hing Man <ma...@yahoo.com>
Sent: Tuesday, October 2, 2012 6:37 PM
Subject: Re: How to lower the total number of map tasks
Shing
This doesn't change the block size of existing files in HDFS; only new files written to HDFS will be affected. To get this in effect for old files you need to re-copy them, at least within HDFS:
hadoop fs -cp src destn.
Regards
Bejoy KS
Sent from handheld, please excuse typos.
________________________________
From: Shing Hing Man <ma...@yahoo.com>
Date: Tue, 2 Oct 2012 10:33:45 -0700 (PDT)
To: user@hadoop.apache.org<us...@hadoop.apache.org>
ReplyTo: user@hadoop.apache.org
Subject: Re: How to lower the total number of map tasks
I set the block size using
Configuration.setInt("dfs.block.size",134217728);
I have also set it in mapred-site.xml.
Shing
________________________________
From: Chris Nauroth <cn...@hortonworks.com>
To: user@hadoop.apache.org; Shing Hing Man <ma...@yahoo.com>
Sent: Tuesday, October 2, 2012 6:00 PM
Subject: Re: How to lower the total number of map tasks
Those numbers make sense, considering 1 map task per block. 16 GB file / 64 MB block size = ~242 map tasks.
When you doubled dfs.block.size, how did you accomplish that? Typically, the block size is selected at file write time, with a default value from system configuration used if not specified. Did you "hadoop fs -put" the file with the new block size, or was it something else?
Thank you,
--Chris
On Tue, Oct 2, 2012 at 9:34 AM, Shing Hing Man <ma...@yahoo.com> wrote:
>
>
>I am running Hadoop 1.0.3 in Pseudo distributed mode.
>When I submit a map/reduce job to process a file of size about 16 GB, in job.xml, I have the following
>
>
>mapred.map.tasks =242
>mapred.min.split.size =0
>dfs.block.size = 67108864
>
>
>I would like to reduce mapred.map.tasks to see if it improves performance.
>I have tried doubling the size of dfs.block.size. But the mapred.map.tasks remains unchanged.
>Is there a way to reduce mapred.map.tasks ?
>
>
>Thanks in advance for any assistance !
>Shing
>
>
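A likely reason step 2 above had no effect: mapred.max.split.size belongs to the newer mapreduce API, while the old mapred API that wrote mapred.map.tasks = 242 into job.xml has no maximum-split term at all. Its split size is floored by mapred.min.split.size, so raising that property above the block size is the lever that actually shrinks the map count. The sketch below models that rule (an illustration, not the real Hadoop code):

```python
# Old mapred API split-size rule (sketch):
#   split = max(min_split_size, min(goal_size, block_size))
# There is no max-split term here, which is consistent with
# mapred.max.split.size having no effect on this job.

def num_map_tasks(total_size, requested_maps, min_split_size, block_size):
    goal_size = total_size // max(1, requested_maps)
    split_size = max(min_split_size, min(goal_size, block_size))
    return -(-total_size // split_size)  # ceiling division

BLOCK = 64 * 1024 * 1024
FILE_SIZE = 242 * BLOCK

# min split below the block size: one map per block, as observed.
print(num_map_tasks(FILE_SIZE, 242, 0, BLOCK))          # -> 242
# Raise mapred.min.split.size to 128 MB and the count halves:
print(num_map_tasks(FILE_SIZE, 242, 2 * BLOCK, BLOCK))  # -> 121
```

Note that splits larger than a block span multiple blocks, so some map tasks will read part of their input over the network instead of locally.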
Re: How to lower the total number of map tasks
Posted by Bejoy KS <be...@gmail.com>.
Shing
This doesn't change the block size of existing files in HDFS; only new files written to HDFS will be affected. To get this in effect for old files you need to re-copy them, at least within HDFS:
hadoop fs -cp src destn.
Regards
Bejoy KS
Sent from handheld, please excuse typos.
-----Original Message-----
From: Shing Hing Man <ma...@yahoo.com>
Date: Tue, 2 Oct 2012 10:33:45
To: user@hadoop.apache.org<us...@hadoop.apache.org>
Reply-To: user@hadoop.apache.org
Subject: Re: How to lower the total number of map tasks
I set the block size using
Configuration.setInt("dfs.block.size",134217728);
I have also set it in mapred-site.xml.
Shing
________________________________
From: Chris Nauroth <cn...@hortonworks.com>
To: user@hadoop.apache.org; Shing Hing Man <ma...@yahoo.com>
Sent: Tuesday, October 2, 2012 6:00 PM
Subject: Re: How to lower the total number of map tasks
Those numbers make sense, considering 1 map task per block. 16 GB file / 64 MB block size = ~242 map tasks.
When you doubled dfs.block.size, how did you accomplish that? Typically, the block size is selected at file write time, with a default value from system configuration used if not specified. Did you "hadoop fs -put" the file with the new block size, or was it something else?
Thank you,
--Chris
On Tue, Oct 2, 2012 at 9:34 AM, Shing Hing Man <ma...@yahoo.com> wrote:
>
>
>I am running Hadoop 1.0.3 in Pseudo distributed mode.
>When I submit a map/reduce job to process a file of size about 16 GB, in job.xml, I have the following
>
>
>mapred.map.tasks =242
>mapred.min.split.size =0
>dfs.block.size = 67108864
>
>
>I would like to reduce mapred.map.tasks to see if it improves performance.
>I have tried doubling the size of dfs.block.size. But the mapred.map.tasks remains unchanged.
>Is there a way to reduce mapred.map.tasks ?
>
>
>Thanks in advance for any assistance !
>Shing
>
>
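Bejoy's re-copy advice only helps if the copy is actually written with the larger block size. A plain hadoop fs -cp rewrites the file with whatever dfs.block.size the client's HDFS configuration carries (a setting in mapred-site.xml would not be picked up), so one option is to pass the property on the command line for that single copy. A sketch, assuming the FsShell in this Hadoop 1.x release accepts the -D generic option; the paths are made up:

```python
# Builds the FsShell command line; run it on a node with the hadoop client:
#   hadoop fs -D dfs.block.size=134217728 -cp /data/in.txt /data/in-128mb.txt
def recopy_cmd(src, dst, block_bytes):
    return ["hadoop", "fs",
            "-D", "dfs.block.size=%d" % block_bytes,
            "-cp", src, dst]

cmd = recopy_cmd("/data/in.txt", "/data/in-128mb.txt", 134217728)
print(" ".join(cmd))
```

After the copy, hadoop fsck on the new path with -files -blocks should show half as many blocks if the larger block size took effect.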
Re: How to lower the total number of map tasks
Posted by Shing Hing Man <ma...@yahoo.com>.
I set the block size using
Configuration.setInt("dfs.block.size",134217728);
I have also set it in mapred-site.xml.
Shing
________________________________
From: Chris Nauroth <cn...@hortonworks.com>
To: user@hadoop.apache.org; Shing Hing Man <ma...@yahoo.com>
Sent: Tuesday, October 2, 2012 6:00 PM
Subject: Re: How to lower the total number of map tasks
Those numbers make sense, considering 1 map task per block. 16 GB file / 64 MB block size = ~242 map tasks.
When you doubled dfs.block.size, how did you accomplish that? Typically, the block size is selected at file write time, with a default value from system configuration used if not specified. Did you "hadoop fs -put" the file with the new block size, or was it something else?
Thank you,
--Chris
On Tue, Oct 2, 2012 at 9:34 AM, Shing Hing Man <ma...@yahoo.com> wrote:
>
>
>I am running Hadoop 1.0.3 in Pseudo distributed mode.
>When I submit a map/reduce job to process a file of size about 16 GB, in job.xml, I have the following
>
>
>mapred.map.tasks =242
>mapred.min.split.size =0
>dfs.block.size = 67108864
>
>
>I would like to reduce mapred.map.tasks to see if it improves performance.
>I have tried doubling the size of dfs.block.size. But the mapred.map.tasks remains unchanged.
>Is there a way to reduce mapred.map.tasks ?
>
>
>Thanks in advance for any assistance !
>Shing
>
>