Posted to mapreduce-user@hadoop.apache.org by Shing Hing Man <ma...@yahoo.com> on 2012/10/02 18:34:36 UTC

How to lower the total number of map tasks



I am running Hadoop 1.0.3 in pseudo-distributed mode.
When I submit a map/reduce job to process a file of size about 16 GB, in job.xml I have the following:


mapred.map.tasks = 242
mapred.min.split.size = 0
dfs.block.size = 67108864


I would like to reduce mapred.map.tasks to see if it improves performance.
I have tried doubling dfs.block.size, but mapred.map.tasks remains unchanged.
Is there a way to reduce mapred.map.tasks?


Thanks in advance for any assistance!
Shing 
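
As a sanity check on those numbers: FileInputFormat launches one map task per input split, and with mapred.min.split.size = 0 and no maximum configured, each split is a single HDFS block. A minimal sketch of the arithmetic (assuming one split per block; the "about 16 GB" figure is approximate):

    public class MapCountCheck {
        public static void main(String[] args) {
            long blockSize = 67108864L;            // dfs.block.size (64 MB)
            long fileSize  = 242L * blockSize;     // ~15.1 GB, i.e. "about 16 GB"
            // One map task per block-sized split, rounding up:
            long maps = (fileSize + blockSize - 1) / blockSize;
            System.out.println(maps);              // 242
        }
    }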


Re: How to lower the total number of map tasks

Posted by Shing Hing Man <ma...@yahoo.com>.
I have followed the suggestion at the link in Romedius's reply below, and set mapred.min.split.size to 134217728.

With the above mapred.min.split.size, I get mapred.map.tasks = 121 (previously it was 242).

Thanks for all the replies!

Shing 
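
A minimal driver-side sketch of this fix, assuming the old mapred API (the class name is illustrative and the input/output setup is omitted):

    import org.apache.hadoop.mapred.JobConf;

    public class MinSplitExample {
        public static void main(String[] args) {
            JobConf conf = new JobConf(MinSplitExample.class);
            // With a 64 MB block size, a 128 MB minimum split makes each
            // split cover two blocks, halving the map count (242 -> 121).
            conf.setLong("mapred.min.split.size", 134217728L);
            // ... set mapper, input/output paths, then submit the job,
            // e.g. with JobClient.runJob(conf).
        }
    }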






Re: How to lower the total number of map tasks

Posted by Romedius Weiss <Ro...@student.uibk.ac.at>.
Hi!

According to the article @YDN*
"The on-node parallelism is controlled by the   
mapred.tasktracker.map.tasks.maximum parameter."

[http://developer.yahoo.com/hadoop/tutorial/module4.html]

Also i think its better to set the min size instead of teh max size,  
so the algorithm tries to slice the file in chunks of a certian  
minimal size.

Have you tried to make a custom InputFormat? Might be another more  
drastic solution.

Cheers, R
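
A minimal sketch of the "more drastic" custom InputFormat route, assuming the old mapred API (the class name is illustrative). Declaring the file unsplittable collapses it to a single map task, which shows the mechanism but sacrifices all parallelism on a 16 GB input:

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.TextInputFormat;

    public class WholeFileTextInputFormat extends TextInputFormat {
        // FileInputFormat consults this before cutting a file into
        // splits; returning false yields one split (one map) per file.
        @Override
        protected boolean isSplitable(FileSystem fs, Path file) {
            return false;
        }
    }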





Re: How to lower the total number of map tasks

Posted by Shing Hing Man <ma...@yahoo.com>.
I only have one big input file.

Shing 



Re: How to lower the total number of map tasks

Posted by Bejoy KS <be...@gmail.com>.
Hi Shing

Is your input a single file or a set of small files? If the latter, you need to use CombineFileInputFormat.


Regards
Bejoy KS

Sent from handheld, please excuse typos.


Re: How to lower the total number of map tasks

Posted by Shing Hing Man <ma...@yahoo.com>.
I have tried
       Configuration.setInt("mapred.max.split.size", 134217728);

and setting mapred.max.split.size in mapred-site.xml (dfs.block.size is left unchanged at 67108864).

But in job.xml, I am still getting mapred.map.tasks = 242.

Shing 
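
A plausible reading of why this attempt changed nothing (a sketch, not from the thread): in Hadoop 1.x the old mapred API does not read mapred.max.split.size at all (its split size is max(minSize, min(goalSize, blockSize)), with goalSize derived from the mapred.map.tasks hint), and even the new API's formula, max(minSize, min(maxSize, blockSize)), never yields a split larger than one block unless the minimum is raised:

    public class SplitMath {
        // org.apache.hadoop.mapred.FileInputFormat (old API); goalSize is
        // the total input size divided by the mapred.map.tasks hint.
        static long oldApi(long goalSize, long minSize, long blockSize) {
            return Math.max(minSize, Math.min(goalSize, blockSize));
        }
        // org.apache.hadoop.mapreduce.lib.input.FileInputFormat (new API).
        static long newApi(long minSize, long maxSize, long blockSize) {
            return Math.max(minSize, Math.min(maxSize, blockSize));
        }
        public static void main(String[] args) {
            long block = 67108864L;  // 64 MB
            // Raising only the max leaves the split at one block:
            System.out.println(newApi(0L, 134217728L, block));             // 67108864
            // Raising the min is what doubles the split (halving the maps):
            System.out.println(newApi(134217728L, Long.MAX_VALUE, block)); // 134217728
            // Old API with the thread's settings (min 0, goal of one block):
            System.out.println(oldApi(block, 0L, block));                  // 67108864
        }
    }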







Re: How to lower the total number of map tasks

Posted by Bejoy Ks <be...@gmail.com>.
Sorry for the typo, the property name is mapred.max.split.size.

Also, just for changing the number of map tasks, you don't need to modify the HDFS block size.


Re: How to lower the total number of map tasks

Posted by Bejoy Ks <be...@gmail.com>.
Hi

You need to alter the value of mapred.max.split size to a value larger than your block size to have fewer map tasks than the default.


Re: How to lower the total number of map tasks

Posted by Shing Hing Man <ma...@yahoo.com>.
I have done the following.

1) stop-all.sh
2) In mapred-site.xml, added
<property>
  <name>mapred.max.split.size</name>
  <value>134217728</value>
</property>

(dfs.block.size remains unchanged at 67108864)

3) start-all.sh

4) Use hadoop fs -cp src destn to copy my original file to another HDFS directory.

5) Run my MapReduce program using the new copy of the input file.

However, in job.xml, I still get mapred.map.tasks = 242, the same as before.

I have also tried deleting my input file in HDFS and importing it again from my local drive.

Any more ideas?

Shing 




________________________________
 From: Bejoy KS <be...@gmail.com>
To: user@hadoop.apache.org; Shing Hing Man <ma...@yahoo.com> 
Sent: Tuesday, October 2, 2012 6:37 PM
Subject: Re: How to lower the total number of map tasks
 

Shing

This doesn't change the block size of existing files in HDFS; only new files written to HDFS will be affected. To get this to take effect for old files, you need to re-copy them, at least within HDFS:
hadoop fs -cp src destn.


Regards
Bejoy KS

Sent from handheld, please excuse typos.
________________________________

From:  Shing Hing Man <ma...@yahoo.com> 
Date: Tue, 2 Oct 2012 10:33:45 -0700 (PDT)
To: user@hadoop.apache.org<us...@hadoop.apache.org>
ReplyTo:  user@hadoop.apache.org 
Subject: Re: How to lower the total number of map tasks



I set the block size using
  Configuration.setInt("dfs.block.size", 134217728);

I have also set it in mapred-site.xml.

Shing 



________________________________
 From: Chris Nauroth <cn...@hortonworks.com>
To: user@hadoop.apache.org; Shing Hing Man <ma...@yahoo.com> 
Sent: Tuesday, October 2, 2012 6:00 PM
Subject: Re: How to lower the total number of map tasks
 

Those numbers make sense, considering 1 map task per block.  16 GB file / 64 MB block size = ~242 map tasks.

When you doubled dfs.block.size, how did you accomplish that?  Typically, the block size is selected at file write time, with a default value from system configuration used if not specified.  Did you "hadoop fs -put" the file with the new block size, or was it something else?

Thank you,
--Chris


On Tue, Oct 2, 2012 at 9:34 AM, Shing Hing Man <ma...@yahoo.com> wrote:


>
>
>I am running Hadoop 1.0.3 in pseudo-distributed mode.
>When I submit a map/reduce job to process a file of size about 16 GB, in job.xml I have the following:
>
>mapred.map.tasks = 242
>mapred.min.split.size = 0
>dfs.block.size = 67108864
>
>I would like to reduce mapred.map.tasks to see if it improves performance.
>I have tried doubling the size of dfs.block.size, but mapred.map.tasks remains unchanged.
>Is there a way to reduce mapred.map.tasks?
>
>Thanks in advance for any assistance!
>Shing
>
>
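
One thing worth checking with the attempt above: which API the job uses. The old-API (org.apache.hadoop.mapred) FileInputFormat never reads mapred.max.split.size; it computes splitSize = max(minSize, min(goalSize, blockSize)), where goalSize is the total input size divided by the requested number of maps. So for an old-API job the mapred-site.xml setting above is silently ignored, and raising mapred.min.split.size above the block size is the lever that actually produces fewer, larger splits. A sketch under that assumption (classic JobConf driver; the class name and paths are placeholders):

  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.mapred.FileInputFormat;
  import org.apache.hadoop.mapred.FileOutputFormat;
  import org.apache.hadoop.mapred.JobClient;
  import org.apache.hadoop.mapred.JobConf;

  public class MinSplitJob {
    public static void main(String[] args) throws Exception {
      JobConf conf = new JobConf(MinSplitJob.class);
      conf.setJobName("min-split-demo");
      // Old-API split rule: max(minSize, min(goalSize, blockSize)).
      // A 128 MB floor pairs up the 64 MB blocks, so ~242 blocks
      // become ~121 splits, i.e. ~121 map tasks.
      conf.setLong("mapred.min.split.size", 134217728L);
      FileInputFormat.setInputPaths(conf, new Path(args[0]));
      FileOutputFormat.setOutputPath(conf, new Path(args[1]));
      JobClient.runJob(conf);
    }
  }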

Re: How to lower the total number of map tasks

Posted by Bejoy KS <be...@gmail.com>.
Shing

This doesn't change the block size of existing files in HDFS; only new files written to HDFS will be affected. To get this in effect for old files, you need to re-copy them, at least within HDFS:
hadoop fs -cp src destn.


Regards
Bejoy KS

Sent from handheld, please excuse typos.

-----Original Message-----
From: Shing Hing Man <ma...@yahoo.com>
Date: Tue, 2 Oct 2012 10:33:45 
To: user@hadoop.apache.org<us...@hadoop.apache.org>
Reply-To: user@hadoop.apache.org
Subject: Re: How to lower the total number of map tasks



 I set the block size using 
  Configuration.setInt("dfs.block.size",134217728);


I have also set it in mapred-site.xml.

Shing 



________________________________
 From: Chris Nauroth <cn...@hortonworks.com>
To: user@hadoop.apache.org; Shing Hing Man <ma...@yahoo.com> 
Sent: Tuesday, October 2, 2012 6:00 PM
Subject: Re: How to lower the total number of map tasks
 

Those numbers make sense, considering 1 map task per block.  16 GB file / 64 MB block size = ~242 map tasks.

When you doubled dfs.block.size, how did you accomplish that?  Typically, the block size is selected at file write time, with a default value from system configuration used if not specified.  Did you "hadoop fs -put" the file with the new block size, or was it something else?

Thank you,
--Chris


On Tue, Oct 2, 2012 at 9:34 AM, Shing Hing Man <ma...@yahoo.com> wrote:


>
>
>I am running Hadoop 1.0.3 in pseudo-distributed mode.
>When I submit a map/reduce job to process a file of size about 16 GB, in job.xml I have the following:
>
>mapred.map.tasks = 242
>mapred.min.split.size = 0
>dfs.block.size = 67108864
>
>I would like to reduce mapred.map.tasks to see if it improves performance.
>I have tried doubling the size of dfs.block.size, but mapred.map.tasks remains unchanged.
>Is there a way to reduce mapred.map.tasks?
>
>Thanks in advance for any assistance!
>Shing
>
>
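
Also note that a plain hadoop fs -cp writes the copy with the block size configured on the client side, so the copy only helps if the client actually picks up the new value. FsShell is run through ToolRunner, so the generic -D option should let you force it per command and then verify the result; the paths below are placeholders:

  hadoop fs -D dfs.block.size=134217728 -cp /user/shing/input /user/shing/input-128mb
  hadoop fsck /user/shing/input-128mb -files -blocks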

Re: How to lower the total number of map tasks

Posted by Shing Hing Man <ma...@yahoo.com>.

I set the block size using

  Configuration.setInt("dfs.block.size", 134217728);

I have also set it in mapred-site.xml.

Shing 



________________________________
 From: Chris Nauroth <cn...@hortonworks.com>
To: user@hadoop.apache.org; Shing Hing Man <ma...@yahoo.com> 
Sent: Tuesday, October 2, 2012 6:00 PM
Subject: Re: How to lower the total number of map tasks
 

Those numbers make sense, considering 1 map task per block.  16 GB file / 64 MB block size = ~242 map tasks.

When you doubled dfs.block.size, how did you accomplish that?  Typically, the block size is selected at file write time, with a default value from system configuration used if not specified.  Did you "hadoop fs -put" the file with the new block size, or was it something else?

Thank you,
--Chris


On Tue, Oct 2, 2012 at 9:34 AM, Shing Hing Man <ma...@yahoo.com> wrote:


>
>
>I am running Hadoop 1.0.3 in pseudo-distributed mode.
>When I submit a map/reduce job to process a file of size about 16 GB, in job.xml I have the following:
>
>mapred.map.tasks = 242
>mapred.min.split.size = 0
>dfs.block.size = 67108864
>
>I would like to reduce mapred.map.tasks to see if it improves performance.
>I have tried doubling the size of dfs.block.size, but mapred.map.tasks remains unchanged.
>Is there a way to reduce mapred.map.tasks?
>
>Thanks in advance for any assistance!
>Shing
>
>
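
Setting dfs.block.size on a Configuration only affects files created through that Configuration; it is a client-side, per-file property, not something the cluster applies retroactively. When writing through the FileSystem API, the block size can also be passed explicitly, which removes any doubt about which configuration file won. A minimal sketch, with the destination path as a placeholder:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class BlockSizeWrite {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(conf);
      // The block size is fixed per file at creation time; request
      // 128 MB blocks explicitly instead of relying on dfs.block.size.
      FSDataOutputStream out = fs.create(
          new Path(args[0]),                        // destination path
          true,                                     // overwrite
          conf.getInt("io.file.buffer.size", 4096), // write buffer size
          fs.getDefaultReplication(),               // replication factor
          134217728L);                              // block size in bytes
      out.writeBytes("hello\n");
      out.close();
    }
  }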

Re: How to lower the total number of map tasks

Posted by Chris Nauroth <cn...@hortonworks.com>.
Those numbers make sense, considering 1 map task per block.  16 GB file /
64 MB block size = ~242 map tasks.

When you doubled dfs.block.size, how did you accomplish that?  Typically,
the block size is selected at file write time, with a default value from
system configuration used if not specified.  Did you "hadoop fs -put" the
file with the new block size, or was it something else?

Thank you,
--Chris

On Tue, Oct 2, 2012 at 9:34 AM, Shing Hing Man <ma...@yahoo.com> wrote:

>
>
>
> I am running Hadoop 1.0.3 in pseudo-distributed mode.
> When I submit a map/reduce job to process a file of size about 16 GB, in
> job.xml I have the following:
>
> mapred.map.tasks = 242
> mapred.min.split.size = 0
> dfs.block.size = 67108864
>
> I would like to reduce mapred.map.tasks to see if it improves
> performance.
> I have tried doubling the size of dfs.block.size, but
> mapred.map.tasks remains unchanged.
> Is there a way to reduce mapred.map.tasks?
>
> Thanks in advance for any assistance!
> Shing
>
>
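
The arithmetic here checks out: 242 map tasks means 242 blocks, and 242 x 64 MB is about 15.1 GB, consistent with a file of "about 16 GB" (exactly 16 GB would give 256 blocks). The split computation itself is a one-liner; a sketch of the rule the new-API FileInputFormat applies in Hadoop 1.x, with this thread's numbers plugged in:

  long blockSize = 67108864L;       // dfs.block.size, 64 MB
  long minSize   = 1L;              // effective floor when mapred.min.split.size = 0
  long maxSize   = Long.MAX_VALUE;  // mapred.max.split.size default
  long splitSize = Math.max(minSize, Math.min(maxSize, blockSize)); // 64 MB
  long fileSize  = 242L * blockSize;                                // ~15.1 GB input
  long numSplits = (fileSize + splitSize - 1) / splitSize;          // 242 map tasks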
