Posted to mapreduce-user@hadoop.apache.org by Pedro Costa <ps...@gmail.com> on 2011/03/18 20:54:40 UTC

mapred.min.split.size

Hi

What's the purpose of the parameter "mapred.min.split.size"?

Thanks,
-- 
Pedro

Re: mapred.min.split.size

Posted by Ted Yu <yu...@gmail.com>.
Cycling bits:
http://search-hadoop.com/m/O7sT4278lbG/but+it+seems+a+trade+off+with+the+number+of+files+that+have+to+be+shuffled+for+the&subj=RE+HDFS+block+size+v+s+mapred+min+split+size

On Fri, Mar 18, 2011 at 12:54 PM, Pedro Costa <ps...@gmail.com> wrote:

> Hi
>
> What's the purpose of the parameter "mapred.min.split.size"?
>
> Thanks,
> --
> Pedro
>

Re: mapred.min.split.size

Posted by Pedro Costa <ps...@gmail.com>.
As I understand, mapred.min.split.size defines the minimum size of a
split. In the case below:

(1) HDFS block size = 32MB, mapred.min.split.size=64MB
(mapred.min.split.size only takes effect when it is set larger than
the HDFS block size)

When I run a MapReduce job, this means that each map task processes
one input split of 64MB, which in reality spans 2 HDFS blocks. Is this
right?
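
To make the arithmetic in case (1) concrete, here is a minimal sketch in Python. It assumes the split size follows the rule max(minSize, min(maxSize, blockSize)), which is how I understand Hadoop's FileInputFormat to compute it; the function and variable names are illustrative only and are not Hadoop APIs.

```python
MB = 1024 * 1024

def compute_split_size(block_size, min_size, max_size):
    # Assumed rule: the block size is capped by max_size,
    # then raised to at least min_size.
    return max(min_size, min(max_size, block_size))

block_size = 32 * MB      # HDFS block size from case (1)
min_split = 64 * MB       # mapred.min.split.size from case (1)
max_split = 2**63 - 1     # assumption: no upper limit configured

split_size = compute_split_size(block_size, min_split, max_split)

# Ceiling division: how many HDFS blocks one split spans.
blocks_per_split = -(-split_size // block_size)

print(split_size // MB, "MB per split,", blocks_per_split, "blocks per split")
```

Under these assumptions each split is 64MB and covers 2 blocks. With a minimum split size at or below 32MB, the same function returns the 32MB block size, so the setting has no visible effect.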



On Fri, Mar 18, 2011 at 8:12 PM, Marcos Ortiz <ml...@uci.cu> wrote:
> On 3/18/2011 3:54 PM, Pedro Costa wrote:
>>
>> Hi
>>
>> What's the purpose of the parameter "mapred.min.split.size"?
>>
>> Thanks,
>>
>
> There are many parameters that control the number of map tasks for a job,
> and mapred.min.split.size controls the minimum size of a split. Other
> parameters are:
> - mapred.map.tasks: the suggested number of map tasks
> - dfs.block.size: the file system block size in bytes of the input file
>
> Regards
>
> --
> Marcos Luís Ortíz Valmaseda
>  Software Engineer
>  Universidad de las Ciencias Informáticas
>  Linux User # 418229
>
> http://uncubanitolinuxero.blogspot.com
> http://www.linkedin.com/in/marcosluis2186
>
>



-- 
Pedro

Re: mapred.min.split.size

Posted by Marcos Ortiz <ml...@uci.cu>.
On 3/18/2011 3:54 PM, Pedro Costa wrote:
> Hi
>
> What's the purpose of the parameter "mapred.min.split.size"?
>
> Thanks,
>    
There are many parameters that control the number of map tasks for a
job, and mapred.min.split.size controls the minimum size of a split.
Other parameters are:
- mapred.map.tasks: the suggested number of map tasks
- dfs.block.size: the file system block size in bytes of the input file
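
As a rough illustration of how these three settings interact, here is a small Python sketch. It assumes the older FileInputFormat behaviour as I understand it: the suggested number of map tasks gives a goal size (total input size divided by the hint), and the split size is max(minSize, min(goalSize, blockSize)). The helper names are hypothetical, and the split count is only an estimate because it ignores the small slack Hadoop allows on the last split.

```python
MB = 1024 * 1024

def split_size_from_hint(total_size, suggested_maps, min_size, block_size):
    # Goal size derived from the suggested number of map tasks (assumed rule).
    goal_size = total_size // max(suggested_maps, 1)
    return max(min_size, min(goal_size, block_size))

def estimate_num_splits(total_size, split_size):
    # Ceiling division; ignores Hadoop's last-split slack factor.
    return -(-total_size // split_size)

total = 1024 * MB   # hypothetical 1GB input file
block = 64 * MB     # hypothetical dfs.block.size

# (suggested maps, minimum split size) combinations to compare.
cases = [(2, 1), (32, 1), (32, 128 * MB)]
results = []
for suggested_maps, min_size in cases:
    size = split_size_from_hint(total, suggested_maps, min_size, block)
    results.append((size // MB, estimate_num_splits(total, size)))

for (suggested_maps, min_size), (size_mb, n) in zip(cases, results):
    print(suggested_maps, min_size, "->", size_mb, "MB splits,", n, "maps")
```

Under these assumptions, a hint of 2 maps still yields 16 splits of 64MB (the block size caps the split), a hint of 32 maps yields 32 splits of 32MB, and raising the minimum split size to 128MB overrides both and yields 8 splits.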

Regards

-- 
Marcos Luís Ortíz Valmaseda
  Software Engineer
  Universidad de las Ciencias Informáticas
  Linux User # 418229

http://uncubanitolinuxero.blogspot.com
http://www.linkedin.com/in/marcosluis2186