Posted to mapreduce-user@hadoop.apache.org by Pedro Costa <ps...@gmail.com> on 2011/03/18 20:54:40 UTC
mapred.min.split.size
Hi
What's the purpose of the parameter "mapred.min.split.size"?
Thanks,
--
Pedro
Re: mapred.min.split.size
Posted by Ted Yu <yu...@gmail.com>.
Cycling bits:
http://search-hadoop.com/m/O7sT4278lbG/but+it+seems+a+trade+off+with+the+number+of+files+that+have+to+be+shuffled+for+the&subj=RE+HDFS+block+size+v+s+mapred+min+split+size
On Fri, Mar 18, 2011 at 12:54 PM, Pedro Costa <ps...@gmail.com> wrote:
> Hi
>
> What's the purpose of the parameter "mapred.min.split.size"?
>
> Thanks,
> --
> Pedro
>
Re: mapred.min.split.size
Posted by Pedro Costa <ps...@gmail.com>.
As I understand it, mapred.min.split.size defines the minimum size of a
split. Take the case below:
(1) HDFS block size = 32MB, mapred.min.split.size=64MB
(mapred.min.split.size only has an effect when it is set larger than the
HDFS block size)
When I run a MapReduce job, each map task processes one input split of
64MB, which in reality contains 2 HDFS blocks. Is this
right?
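To make the case above concrete, here is a minimal sketch of the split-size rule as I understand it from the old-API FileInputFormat: split size = max(minSize, min(goalSize, blockSize)). The function name and the 1024MB goal size are assumptions for illustration only; the goal size is normally the total input size divided by the suggested number of maps.

```python
MB = 1024 * 1024

def compute_split_size(goal_size, min_size, block_size):
    """Sketch of the split-size rule: max(minSize, min(goalSize, blockSize))."""
    return max(min_size, min(goal_size, block_size))

block_size = 32 * MB       # dfs.block.size
min_split_size = 64 * MB   # mapred.min.split.size
goal_size = 1024 * MB      # assumed value: total input size / suggested map count

split_size = compute_split_size(goal_size, min_split_size, block_size)
blocks_per_split = split_size // block_size

print(split_size // MB, blocks_per_split)  # 64 2
```

Under these assumptions each split is 64MB and covers 2 blocks of 32MB, which matches the reading above. With the minimum left at a small value, the same rule falls back to one split per block.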
On Fri, Mar 18, 2011 at 8:12 PM, Marcos Ortiz <ml...@uci.cu> wrote:
> On 3/18/2011 3:54 PM, Pedro Costa wrote:
>>
>> Hi
>>
>> What's the purpose of the parameter "mapred.min.split.size"?
>>
>> Thanks,
>>
>
> Several parameters control the number of map tasks for a job, and
> mapred.min.split.size sets the minimum size of a split. Other
> parameters are:
> - mapred.map.tasks: the suggested number of map tasks
> - dfs.block.size: the file system block size, in bytes, of the input file
>
> Regards
>
> --
> Marcos Luís Ortíz Valmaseda
> Software Engineer
> Universidad de las Ciencias Informáticas
> Linux User # 418229
>
> http://uncubanitolinuxero.blogspot.com
> http://www.linkedin.com/in/marcosluis2186
>
>
--
Pedro
Re: mapred.min.split.size
Posted by Marcos Ortiz <ml...@uci.cu>.
On 3/18/2011 3:54 PM, Pedro Costa wrote:
> Hi
>
> What's the purpose of the parameter "mapred.min.split.size"?
>
> Thanks,
>
Several parameters control the number of map tasks for a job, and
mapred.min.split.size sets the minimum size of a split.
Other parameters are:
- mapred.map.tasks: the suggested number of map tasks
- dfs.block.size: the file system block size, in bytes, of the input file
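A rough sketch of how these three settings interact, assuming a single input file and the old-API rule split size = max(minSize, min(goalSize, blockSize)), where goalSize is the total input size divided by the suggested number of maps. The function name estimate_map_tasks is hypothetical, and the estimate ignores the small slack Hadoop allows on the last split as well as per-file boundaries.

```python
import math

MB = 1024 * 1024

def estimate_map_tasks(total_size, suggested_maps, min_split_size, block_size):
    """Approximate the number of splits (one map task per split) for one file."""
    # The suggested map count only sets a target ("goal") split size.
    goal_size = total_size // max(suggested_maps, 1)
    # The goal is capped by the block size; the minimum split size can override both.
    split_size = max(min_split_size, min(goal_size, block_size))
    return math.ceil(total_size / split_size)

total = 1024 * MB   # assumed input size
block = 64 * MB     # dfs.block.size

print(estimate_map_tasks(total, 2, 1, block))          # 16: one split per block
print(estimate_map_tasks(total, 32, 1, block))         # 32: goal size below block size
print(estimate_map_tasks(total, 2, 128 * MB, block))   # 8: minimum split size wins
```

In this sketch the suggested number of maps can only raise the task count above one per block, while a minimum split size larger than the block size lowers it.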
Regards
--
Marcos Luís Ortíz Valmaseda
Software Engineer
Universidad de las Ciencias Informáticas
Linux User # 418229
http://uncubanitolinuxero.blogspot.com
http://www.linkedin.com/in/marcosluis2186