Posted to mapreduce-user@hadoop.apache.org by Manoj Babu <ma...@gmail.com> on 2012/07/11 14:29:05 UTC

Mapper basic question

Hi,

The no of mappers depends on the no of blocks. Is it possible to limit
the no of mappers size without increasing the HDFS block size?

Thanks in advance.

Cheers!
Manoj.

Re: Mapper basic question

Posted by Manoj Babu <ma...@gmail.com>.
Thanks All!
 On 11 Jul 2012 19:07, "Bejoy KS" <be...@gmail.com> wrote:

> Hi Manoj
>
> Block size is at the HDFS storage level, whereas split size is the amount
> of data processed by each mapper while running a MapReduce job (one split
> is the data processed by one mapper). One or more HDFS blocks can
> contribute to a split. Splits are determined by the InputFormat as well as
> the min and max split size properties.
>
> As Arun mentioned, use CombineFileInputFormat and adjust the min and max
> split size properties to control/limit the number of mappers.
>
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.
> ------------------------------
> *From: * Manoj Babu <ma...@gmail.com>
> *Date: *Wed, 11 Jul 2012 18:17:41 +0530
> *To: *<ma...@hadoop.apache.org>
> *ReplyTo: * mapreduce-user@hadoop.apache.org
> *Subject: *Re: Mapper basic question
>
> Hi Tariq/Arun,
>
> The no of blocks (splits) = total file size / HDFS block size * replication
> factor.
> The no of splits here is again nothing but the no of blocks.
>
> Other than increasing the block size (input splits), is it possible to
> limit the no of mappers?
>
>
> Cheers!
> Manoj.
>
>
>
> On Wed, Jul 11, 2012 at 6:06 PM, Arun C Murthy <ac...@hortonworks.com> wrote:
>
>> Take a look at CombineFileInputFormat - this will create 'meta splits'
>> which include multiple small splits, thus reducing the number of maps that run.
>>
>> Arun
>>
>> On Jul 11, 2012, at 5:29 AM, Manoj Babu wrote:
>>
>> Hi,
>>
>> The no of mappers depends on the no of blocks. Is it possible to limit
>> the no of mappers size without increasing the HDFS block size?
>>
>> Thanks in advance.
>>
>> Cheers!
>> Manoj.
>>
>>
>>  --
>> Arun C. Murthy
>> Hortonworks Inc.
>> http://hortonworks.com/
>>
>>
>>
>

Re: Mapper basic question

Posted by Bejoy KS <be...@gmail.com>.
Hi Manoj

Block size is at the HDFS storage level, whereas split size is the amount of data processed by each mapper while running a MapReduce job (one split is the data processed by one mapper). One or more HDFS blocks can contribute to a split. Splits are determined by the InputFormat as well as the min and max split size properties.

As Arun mentioned, use CombineFileInputFormat and adjust the min and max split size properties to control/limit the number of mappers.
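[Editor's note] The interplay Bejoy describes can be sketched in plain Java. This mirrors the split-size rule used by FileInputFormat, splitSize = max(minSize, min(maxSize, blockSize)); the class name SplitSizing and the concrete sizes are illustrative, not from the thread:

```java
// Illustrative model of FileInputFormat's split sizing: a split is the
// HDFS block size clamped between the configured min and max split sizes.
public class SplitSizing {

    // Mirrors FileInputFormat.computeSplitSize(blockSize, minSize, maxSize).
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Mappers for a single file: roughly ceil(fileSize / splitSize).
    static long numSplits(long fileSize, long splitSize) {
        return (fileSize + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long blockSize = 64L << 20;  // 64 MB HDFS block
        long fileSize  = 1L << 30;   // 1 GB input file

        // Defaults: split size == block size -> 16 mappers for this file.
        long defaultSplit = computeSplitSize(blockSize, 1L, Long.MAX_VALUE);
        System.out.println(numSplits(fileSize, defaultSplit)); // 16

        // Raising the min split size to 256 MB cuts the mapper count to 4,
        // without touching the HDFS block size.
        long bigSplit = computeSplitSize(blockSize, 256L << 20, Long.MAX_VALUE);
        System.out.println(numSplits(fileSize, bigSplit)); // 4
    }
}
```

Note that the replication factor plays no part here: replicas are copies of the same blocks, so they do not add splits or mappers.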


Regards
Bejoy KS

Sent from handheld, please excuse typos.

-----Original Message-----
From: Manoj Babu <ma...@gmail.com>
Date: Wed, 11 Jul 2012 18:17:41 
To: <ma...@hadoop.apache.org>
Reply-To: mapreduce-user@hadoop.apache.org
Subject: Re: Mapper basic question

Hi Tariq/Arun,

The no of blocks (splits) = total file size / HDFS block size * replication factor.
The no of splits here is again nothing but the no of blocks.

Other than increasing the block size (input splits), is it possible to limit
the no of mappers?


Cheers!
Manoj.



On Wed, Jul 11, 2012 at 6:06 PM, Arun C Murthy <ac...@hortonworks.com> wrote:

> Take a look at CombineFileInputFormat - this will create 'meta splits'
> which include multiple small splits, thus reducing the number of maps that run.
>
> Arun
>
> On Jul 11, 2012, at 5:29 AM, Manoj Babu wrote:
>
> Hi,
>
> The no of mappers depends on the no of blocks. Is it possible to limit
> the no of mappers size without increasing the HDFS block size?
>
> Thanks in advance.
>
> Cheers!
> Manoj.
>
>
>  --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
>
>


Re: Mapper basic question

Posted by Manoj Babu <ma...@gmail.com>.
Hi Tariq/Arun,

The no of blocks (splits) = total file size / HDFS block size * replication factor.
The no of splits here is again nothing but the no of blocks.

Other than increasing the block size (input splits), is it possible to limit
the no of mappers?


Cheers!
Manoj.



On Wed, Jul 11, 2012 at 6:06 PM, Arun C Murthy <ac...@hortonworks.com> wrote:

> Take a look at CombineFileInputFormat - this will create 'meta splits'
> which include multiple small splits, thus reducing the number of maps that run.
>
> Arun
>
> On Jul 11, 2012, at 5:29 AM, Manoj Babu wrote:
>
> Hi,
>
> The no of mappers depends on the no of blocks. Is it possible to limit
> the no of mappers size without increasing the HDFS block size?
>
> Thanks in advance.
>
> Cheers!
> Manoj.
>
>
>  --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
>
>

Re: Mapper basic question

Posted by Arun C Murthy <ac...@hortonworks.com>.
Take a look at CombineFileInputFormat - this will create 'meta splits' which include multiple small splits, thus reducing the number of maps that run.
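[Editor's note] A toy model of the 'meta split' packing Arun describes. The real CombineFileInputFormat also honors node and rack locality and per-node/per-rack limits; this sketch, with the illustrative name CombinePacking, only shows why the mapper count drops:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of CombineFileInputFormat: pack many small files into
// combined 'meta splits' no larger than maxSplitSize, so that far
// fewer mappers run than one-mapper-per-file.
public class CombinePacking {

    static int countCombinedSplits(List<Long> fileSizes, long maxSplitSize) {
        int splits = 0;
        long current = 0;          // bytes accumulated in the open meta split
        for (long size : fileSizes) {
            if (current + size > maxSplitSize && current > 0) {
                splits++;          // close the current meta split
                current = 0;
            }
            current += size;
        }
        if (current > 0) splits++; // last, partially filled split
        return splits;
    }

    public static void main(String[] args) {
        List<Long> files = new ArrayList<>();
        for (int i = 0; i < 100; i++) files.add(1L << 20); // 100 x 1 MB files

        // Plain FileInputFormat: one split (and one mapper) per small file.
        System.out.println(files.size());                          // 100
        // Combined: 100 MB of input packed into 64 MB meta splits.
        System.out.println(countCombinedSplits(files, 64L << 20)); // 2
    }
}
```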

Arun

On Jul 11, 2012, at 5:29 AM, Manoj Babu wrote:

> Hi,
> 
> The no of mappers depends on the no of blocks. Is it possible to limit the no of mappers size without increasing the HDFS block size?
> 
> Thanks in advance.
> 
> Cheers!
> Manoj.
> 

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/



Re: Mapper basic question

Posted by Mohammad Tariq <do...@gmail.com>.
Hello Manoj,

     It is not the block that determines the no of mappers; it is
based on the no of input splits (no of mappers = no of input splits).
And I did not get what you mean by 'no of mapper size'. It is possible
to configure the input split size, though. Hope it helps.
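[Editor's note] A driver-side sketch of what "configure the input splits" can look like. This assumes the new org.apache.hadoop.mapreduce API; the setter names and the availability of CombineTextInputFormat vary across Hadoop releases (older releases use mapred.min.split.size / mapred.max.split.size), so check them against your version:

```java
// Sketch only: raising the split size, or combining small files,
// changes how many input splits (and hence mappers) the job gets.
Job job = Job.getInstance(new Configuration(), "fewer-mappers");

// Option 1: make each split larger than one HDFS block (256 MB here),
// without touching the HDFS block size itself.
FileInputFormat.setMinInputSplitSize(job, 256L * 1024 * 1024);

// Option 2 (many small files): pack them into combined 'meta splits'.
job.setInputFormatClass(CombineTextInputFormat.class);
CombineTextInputFormat.setMaxInputSplitSize(job, 256L * 1024 * 1024);
```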

Regards,
    Mohammad Tariq


On Wed, Jul 11, 2012 at 5:59 PM, Manoj Babu <ma...@gmail.com> wrote:
> Hi,
>
> The no of mappers depends on the no of blocks. Is it possible to limit
> the no of mappers size without increasing the HDFS block size?
>
> Thanks in advance.
>
> Cheers!
> Manoj.
>