Posted to common-dev@hadoop.apache.org by Abhishek <ab...@gmail.com> on 2012/08/28 05:02:38 UTC

Number of reducers

Hi all,

I just want to know: based on what factors does the MapReduce framework decide the number of reducers to launch for a job?

By default, only one reducer is launched for a given job if we do not explicitly set the number via the command line or the driver class. Is this right?

If I choose to set the number of reducers explicitly, what should I consider? Choosing an inappropriate number of reducers hampers performance.

Sorry for this question; I am a little confused about the number of reducers.

Regards
Abhishek

Sent from my iPhone

Re: Number of reducers

Posted by Harsh J <ha...@cloudera.com>.
Ah, well, my bad. See instead the description for mapred.reduce.tasks
in mapred-default.xml, which states this: "Typically set to 99% of the
cluster's reduce capacity, so that if a node fails the reduces can
still be executed in a single wave."

FWIW, I set it manually to the level of parallelism I require (given
my partitioned data, etc.).
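The mapred-default.xml guidance quoted above can be turned into a quick back-of-the-envelope calculation. A minimal sketch in plain Java; the node and slot counts are hypothetical and not from this thread:

```java
public class ReduceSlotHeuristic {
    // Sketch of the mapred-default.xml guidance: size the job's reducer
    // count just under the cluster's total reduce-slot capacity (99%), so
    // that all reduces can finish in a single wave even with a little slack
    // for a failed node.
    static int suggestedReducers(int nodes, int reduceSlotsPerNode) {
        return (int) (0.99 * nodes * reduceSlotsPerNode);
    }

    public static void main(String[] args) {
        // e.g. a hypothetical 10-node cluster with 4 reduce slots per node
        // (capacity 40) suggests 39 reducers.
        System.out.println(suggestedReducers(10, 4));
    }
}
```

As Harsh notes, this heuristic is only a starting point; the parallelism your partitioned data actually needs can override it.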

On Tue, Aug 28, 2012 at 8:43 PM, abhiTowson cal
<ab...@gmail.com> wrote:
> hi harsh,
>
> Thanks for the reply. I get your first and second points; coming to
> your third point, how is it specific to a job?
> My question was specific to a job.
>
> Regards
> Abhishek
>
>
>
> On Mon, Aug 27, 2012 at 11:29 PM, Harsh J <ha...@cloudera.com> wrote:
>> Hi,
>>
>> On Tue, Aug 28, 2012 at 8:32 AM, Abhishek <ab...@gmail.com> wrote:
>>> Hi all,
>>>
>>> I just want to know: based on what factors does the MapReduce framework decide the number of reducers to launch for a job?
>>
>> The framework does not auto-determine the number of reducers for a
>> job. That is purely user-or-client-program-supplied presently.
>>
>>> By default, only one reducer is launched for a given job if we do not explicitly set the number via the command line or the driver class. Is this right?
>>
>> Yes, by default the number of reduce tasks is configured to be one.
>>
>>> If I choose to set the number of reducers explicitly, what should I consider? Choosing an inappropriate number of reducers hampers performance.
>>
>> See http://wiki.apache.org/hadoop/HowManyMapsAndReduces
>>
>> --
>> Harsh J



-- 
Harsh J

Re: Number of reducers

Posted by abhiTowson cal <ab...@gmail.com>.
hi harsh,

Thanks for the reply. I get your first and second points; coming to
your third point, how is it specific to a job?
My question was specific to a job.

Regards
Abhishek



On Mon, Aug 27, 2012 at 11:29 PM, Harsh J <ha...@cloudera.com> wrote:
> Hi,
>
> On Tue, Aug 28, 2012 at 8:32 AM, Abhishek <ab...@gmail.com> wrote:
>> Hi all,
>>
>> I just want to know: based on what factors does the MapReduce framework decide the number of reducers to launch for a job?
>
> The framework does not auto-determine the number of reducers for a
> job. That is purely user-or-client-program-supplied presently.
>
>> By default, only one reducer is launched for a given job if we do not explicitly set the number via the command line or the driver class. Is this right?
>
> Yes, by default the number of reduce tasks is configured to be one.
>
>> If I choose to set the number of reducers explicitly, what should I consider? Choosing an inappropriate number of reducers hampers performance.
>
> See http://wiki.apache.org/hadoop/HowManyMapsAndReduces
>
> --
> Harsh J

Re: Number of reducers

Posted by Harsh J <ha...@cloudera.com>.
Hi,

On Tue, Aug 28, 2012 at 8:32 AM, Abhishek <ab...@gmail.com> wrote:
> Hi all,
>
> I just want to know: based on what factors does the MapReduce framework decide the number of reducers to launch for a job?

The framework does not auto-determine the number of reducers for a
job. That is purely user-or-client-program-supplied presently.

> By default, only one reducer is launched for a given job if we do not explicitly set the number via the command line or the driver class. Is this right?

Yes, by default the number of reduce tasks is configured to be one.

> If I choose to set the number of reducers explicitly, what should I consider? Choosing an inappropriate number of reducers hampers performance.

See http://wiki.apache.org/hadoop/HowManyMapsAndReduces

-- 
Harsh J
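The thread's answer in short: the framework does not pick a reducer count for you. The client supplies it (for example via -D mapred.reduce.tasks=N on the command line, or JobConf.setNumReduceTasks(N) in the driver class), and it defaults to one when never set. A small stand-alone Java sketch of that resolution logic; this mimics the lookup-with-default behavior and is not Hadoop's actual implementation:

```java
import java.util.Properties;

public class ReducerConfigDemo {
    // Hypothetical stand-in for Hadoop's configuration lookup: the framework
    // reads mapred.reduce.tasks (mapreduce.job.reduces in later releases)
    // and falls back to 1 when the job never sets it.
    static int numReduceTasks(Properties conf) {
        return Integer.parseInt(conf.getProperty("mapred.reduce.tasks", "1"));
    }

    public static void main(String[] args) {
        Properties unset = new Properties();
        System.out.println(numReduceTasks(unset)); // default when nothing is set

        Properties explicit = new Properties();
        // as if the user ran: hadoop jar job.jar -D mapred.reduce.tasks=10 ...
        explicit.setProperty("mapred.reduce.tasks", "10");
        System.out.println(numReduceTasks(explicit));
    }
}
```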