Posted to user@hive.apache.org by Abhishek <ab...@gmail.com> on 2012/09/26 14:28:40 UTC

Hive configuration property

Hi all,

I have a doubt regarding the properties below: is it a good practice to override them in Hive?

If yes, what are the optimal values for the following properties?

In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
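For context, here is a simplified sketch of how these three settings interact when Hive picks a reducer count (an approximation for illustration, not Hive's exact implementation; the defaults mirror the documented defaults discussed later in this thread):

```python
import math

def estimate_reducers(input_bytes, bytes_per_reducer=1_000_000_000,
                      max_reducers=999, reduce_tasks=-1):
    """Approximate Hive's reducer estimation. Defaults correspond to
    hive.exec.reducers.bytes.per.reducer, hive.exec.reducers.max,
    and mapred.reduce.tasks."""
    if reduce_tasks >= 0:
        # An explicit mapred.reduce.tasks overrides the estimation.
        return reduce_tasks
    # Otherwise: one reducer per bytes_per_reducer of input, capped at max.
    estimated = math.ceil(input_bytes / bytes_per_reducer)
    return max(1, min(estimated, max_reducers))

print(estimate_reducers(10 * 1_000_000_000))      # 10 GB of input -> 10
print(estimate_reducers(5 * 10**12))              # 5 TB -> capped at 999
print(estimate_reducers(10**9, reduce_tasks=32))  # explicit setting wins -> 32
```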

Regards
Abhi

Sent from my iPhone

Re: Hive configuration property

Posted by Abhishek <ab...@gmail.com>.
Hi ashok,

Thank you very much.

Regards
Abhi

Sent from my iPhone

On Sep 26, 2012, at 1:32 PM, <as...@wipro.com> wrote:

> Hello Abhi,
> Hope the information below will help you.
> mapred.reduce.tasks
> Default Value: -1
> Added In: 0.1
> The default number of reduce tasks per job. Typically set to a prime number close to the number of available hosts. Ignored when mapred.job.tracker is "local". Hadoop sets this to 1 by default, whereas Hive uses -1 as its default; with -1, Hive automatically figures out the number of reducers.
> hive.exec.reducers.bytes.per.reducer
> Default Value: 1000000000
> Added In:
> Size per reducer. The default is 1 GB; i.e., if the input size is 10 GB, Hive will use 10 reducers.
> hive.exec.reducers.max
> Default Value: 999
> Added In:
> The maximum number of reducers that will be used. If the value specified in mapred.reduce.tasks is negative, Hive uses this as the upper bound when automatically determining the number of reducers.
>  
> Thanks
> Ashok S.
>  
> From: Abhishek [mailto:abhishek.dodda1@gmail.com] 
> Sent: 26 September 2012 22:27
> To: user@hive.apache.org
> Cc: <us...@hive.apache.org>; <be...@yahoo.com>
> Subject: Re: Hive configuration property
>  
>  Hi Ashok,
>  
> Thanks for the reply. Can you please tell me how many reducers should be used for 1 GB of intermediate data?
>  
> Regards
> Abhi
> 
> Sent from my iPhone
> 
> On Sep 26, 2012, at 12:39 PM, <as...@wipro.com> wrote:
> 
> Yes Abhishek,
> By setting the properties below you will get better results. The number should depend on your data size.
>  
> Regards
> Ashok S.
>  
> From: Bejoy KS [mailto:bejoy_ks@yahoo.com] 
> Sent: 26 September 2012 21:04
> To: user@hive.apache.org
> Subject: Re: Hive configuration property
>  
> Hi Abhishek
>  
> In my experience, you can always set the number of reduce tasks (mapred.reduce.tasks) based on the data volume your query handles; it can yield better performance numbers. 
>  
> Regards,
> Bejoy KS
>  
> From: Abhishek <ab...@gmail.com>
> To: "user@hive.apache.org" <us...@hive.apache.org> 
> Cc: "user@hive.apache.org" <us...@hive.apache.org> 
> Sent: Wednesday, September 26, 2012 7:04 PM
> Subject: Re: Hive configuration property
> 
> 
> 
> Thanks Bharath, your points make sense. I'll try the "hive.exec.reducers.max" property.
>  
> Regards
> Abhi
>  
> 
> 
> Sent from my iPhone
> 
> On Sep 26, 2012, at 9:23 AM, bharath vissapragada <bh...@gmail.com> wrote:
> 
>  
> I'm no expert in hive, but here are my 2 cents. 
>  
> By default Hive schedules a reducer for every 1 GB of data (change that value by modifying hive.exec.reducers.bytes.per.reducer). If your input data is huge, there will be a large number of reducers, which might be unnecessary. (Sometimes a large number of reducers slows down the job because their number exceeds the total task slots and they keep waiting for their turn. Not to forget the initialization overhead for each task: JVM startup, etc.)
>  
> Overall, I think there cannot be any single optimum value for a cluster. It depends on the type of queries, the size of your inputs, and the size of map outputs in the jobs (intermediate outputs). So you can check various values and see which one is best. From my experience, setting "hive.exec.reducers.max" to the total number of reduce slots in your cluster gives decent performance, since all the reducers complete in a single wave. (This may or may not work for you, but it is worth a try.)
>  
>  
> On Wed, Sep 26, 2012 at 5:58 PM, Abhishek <ab...@gmail.com> wrote:
>  
> Hi all,
>  
> I have a doubt regarding the properties below: is it a good practice to override them in Hive?
>  
> If yes, what are the optimal values for the following properties?
> 
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=<number>
>  
> Regards
> Abhi
> 
> Sent from my iPhone
> 
> 
>  
> -- 
> Regards,
> Bharath .V
> w:http://researchweb.iiit.ac.in/~bharath.v
>  
> 
> The information contained in this electronic message and any attachments to this message are intended for the exclusive use of the addressee(s) and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately and destroy all copies of this message and any attachments.
> 
> WARNING: Computer viruses can be transmitted via email. The recipient should check this email and any attachments for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email.
> 
> www.wipro.com
> 

RE: Hive configuration property

Posted by as...@wipro.com.
Hello Abhi,
Hope the information below will help you.
mapred.reduce.tasks

  *   Default Value: -1
  *   Added In: 0.1
The default number of reduce tasks per job. Typically set to a prime number close to the number of available hosts. Ignored when mapred.job.tracker is "local". Hadoop sets this to 1 by default, whereas Hive uses -1 as its default; with -1, Hive automatically figures out the number of reducers.
hive.exec.reducers.bytes.per.reducer

  *   Default Value: 1000000000
  *   Added In:
Size per reducer. The default is 1 GB; i.e., if the input size is 10 GB, Hive will use 10 reducers.
hive.exec.reducers.max

  *   Default Value: 999
  *   Added In:
The maximum number of reducers that will be used. If the value specified in mapred.reduce.tasks is negative, Hive uses this as the upper bound when automatically determining the number of reducers.

Thanks
Ashok S.

From: Abhishek [mailto:abhishek.dodda1@gmail.com]
Sent: 26 September 2012 22:27
To: user@hive.apache.org
Cc: <us...@hive.apache.org>; <be...@yahoo.com>
Subject: Re: Hive configuration property

 Hi Ashok,

Thanks for the reply. Can you please tell me how many reducers should be used for 1 GB of intermediate data?

Regards
Abhi

Sent from my iPhone

On Sep 26, 2012, at 12:39 PM, <as...@wipro.com> wrote:
Yes Abhishek,
By setting the properties below you will get better results. The number should depend on your data size.

Regards
Ashok S.

From: Bejoy KS [mailto:bejoy_ks@yahoo.com]
Sent: 26 September 2012 21:04
To: user@hive.apache.org
Subject: Re: Hive configuration property

Hi Abhishek

In my experience, you can always set the number of reduce tasks (mapred.reduce.tasks) based on the data volume your query handles; it can yield better performance numbers.

Regards,
Bejoy KS

________________________________
From: Abhishek <ab...@gmail.com>
To: "user@hive.apache.org" <us...@hive.apache.org>
Cc: "user@hive.apache.org" <us...@hive.apache.org>
Sent: Wednesday, September 26, 2012 7:04 PM
Subject: Re: Hive configuration property



Thanks Bharath, your points make sense. I'll try the "hive.exec.reducers.max" property.

Regards
Abhi



Sent from my iPhone

On Sep 26, 2012, at 9:23 AM, bharath vissapragada <bh...@gmail.com> wrote:

I'm no expert in hive, but here are my 2 cents.

By default Hive schedules a reducer for every 1 GB of data (change that value by modifying hive.exec.reducers.bytes.per.reducer). If your input data is huge, there will be a large number of reducers, which might be unnecessary. (Sometimes a large number of reducers slows down the job because their number exceeds the total task slots and they keep waiting for their turn. Not to forget the initialization overhead for each task: JVM startup, etc.)

Overall, I think there cannot be any single optimum value for a cluster. It depends on the type of queries, the size of your inputs, and the size of map outputs in the jobs (intermediate outputs). So you can check various values and see which one is best. From my experience, setting "hive.exec.reducers.max" to the total number of reduce slots in your cluster gives decent performance, since all the reducers complete in a single wave. (This may or may not work for you, but it is worth a try.)


On Wed, Sep 26, 2012 at 5:58 PM, Abhishek <ab...@gmail.com> wrote:

Hi all,

I have a doubt regarding the properties below: is it a good practice to override them in Hive?

If yes, what are the optimal values for the following properties?

  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>

Regards
Abhi

Sent from my iPhone



--
Regards,
Bharath .V
w:http://researchweb.iiit.ac.in/~bharath.v



Re: Hive configuration property

Posted by Abhishek <ab...@gmail.com>.
 Hi Ashok,

Thanks for the reply. Can you please tell me how many reducers should be used for 1 GB of intermediate data?

Regards
Abhi

Sent from my iPhone

On Sep 26, 2012, at 12:39 PM, <as...@wipro.com> wrote:

> Yes Abhishek,
> By setting the properties below you will get better results. The number should depend on your data size.
>  
> Regards
> Ashok S.
>  
> From: Bejoy KS [mailto:bejoy_ks@yahoo.com] 
> Sent: 26 September 2012 21:04
> To: user@hive.apache.org
> Subject: Re: Hive configuration property
>  
> Hi Abhishek
>  
> In my experience, you can always set the number of reduce tasks (mapred.reduce.tasks) based on the data volume your query handles; it can yield better performance numbers. 
>  
> Regards,
> Bejoy KS
>  
> From: Abhishek <ab...@gmail.com>
> To: "user@hive.apache.org" <us...@hive.apache.org> 
> Cc: "user@hive.apache.org" <us...@hive.apache.org> 
> Sent: Wednesday, September 26, 2012 7:04 PM
> Subject: Re: Hive configuration property
> 
> 
> Thanks Bharath, your points make sense. I'll try the "hive.exec.reducers.max" property.
>  
> Regards
> Abhi
>  
> 
> 
> Sent from my iPhone
> 
> On Sep 26, 2012, at 9:23 AM, bharath vissapragada <bh...@gmail.com> wrote:
> 
>  
> I'm no expert in hive, but here are my 2 cents. 
>  
> By default Hive schedules a reducer for every 1 GB of data (change that value by modifying hive.exec.reducers.bytes.per.reducer). If your input data is huge, there will be a large number of reducers, which might be unnecessary. (Sometimes a large number of reducers slows down the job because their number exceeds the total task slots and they keep waiting for their turn. Not to forget the initialization overhead for each task: JVM startup, etc.)
>  
> Overall, I think there cannot be any single optimum value for a cluster. It depends on the type of queries, the size of your inputs, and the size of map outputs in the jobs (intermediate outputs). So you can check various values and see which one is best. From my experience, setting "hive.exec.reducers.max" to the total number of reduce slots in your cluster gives decent performance, since all the reducers complete in a single wave. (This may or may not work for you, but it is worth a try.)
>  
>  
> On Wed, Sep 26, 2012 at 5:58 PM, Abhishek <ab...@gmail.com> wrote:
>  
> Hi all,
>  
> I have a doubt regarding the properties below: is it a good practice to override them in Hive?
>  
> If yes, what are the optimal values for the following properties?
> 
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=<number>
>  
> Regards
> Abhi
> 
> Sent from my iPhone
> 
> 
>  
> -- 
> Regards,
> Bharath .V
> w:http://researchweb.iiit.ac.in/~bharath.v
>  
> 

RE: Hive configuration property

Posted by as...@wipro.com.
Yes Abhishek,
By setting the properties below you will get better results. The number should depend on your data size.

Regards
Ashok S.

From: Bejoy KS [mailto:bejoy_ks@yahoo.com]
Sent: 26 September 2012 21:04
To: user@hive.apache.org
Subject: Re: Hive configuration property

Hi Abhishek

In my experience, you can always set the number of reduce tasks (mapred.reduce.tasks) based on the data volume your query handles; it can yield better performance numbers.

Regards,
Bejoy KS

________________________________
From: Abhishek <ab...@gmail.com>
To: "user@hive.apache.org" <us...@hive.apache.org>
Cc: "user@hive.apache.org" <us...@hive.apache.org>
Sent: Wednesday, September 26, 2012 7:04 PM
Subject: Re: Hive configuration property


Thanks Bharath, your points make sense. I'll try the "hive.exec.reducers.max" property.

Regards
Abhi



Sent from my iPhone

On Sep 26, 2012, at 9:23 AM, bharath vissapragada <bh...@gmail.com> wrote:

I'm no expert in hive, but here are my 2 cents.

By default Hive schedules a reducer for every 1 GB of data (change that value by modifying hive.exec.reducers.bytes.per.reducer). If your input data is huge, there will be a large number of reducers, which might be unnecessary. (Sometimes a large number of reducers slows down the job because their number exceeds the total task slots and they keep waiting for their turn. Not to forget the initialization overhead for each task: JVM startup, etc.)

Overall, I think there cannot be any single optimum value for a cluster. It depends on the type of queries, the size of your inputs, and the size of map outputs in the jobs (intermediate outputs). So you can check various values and see which one is best. From my experience, setting "hive.exec.reducers.max" to the total number of reduce slots in your cluster gives decent performance, since all the reducers complete in a single wave. (This may or may not work for you, but it is worth a try.)


On Wed, Sep 26, 2012 at 5:58 PM, Abhishek <ab...@gmail.com> wrote:

Hi all,

I have a doubt regarding the properties below: is it a good practice to override them in Hive?

If yes, what are the optimal values for the following properties?

  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>

Regards
Abhi

Sent from my iPhone



--
Regards,
Bharath .V
w:http://researchweb.iiit.ac.in/~bharath.v



Re: Hive configuration property

Posted by Abhishek <ab...@gmail.com>.
Thanks bejoy, I will try that.

Regards 
Abhi

Sent from my iPhone

On Sep 26, 2012, at 11:34 AM, Bejoy KS <be...@yahoo.com> wrote:

> Hi Abhishek
> 
> In my experience, you can always set the number of reduce tasks (mapred.reduce.tasks) based on the data volume your query handles; it can yield better performance numbers. 
>  
> Regards,
> Bejoy KS
> 
> From: Abhishek <ab...@gmail.com>
> To: "user@hive.apache.org" <us...@hive.apache.org> 
> Cc: "user@hive.apache.org" <us...@hive.apache.org> 
> Sent: Wednesday, September 26, 2012 7:04 PM
> Subject: Re: Hive configuration property
> 
> Thanks Bharath, your points make sense. I'll try the "hive.exec.reducers.max" property.
> 
> Regards
> Abhi
> 
> 
> 
> Sent from my iPhone
> 
> On Sep 26, 2012, at 9:23 AM, bharath vissapragada <bh...@gmail.com> wrote:
> 
>> 
>> I'm no expert in hive, but here are my 2 cents. 
>> 
>> By default Hive schedules a reducer for every 1 GB of data (change that value by modifying hive.exec.reducers.bytes.per.reducer). If your input data is huge, there will be a large number of reducers, which might be unnecessary. (Sometimes a large number of reducers slows down the job because their number exceeds the total task slots and they keep waiting for their turn. Not to forget the initialization overhead for each task: JVM startup, etc.)
>> 
>> Overall, I think there cannot be any single optimum value for a cluster. It depends on the type of queries, the size of your inputs, and the size of map outputs in the jobs (intermediate outputs). So you can check various values and see which one is best. From my experience, setting "hive.exec.reducers.max" to the total number of reduce slots in your cluster gives decent performance, since all the reducers complete in a single wave. (This may or may not work for you, but it is worth a try.)
>> 
>> 
>> On Wed, Sep 26, 2012 at 5:58 PM, Abhishek <ab...@gmail.com> wrote:
>> 
>> Hi all,
>> 
>> I have a doubt regarding the properties below: is it a good practice to override them in Hive?
>> 
>> If yes, what are the optimal values for the following properties?
>> 
>>   set hive.exec.reducers.bytes.per.reducer=<number>
>> In order to limit the maximum number of reducers:
>>   set hive.exec.reducers.max=<number>
>> In order to set a constant number of reducers:
>>   set mapred.reduce.tasks=<number>
>> 
>> Regards
>> Abhi
>> 
>> Sent from my iPhone
>> 
>> 
>> 
>> -- 
>> Regards,
>> Bharath .V
>> w:http://researchweb.iiit.ac.in/~bharath.v
> 
> 

Re: Hive configuration property

Posted by Bejoy KS <be...@yahoo.com>.
Hi Abhishek

In my experience, you can always set the number of reduce tasks (mapred.reduce.tasks) based on the data volume your query handles; it can yield better performance numbers. 
 
Regards,
Bejoy KS
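One rough way to act on this advice is to derive an explicit mapred.reduce.tasks value from the data volume the query handles. The helper below is hypothetical (the 1 GB target and the 40-slot cluster size are assumptions for illustration, not numbers from this thread):

```python
import math

def manual_reduce_tasks(data_bytes, target_bytes_per_reducer=1_000_000_000,
                        cluster_reduce_slots=40):
    """Pick an explicit reducer count: roughly one reducer per target
    chunk of data, bounded by the slots actually available."""
    wanted = math.ceil(data_bytes / target_bytes_per_reducer)
    return max(1, min(wanted, cluster_reduce_slots))

# ~25 GB handled by the query -> 25 reducers on this assumed cluster
print(manual_reduce_tasks(25 * 1_000_000_000))
```

The result would then be applied before running the query, e.g. `set mapred.reduce.tasks=25;`.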


________________________________
 From: Abhishek <ab...@gmail.com>
To: "user@hive.apache.org" <us...@hive.apache.org> 
Cc: "user@hive.apache.org" <us...@hive.apache.org> 
Sent: Wednesday, September 26, 2012 7:04 PM
Subject: Re: Hive configuration property
 

Thanks Bharath, your points make sense. I'll try the "hive.exec.reducers.max" property.

Regards
Abhi



Sent from my iPhone

On Sep 26, 2012, at 9:23 AM, bharath vissapragada <bh...@gmail.com> wrote:



>
>I'm no expert in hive, but here are my 2 cents. 
>
>
>By default Hive schedules a reducer for every 1 GB of data (change that value by modifying hive.exec.reducers.bytes.per.reducer). If your input data is huge, there will be a large number of reducers, which might be unnecessary. (Sometimes a large number of reducers slows down the job because their number exceeds the total task slots and they keep waiting for their turn. Not to forget the initialization overhead for each task: JVM startup, etc.)
>
>
>Overall, I think there cannot be any single optimum value for a cluster. It depends on the type of queries, the size of your inputs, and the size of map outputs in the jobs (intermediate outputs). So you can check various values and see which one is best. From my experience, setting "hive.exec.reducers.max" to the total number of reduce slots in your cluster gives decent performance, since all the reducers complete in a single wave. (This may or may not work for you, but it is worth a try.)
>
>
>
>
>On Wed, Sep 26, 2012 at 5:58 PM, Abhishek <ab...@gmail.com> wrote:
>
>
>>
>>Hi all,
>>
>>
>>I have a doubt regarding the properties below: is it a good practice to override them in Hive?
>>
>>
>>If yes, what are the optimal values for the following properties?
>>
>>  set hive.exec.reducers.bytes.per.reducer=<number>
>>In order to limit the maximum number of reducers:
>>  set hive.exec.reducers.max=<number>
>>In order to set a constant number of reducers:
>>  set mapred.reduce.tasks=<number>
>>
>>
>>Regards
>>Abhi
>>
>>Sent from my iPhone
>
>
>
>-- 
>Regards,
>Bharath .V
>w:http://researchweb.iiit.ac.in/~bharath.v
>

Re: Hive configuration property

Posted by Abhishek <ab...@gmail.com>.
Thanks Bharath, your points make sense. I'll try the "hive.exec.reducers.max" property.

Regards
Abhi



Sent from my iPhone

On Sep 26, 2012, at 9:23 AM, bharath vissapragada <bh...@gmail.com> wrote:

> 
> I'm no expert in hive, but here are my 2 cents. 
> 
> By default Hive schedules a reducer for every 1 GB of data (change that value by modifying hive.exec.reducers.bytes.per.reducer). If your input data is huge, there will be a large number of reducers, which might be unnecessary. (Sometimes a large number of reducers slows down the job because their number exceeds the total task slots and they keep waiting for their turn. Not to forget the initialization overhead for each task: JVM startup, etc.)
> 
> Overall, I think there cannot be any single optimum value for a cluster. It depends on the type of queries, the size of your inputs, and the size of map outputs in the jobs (intermediate outputs). So you can check various values and see which one is best. From my experience, setting "hive.exec.reducers.max" to the total number of reduce slots in your cluster gives decent performance, since all the reducers complete in a single wave. (This may or may not work for you, but it is worth a try.)
> 
> 
> On Wed, Sep 26, 2012 at 5:58 PM, Abhishek <ab...@gmail.com> wrote:
>> 
>> Hi all,
>> 
>> I have a doubt regarding the properties below: is it a good practice to override them in Hive?
>> 
>> If yes, what are the optimal values for the following properties?
>> 
>>   set hive.exec.reducers.bytes.per.reducer=<number>
>> In order to limit the maximum number of reducers:
>>   set hive.exec.reducers.max=<number>
>> In order to set a constant number of reducers:
>>   set mapred.reduce.tasks=<number>
>> 
>> Regards
>> Abhi
>> 
>> Sent from my iPhone
> 
> 
> 
> -- 
> Regards,
> Bharath .V
> w:http://researchweb.iiit.ac.in/~bharath.v

Re: Hive configuration property

Posted by bharath vissapragada <bh...@gmail.com>.
I'm no expert in hive, but here are my 2 cents.

By default Hive schedules a reducer for every 1 GB of data (change that
value by modifying hive.exec.reducers.bytes.per.reducer). If your input
data is huge, there will be a large number of reducers, which might be
unnecessary. (Sometimes a large number of reducers slows down the job because
their number exceeds the total task slots and they keep waiting for their turn.
Not to forget the initialization overhead for each task: JVM startup, etc.)

Overall, I think there cannot be any single optimum value for a cluster. It
depends on the type of queries, the size of your inputs, and the size of map
outputs in the jobs (intermediate outputs). So you can check various values and
see which one is best. From my experience, setting
"hive.exec.reducers.max" to the total number of reduce slots in your cluster
gives decent performance, since all the reducers complete in a
single wave. (This may or may not work for you, but it is worth a try.)
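The "single wave" point can be made concrete with a quick calculation (the 40-slot cluster below is an assumed example, not a number from this thread):

```python
import math

reduce_slots = 40  # assumed cluster capacity for illustration

def reduce_waves(num_reducers, slots):
    """Reducers run at most `slots` at a time, so the job needs
    ceil(num_reducers / slots) sequential 'waves' of reduce tasks."""
    return math.ceil(num_reducers / slots)

# With the default hive.exec.reducers.max = 999 and a large input,
# Hive may schedule far more reducers than there are slots:
print(reduce_waves(999, reduce_slots))  # 25 waves, each paying task startup overhead

# Capping hive.exec.reducers.max at the slot count keeps it to one wave:
print(reduce_waves(40, reduce_slots))   # 1
```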


On Wed, Sep 26, 2012 at 5:58 PM, Abhishek <ab...@gmail.com> wrote:

> Hi all,
>
> I have a doubt regarding the properties below: is it a good practice to
> override them in Hive?
>
> If yes, what are the optimal values for the following properties?
>
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=<number>
>
> Regards
> Abhi
> Sent from my iPhone
>



-- 
Regards,
Bharath .V
w:http://researchweb.iiit.ac.in/~bharath.v