Posted to common-user@hadoop.apache.org by rauljin <li...@sina.com> on 2013/04/17 06:53:37 UTC

How to balance reduce job

There are 8 datanodes in my Hadoop cluster, but when a reduce job runs, only 2 of the datanodes are running reduce tasks.

I want all 8 datanodes to run the reduce tasks so I can balance the I/O pressure.

Any ideas?

Thanks.




rauljin

Re: How to balance reduce job

Posted by shashwat shriparv <dw...@gmail.com>.
The number of reducers running depends on the data available.

Thanks & Regards

∞
Shashwat Shriparv





RE: How to balance reduce job

Posted by Tony Burton <TB...@SportingIndex.com>.
The typical Partitioner method for assigning reducer r from reducers R is

r = hash(key) % count(R)

However, if you find your partitioner is assigning your data to too few reducers (or just one), I have found that changing count(R) to the next odd number, or better still the next prime number, above count(R) is a good rule of thumb to follow.

Tony
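
For illustration, a minimal sketch of a custom Partitioner along the lines Tony describes, assuming the org.apache.hadoop.mapreduce API with Text keys and IntWritable values (the class name and key/value types here are illustrative, not from the thread). The odd or prime reducer count itself is chosen when the job is configured and simply arrives here as numReduceTasks:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Assigns reducer r = hash(key) % count(R); the sign bit is masked off so
// the result is never negative for keys whose hashCode is negative.
public class HashModPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}

It would be registered with job.setPartitionerClass(HashModPartitioner.class). This mirrors what Hadoop's default HashPartitioner already does, so a custom class is only worth writing if you need different key-to-reducer logic; otherwise Tony's advice amounts to picking a better value for job.setNumReduceTasks(...).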




From: bejoy.hadoop@gmail.com
Sent: 17 April 2013 07:19
To: user@hadoop.apache.org
Cc: Mohammad Tariq
Subject: Re: How to balance reduce job

Yes, that is a valid point.

The partitioner might do a non-uniform distribution and the reducers can be unevenly loaded.

But this doesn't change the number of reducers or their distribution across nodes. The bottom issue, as I understand it, is that his reduce tasks are scheduled on just a few nodes.
Regards
Bejoy KS

Sent from remote device, Please excuse typos
________________________________
From: Ajay Srivastava <Aj...@guavus.com>
Date: Wed, 17 Apr 2013 06:02:30 +0000
To: <us...@hadoop.apache.org>; <be...@gmail.com>
ReplyTo: user@hadoop.apache.org
Cc: Mohammad Tariq <do...@gmail.com>
Subject: Re: How to balance reduce job

Tariq probably meant the distribution of keys in the <key, value> pairs emitted by the mapper.
The Partitioner distributes these pairs to different reducers based on the key. If the data is such that the keys are skewed, then most of the records may go to the same reducer.



Regards,
Ajay Srivastava


On 17-Apr-2013, at 11:08 AM, <be...@gmail.com> wrote:



Uniform data distribution across HDFS is one of the factors that ensures map tasks are uniformly distributed across nodes. But reduce tasks don't depend on data distribution; they are scheduled purely based on slot availability.
Regards
Bejoy KS

Sent from remote device, Please excuse typos
________________________________
From: Mohammad Tariq <do...@gmail.com>
Date: Wed, 17 Apr 2013 10:46:27 +0530
To: user@hadoop.apache.org; Bejoy Ks <be...@gmail.com>
Subject: Re: How to balance reduce job

Just to add to Bejoy's comments, it also depends on the data distribution. Is your data properly distributed across the HDFS?

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
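
As an aside on Tariq's question, one quick way to see how the data is spread over the datanodes is to ask the NameNode for per-node usage. A minimal sketch, assuming an HDFS client classpath with the cluster configuration on it; the class name is made up for the example, and hadoop dfsadmin -report gives the same per-datanode view from the command line:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

// Prints how full each datanode is, which shows at a glance whether the
// HDFS data (and therefore the map input) is spread evenly across nodes.
public class DfsUsagePerNode {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        if (fs instanceof DistributedFileSystem) {
            for (DatanodeInfo node : ((DistributedFileSystem) fs).getDataNodeStats()) {
                System.out.printf("%s: %.1f%% of capacity used%n",
                        node.getHostName(),
                        100.0 * node.getDfsUsed() / node.getCapacity());
            }
        }
        fs.close();
    }
}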

On Wed, Apr 17, 2013 at 10:39 AM, <be...@gmail.com> wrote:
Hi Rauljin

Few things to check here.
What is the number of reduce slots in each TaskTracker? What is the number of reduce tasks for your job?
Based on the availability of slots, the reduce tasks are scheduled on TTs.

You can do the following:
Set the number of reduce tasks to 8 or more.
Play with the number of slots (not very advisable to tweak at the job level).

The reducers are scheduled purely based on slot availability, so it won't be that easy to ensure that all TTs are evenly loaded with the same number of reducers.
Regards
Bejoy KS

Sent from remote device, Please excuse typos
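
A minimal sketch of Bejoy's first suggestion above (setting the reduce task count at submission time), assuming the Hadoop 2.x org.apache.hadoop.mapreduce API; the driver class, job name and paths are hypothetical. The per-TaskTracker slot counts, by contrast, are cluster-side settings such as mapred.tasktracker.reduce.tasks.maximum in mapred-site.xml, which is why tweaking them per job is not advisable:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver: requests at least one reduce task per node so the
// scheduler has a chance to place reducers on all 8 TaskTrackers.
public class BalanceReducersDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "balance-reducers");
        job.setJarByClass(BalanceReducersDriver.class);
        job.setNumReduceTasks(8);  // or more; same effect as -D mapred.reduce.tasks=8
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}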
________________________________
From: rauljin <li...@sina.com>
Date: Wed, 17 Apr 2013 12:53:37 +0800
To: user@hadoop.apache.org
ReplyTo: user@hadoop.apache.org
Subject: How to balance reduce job

There are 8 datanodes in my Hadoop cluster, but when a reduce job runs, only 2 of the datanodes are running reduce tasks.

I want all 8 datanodes to run the reduce tasks so I can balance the I/O pressure.

Any ideas?

Thanks.

________________________________
rauljin



Re: How to balance reduce job

Posted by be...@gmail.com.
Yes, that is a valid point.

The partitioner might do a non-uniform distribution and the reducers can be unevenly loaded.

But this doesn't change the number of reducers or their distribution across nodes. The bottom issue, as I understand it, is that his reduce tasks are scheduled on just a few nodes.

Regards 
Bejoy KS

Sent from remote device, Please excuse typos

Re: How to balance reduce job

Posted by Ajay Srivastava <Aj...@guavus.com>.
Tariq probably meant the distribution of keys in the <key, value> pairs emitted by the mapper.
The Partitioner distributes these pairs to different reducers based on the key. If the data is such that the keys are skewed, then most of the records may go to the same reducer.



Regards,
Ajay Srivastava
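
A toy illustration of the skew Ajay describes, with made-up keys: under the usual hash(key) % count(R) assignment, a heavily repeated key drags most of the records onto a single reducer no matter how many reducers the job requests.

import java.util.HashMap;
import java.util.Map;

// Counts how many records each partition would receive for a skewed key
// sample, using the same modulo rule as Hadoop's default HashPartitioner.
public class SkewDemo {
    public static void main(String[] args) {
        String[] keys = {"us", "us", "us", "us", "us", "us", "us", "uk", "de", "fr"};
        int numReducers = 8;
        Map<Integer, Integer> recordsPerPartition = new HashMap<>();
        for (String key : keys) {
            int partition = (key.hashCode() & Integer.MAX_VALUE) % numReducers;
            recordsPerPartition.merge(partition, 1, Integer::sum);
        }
        // Everything keyed "us" lands on one partition, so that reducer
        // ends up with the bulk of the records.
        System.out.println(recordsPerPartition);
    }
}

The usual remedies for this kind of skew are a combiner, a composite key that splits the hot key, or a custom Partitioner that spreads it explicitly.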


Re: How to balance reduce job

Posted by be...@gmail.com.
Uniform data distribution across HDFS is one of the factors that ensures map tasks are spread evenly across the nodes. But reduce tasks don't depend on data distribution; their placement is based purely on slot availability.

Regards 
Bejoy KS

Sent from remote device, Please excuse typos

-----Original Message-----
From: Mohammad Tariq <do...@gmail.com>
Date: Wed, 17 Apr 2013 10:46:27 
To: user@hadoop.apache.org<us...@hadoop.apache.org>; Bejoy Ks<be...@gmail.com>
Subject: Re: How to balance reduce job

Just to add to Bejoy's comments, it also depends on the data distribution.
Is your data properly distributed across the HDFS?

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com


On Wed, Apr 17, 2013 at 10:39 AM, <be...@gmail.com> wrote:

> **
> Hi Rauljin
>
> Few things to check here.
> What is the number of reduce slots in each Task Tracker? What is the
> number of reduce tasks for your job?
> Based on the availability of slots the reduce tasks are scheduled on TTs.
>
> You can do the following
> Set the number of reduce tasks to 8 or more.
> Play with the number of slots (not very advisable for tweaking this on a
> job level )
>
> The reducers are scheduled purely based on the slot availability so it
> won't be that easy to ensure that all TT are evenly loaded with same number
> of reducers.
> Regards
> Bejoy KS
>
> Sent from remote device, Please excuse typos
> ------------------------------
> *From: * rauljin <li...@sina.com>
> *Date: *Wed, 17 Apr 2013 12:53:37 +0800
> *To: *user@hadoop.apache.org<us...@hadoop.apache.org>
> *ReplyTo: * user@hadoop.apache.org
> *Subject: *How to balance reduce job
>
> 8 datanode in my hadoop cluseter ,when running reduce job,there is only 2
> datanode running the job .
>
> I want to use the 8 datanode to run the reduce job,so I can balance the
> I/O press.
>
> Any ideas?
>
> Thanks.
>
> ------------------------------
> rauljin
>


Re: How to balance reduce job

Posted by Mohammad Tariq <do...@gmail.com>.
Just to add to Bejoy's comments, it also depends on the data distribution.
Is your data properly distributed across HDFS?
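
A couple of ways to check that with the stock command line of the time (the input path is a placeholder):

    # Per-datanode capacity and usage, to spot nodes holding a disproportionate share of blocks
    hadoop dfsadmin -report

    # Block locations for the job's input, to see which datanodes actually store it
    hadoop fsck /path/to/job/input -files -blocks -locations

    # Even out block placement if some datanodes are much fuller than others
    hadoop balancer -threshold 5

Keep in mind the balancer only evens out disk usage across datanodes; as Bejoy notes above, it does not change where reduce tasks are scheduled.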

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com


On Wed, Apr 17, 2013 at 10:39 AM, <be...@gmail.com> wrote:

> **
> Hi Rauljin
>
> Few things to check here.
> What is the number of reduce slots in each Task Tracker? What is the
> number of reduce tasks for your job?
> Based on the availability of slots the reduce tasks are scheduled on TTs.
>
> You can do the following
> Set the number of reduce tasks to 8 or more.
> Play with the number of slots (not very advisable for tweaking this on a
> job level )
>
> The reducers are scheduled purely based on the slot availability so it
> won't be that easy to ensure that all TT are evenly loaded with same number
> of reducers.
> Regards
> Bejoy KS
>
> Sent from remote device, Please excuse typos
> ------------------------------
> *From: * rauljin <li...@sina.com>
> *Date: *Wed, 17 Apr 2013 12:53:37 +0800
> *To: *user@hadoop.apache.org<us...@hadoop.apache.org>
> *ReplyTo: * user@hadoop.apache.org
> *Subject: *How to balance reduce job
>
> 8 datanode in my hadoop cluseter ,when running reduce job,there is only 2
> datanode running the job .
>
> I want to use the 8 datanode to run the reduce job,so I can balance the
> I/O press.
>
> Any ideas?
>
> Thanks.
>
> ------------------------------
> rauljin
>

Re: Re: How to balance reduce job

Posted by Rahul Bhattacharjee <ra...@gmail.com>.
You can use an input sampler, and you have to plug in a custom partitioner that
ensures all reducers get a near-equal number of pairs to process. The input
sampler goes over the sampled data before the execution of the job starts.
I also had some doubts about this, but got no response.
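
For what it's worth, a rough sketch of that approach using the stock InputSampler and TotalOrderPartitioner from the old mapred API; the KeyValueTextInputFormat, Text keys, sampling parameters and partition-file location are illustrative assumptions:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.KeyValueTextInputFormat;
    import org.apache.hadoop.mapred.lib.InputSampler;
    import org.apache.hadoop.mapred.lib.TotalOrderPartitioner;

    public class SampledPartitioningDriver {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(SampledPartitioningDriver.class);
        conf.setJobName("sampled-partitioning");
        // An input format whose keys are Text, so the sampled keys match the map output keys.
        conf.setInputFormat(KeyValueTextInputFormat.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        conf.setMapOutputKeyClass(Text.class);
        conf.setNumReduceTasks(8);
        conf.setPartitionerClass(TotalOrderPartitioner.class);

        // Sample roughly 10% of the records (at most 10000, from at most 10 splits) and
        // write cut points that divide the key range into 8 partitions of similar size.
        InputSampler.Sampler<Text, Text> sampler =
            new InputSampler.RandomSampler<Text, Text>(0.1, 10000, 10);
        Path partitionFile = new Path(FileInputFormat.getInputPaths(conf)[0], "_partitions");
        TotalOrderPartitioner.setPartitionFile(conf, partitionFile);
        InputSampler.writePartitionFile(conf, sampler);

        // ... set mapper, reducer and output path here, then submit with JobClient.runJob(conf).
      }
    }

TotalOrderPartitioner splits on key ranges, so it balances the reducers only as well as the sample reflects the real key distribution; a single heavily repeated key still goes to one reducer, which is where a hand-written partitioner comes in.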

Thanks,
Rahul


On Wed, Apr 17, 2013 at 12:17 PM, rauljin <li...@sina.com> wrote:

> **
>      <property>
>         <name>mapred.tasktracker.map.tasks.maximum</name>
>         <value>4</value>
>     </property>
>
>     <property>
>         <name>mapred.tasktracker.reduce.tasks.maximum</name>
>         <value>4</value>
>     </property>
>
>    I am not clear the number  of reuce slots in each Task tracker.Is it
> define in the configuration?
>
>
>
>
>
> ------------------------------
> rauljin
>
>  *From:* bejoy.hadoop <be...@gmail.com>
> *Date:* 2013-04-17 13:09
> *To:* user <us...@hadoop.apache.org>; liujin666jin <li...@sina.com>
> *Subject:* Re: How to balance reduce job
>  Hi Rauljin
>
> Few things to check here.
> What is the number of reduce slots in each Task Tracker? What is the
> number of reduce tasks for your job?
> Based on the availability of slots the reduce tasks are scheduled on TTs.
>
> You can do the following
> Set the number of reduce tasks to 8 or more.
> Play with the number of slots (not very advisable for tweaking this on a
> job level )
>
> The reducers are scheduled purely based on the slot availability so it
> won't be that easy to ensure that all TT are evenly loaded with same number
> of reducers.
> Regards
> Bejoy KS
>
> Sent from remote device, Please excuse typos
> ------------------------------
> *From: *rauljin <li...@sina.com>
> *Date: *Wed, 17 Apr 2013 12:53:37 +0800
> *To: *user@hadoop.apache.org<us...@hadoop.apache.org>
> *ReplyTo: *user@hadoop.apache.org
> *Subject: *How to balance reduce job
>
> 8 datanode in my hadoop cluseter ,when running reduce job,there is only 2
> datanode running the job .
>
> I want to use the 8 datanode to run the reduce job,so I can balance the
> I/O press.
>
> Any ideas?
>
> Thanks.
>
> ------------------------------
> rauljin
>


Re: Re: How to balance reduce job

Posted by rauljin <li...@sina.com>.
    <property>
        <name>mapred.tasktracker.map.tasks.maximum</name>
        <value>4</value>
    </property>

    <property>
        <name>mapred.tasktracker.reduce.tasks.maximum</name>
        <value>4</value>
    </property>

   I am not clear on the number of reduce slots in each TaskTracker. Is it defined in the configuration?
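
For what it's worth, mapred.tasktracker.reduce.tasks.maximum in each TaskTracker's mapred-site.xml is exactly the number of reduce slots that node offers (4 in the snippet above), just as mapred.tasktracker.map.tasks.maximum is the map slot count. A quick back-of-the-envelope for this cluster:

    reduce slots per TaskTracker : 4
    TaskTrackers                 : 8
    total reduce slots           : 4 * 8 = 32

    reduce tasks in the job      : 8 (say)
    TaskTrackers needed          : ceil(8 / 4) = 2

So the JobTracker can legally pack all of the job's reducers onto just 2 nodes, which matches the behaviour described at the top of the thread. Requesting more reduce tasks, or lowering the per-node reduce slot count, gives the scheduler more reason to spread the reduce-side I/O over additional machines.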

 






rauljin

From: bejoy.hadoop
Date: 2013-04-17 13:09
To: user; liujin666jin
Subject: Re: How to balance reduce job
Hi Rauljin

Few things to check here.
What is the number of reduce slots in each Task Tracker? What is the number of reduce tasks for your job?
Based on the availability of slots the reduce tasks are scheduled on TTs.

You can do the following
Set the number of reduce tasks to 8 or more. 
Play with the number of slots (not very advisable for tweaking this on a job level )

The reducers are scheduled purely based on the slot availability so it won't be that easy to ensure that all TT are evenly loaded with same number of reducers.

Regards 
Bejoy KS

Sent from remote device, Please excuse typos



From: rauljin <li...@sina.com> 
Date: Wed, 17 Apr 2013 12:53:37 +0800
To: user@hadoop.apache.org<us...@hadoop.apache.org>
ReplyTo: user@hadoop.apache.org 
Subject: How to balance reduce job


There are 8 datanodes in my Hadoop cluster, but when running a reduce job only 2 datanodes are running the reduce tasks.

I want all 8 datanodes to run the reduce tasks, so I can balance the I/O pressure.

Any ideas?

Thanks.




rauljin

Re: How to balance reduce job

Posted by be...@gmail.com.
Hi Rauljin

Few things to check here.
What is the number of reduce slots in each Task Tracker? What is the number of reduce tasks for your job?
Based on the availability of slots, the reduce tasks are scheduled on the TaskTrackers.

You can do the following:
Set the number of reduce tasks for the job to 8 or more.
Adjust the number of reduce slots per TaskTracker (not really advisable to tweak at a job level).

The reducers are scheduled purely on slot availability, so it is not easy to ensure that every TaskTracker is evenly loaded with the same number of reducers.

Regards 
Bejoy KS

Sent from remote device, Please excuse typos

-----Original Message-----
From: rauljin <li...@sina.com>
Date: Wed, 17 Apr 2013 12:53:37 
To: user@hadoop.apache.org<us...@hadoop.apache.org>
Reply-To: user@hadoop.apache.org
Subject: How to balance reduce job 

There are 8 datanodes in my Hadoop cluster, but when running a reduce job only 2 datanodes are running the reduce tasks.

I want all 8 datanodes to run the reduce tasks, so I can balance the I/O pressure.

Any ideas?

Thanks.




rauljin
