Posted to user@hbase.apache.org by sriram <rs...@gmail.com> on 2011/09/12 14:18:30 UTC

Reducer

Hi,
            I set the setNumReduceTasks value to 10 for the job in the code, but
my reduce phase always has only 1 task, and it takes a very long time to finish.
The reducer also gets stuck at 33% and only shows progress again, past 60%, after
a long time.

What is the problem? Am I missing something?


Re: Reducer

Posted by sriram <rs...@gmail.com>.
 
> - Could you describe your job in more detail? Does it use
> TableMapReduceUtil? What does it do, and which HBase API calls does it
> make after you set the number of reduce tasks in the job
> configuration? (Note: if this is a general MapReduce question, you
> should send it to mapreduce-user@... instead.)
 
Yes, I am using TableMapReduceUtil.

> - You, or your submission code may be calling
> TableMapReduceUtil.limitNumReduceTasks(…) or
> TableMapReduceUtil.setNumReduceTasks(…) both of which reset the no. of
> reducers based on the number of output table regions at max. In this
> case, its better to see if you have only a single large region in your
> table, and get to fixing/splitting that (as it wouldn't parallelize
> otherwise).

TableMapReduceUtil.initTableReducerJob("tablename", IntSumReducer.class, job);
TableMapReduceUtil.limitNumReduceTasks("tablename", job);

I am using only one table.
 
> - I believe a single reducer applied over a large data set would spend
> a lot of time in sorting, which is probably why you're noticing the
> delay between 33% and 66% progress.

Yes, I figured that out. Is there a way to increase the number of reducers?


Re: Reducer

Posted by Harsh J <ha...@cloudera.com>.
Hey Sriram,

On Mon, Sep 12, 2011 at 5:48 PM, sriram <rs...@gmail.com> wrote:
>
> Hi,
>            I set the setNumReduceTasks value to 10 for the job in the code, but
> my reduce phase always has only 1 task, and it takes a very long time to finish.
> The reducer also gets stuck at 33% and only shows progress again, past 60%, after
> a long time.
>
> What is the problem? Am I missing something?

- Could you describe your job in more detail? Does it use
TableMapReduceUtil? What does it do, and which HBase API calls does it
make after you set the number of reduce tasks in the job
configuration? (Note: if this is a general MapReduce question, you
should send it to mapreduce-user@hadoop.apache.org instead.)

- If you call setNumReduceTasks(…) and see no change, it probably means
some part of your code, or a library you use, is overriding it back to
1, either deliberately or due to a bug. It's hard to tell what resets
it without taking a look at the code.
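
For example, here is a minimal driver sketch (not your actual code; Driver
and MyMapper are placeholders, and IntSumReducer refers to your own reducer
class) where the explicit reducer count is set after all the
TableMapReduceUtil calls, so nothing later in the setup can silently
override it:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Job;

public class Driver {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "reducer-count-example");
    job.setJarByClass(Driver.class);
    // Map over the source table; MyMapper stands in for your mapper class.
    TableMapReduceUtil.initTableMapperJob("tablename", new Scan(),
        MyMapper.class, ImmutableBytesWritable.class, IntWritable.class, job);
    TableMapReduceUtil.initTableReducerJob("tablename", IntSumReducer.class, job);
    // Set the reducer count LAST, after every TableMapReduceUtil call, so a
    // later limitNumReduceTasks(...) / setNumReduceTasks(...) cannot reset it.
    job.setNumReduceTasks(10);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}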

- You, or your submission code, may be calling
TableMapReduceUtil.limitNumReduceTasks(…) or
TableMapReduceUtil.setNumReduceTasks(…), both of which cap the number
of reducers at the number of regions in the output table. In that
case, it's worth checking whether your table has only a single large
region, and fixing/splitting it (the job won't parallelize otherwise).
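
To check that quickly, a sketch along these lines (using the plain client
API; "tablename" as in your job) prints the region count and asks HBase to
split the table if it has just one region:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;

public class RegionCheck {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(HBaseConfiguration.create(), "tablename");
    // getStartEndKeys() returns one start/end key pair per region.
    int regions = table.getStartEndKeys().getFirst().length;
    System.out.println("regions: " + regions);
    if (regions == 1) {
      // A single region means limitNumReduceTasks(...) caps the job at one
      // reducer. Asking HBase to split (it picks the midpoint key) helps.
      new HBaseAdmin(table.getConfiguration()).split("tablename");
    }
  }
}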

- I believe a single reducer applied over a large data set would spend
a lot of time in sorting, which is probably why you're noticing the
delay between 33% and 66% progress. A reduce task reports progress in
thirds: copy is 0-33%, sort/merge is 33-66%, and your reduce function
only runs in the 66-100% stretch, so a task "stuck" at 33% is usually
deep in the sort/merge phase.
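
Not something you asked about, but if sorting is the bottleneck, a
combiner shrinks the map output before it is shuffled and sorted. A
sketch, assuming your mapper emits IntWritable counts that Hadoop's stock
IntSumReducer (the org.apache.hadoop.mapreduce.lib.reduce one, not your
TableReducer) can pre-sum per key:

// In the driver, after configuring the mapper: the combiner sums
// IntWritable values per key on the map side, so far less data reaches
// the single reducer's sort/merge phase.
job.setCombinerClass(org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer.class);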

-- 
Harsh J