Posted to common-user@hadoop.apache.org by Shahab Yunus <sh...@gmail.com> on 2014/08/14 14:23:16 UTC
Relationship between number of reducers and number of regions in the table
I couldn't decide whether this is an HBase question or a Hadoop/YARN one.
In the utility class for MR jobs integrated with HBase,
org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil,
there is the method:

    public static void initTableReducerJob(String table,
        Class<? extends TableReducer> reducer, Job job,
        Class partitioner, String quorumAddress, String serverClass,
        String serverImpl, boolean addDependencyJars) throws IOException;
In the above method, the following check is applied when setting the number
of reducers:

    ...
    int regions = outputTable.getRegionsInfo().size();
    ...
    if (job.getNumReduceTasks() > regions) {
        job.setNumReduceTasks(outputTable.getRegionsInfo().size());
    }
    ...
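In effect, the check just clamps the requested reducer count to the table's region count. A minimal standalone sketch of that behavior (the class and method names below are hypothetical, not part of the HBase API; only the comparison mirrors the snippet above):

```java
// Hypothetical sketch of the reducer-capping check in
// TableMapReduceUtil.initTableReducerJob: if the job asks for
// more reducers than the output table has regions, the reducer
// count is lowered to the region count; otherwise it is untouched.
public class ReducerCap {

    // Mirrors: if (job.getNumReduceTasks() > regions)
    //              job.setNumReduceTasks(regions);
    static int capReducers(int requestedReducers, int regions) {
        return (requestedReducers > regions) ? regions : requestedReducers;
    }

    public static void main(String[] args) {
        // 10 reducers requested, 4 regions -> capped to 4
        System.out.println(capReducers(10, 4));
        // 3 reducers requested, 4 regions -> left at 3
        System.out.println(capReducers(3, 4));
    }
}
```

So the check only ever reduces parallelism; it never raises the reducer count to match the region count.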
What is the reason for doing this? And what are the negative effects if we
don't follow this? I can think of one: more than one reducer writing to or
reading from the same region could cause hot-spotting and performance
issues. Are there any other reasons for this check?
Thanks a lot.
Regards,
Shahab