Posted to common-user@hadoop.apache.org by Shahab Yunus <sh...@gmail.com> on 2014/08/14 14:23:16 UTC
Relationship between number of reducers and number of regions in the table
I couldn't decide whether this is an HBase question or a Hadoop/YARN one.
In the utility class for MR jobs integrated with HBase,
org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil,
there is the method:

    public static void initTableReducerJob(String table,
        Class<? extends TableReducer> reducer, Job job,
        Class partitioner, String quorumAddress, String serverClass,
        String serverImpl, boolean addDependencyJars) throws IOException;
In the above method, the following check is applied when setting the number
of reducers:

    ...
    int regions = outputTable.getRegionsInfo().size();
    ...
    if (job.getNumReduceTasks() > regions) {
        job.setNumReduceTasks(outputTable.getRegionsInfo().size());
    }
    ...
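In effect, the check just clamps the requested reducer count to the table's region count. A minimal standalone sketch of that behavior (the class and method names below are hypothetical, not part of the HBase API; only the comparison mirrors the snippet above):

```java
// Hypothetical sketch of the reducer-capping check in
// TableMapReduceUtil.initTableReducerJob: if the job asks for
// more reducers than the output table has regions, the reducer
// count is lowered to the region count; otherwise it is untouched.
public class ReducerCap {

    // Mirrors: if (job.getNumReduceTasks() > regions)
    //              job.setNumReduceTasks(regions);
    static int capReducers(int requestedReducers, int regions) {
        return (requestedReducers > regions) ? regions : requestedReducers;
    }

    public static void main(String[] args) {
        // 10 reducers requested, 4 regions -> capped to 4
        System.out.println(capReducers(10, 4));
        // 3 reducers requested, 4 regions -> left at 3
        System.out.println(capReducers(3, 4));
    }
}
```

So the check only ever reduces parallelism; it never raises the reducer count to match the region count.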
What is the reason for doing this? And what are the negative effects if we
don't follow this? I can think of one: more than one reducer writing to or
reading from the same region could cause hot-spotting and performance
issues. Are there any other reasons for this check?
Thanks a lot.
Regards,
Shahab