Posted to user@hbase.apache.org by Dhaval Makawana <dh...@gmail.com> on 2011/08/28 11:05:56 UTC

Number of map jobs per region

Hi,

We have 31 regions for a table in our HBase system, so scanning the table
via TableMapper creates 31 map tasks. The following line from the
documentation explains why:

"Reading from HBase, the TableInputFormat asks HBase for the list of regions
and makes a map-per-region or mapred.map.tasks maps, whichever is smaller "
(
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html
)

Each region is almost 7 GB (LZO-compressed data), and the map tasks take a
very long time to process it. Is there any way to increase parallelism
(that is, allocate more than one map per region)?

Regards,
Dhaval

RE: Number of map jobs per region

Posted by "Ma, Ming" <mi...@ebay.com>.
Dhaval,

You might find https://issues.apache.org/jira/browse/HBASE-4063 useful when it is ready. Of course, you can always use your own customized version of TableInputFormat. https://issues.apache.org/jira/browse/HBASE-4039 allows you to provide your own TableInputFormat to TableMapReduceUtil.

Ming
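
To illustrate the "customized TableInputFormat" idea, here is a minimal sketch of the core step such a class would need: cutting one region's row-key range into several contiguous sub-ranges, so that each sub-range can back its own map task. It uses only the Java standard library and is not wired into HBase. The class name RegionRangeSplitter and the method splitBoundaries are hypothetical, and the sketch assumes both keys are non-empty and that row keys are roughly uniformly distributed as unsigned bytes. The first and last regions of a table have empty boundary keys and would need separate handling. In a real job, the boundaries would presumably be turned into one TableSplit per sub-range inside an overridden getSplits of a TableInputFormatBase subclass.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of HBase: computes sub-range boundaries
// for one region so that several map tasks can share that region.
class RegionRangeSplitter {

    // Returns boundary keys for [startKey, endKey) cut into at most `parts`
    // contiguous sub-ranges. Assumes both keys are non-empty and start < end.
    static List<byte[]> splitBoundaries(byte[] startKey, byte[] endKey, int parts) {
        if (parts < 1) {
            throw new IllegalArgumentException("parts must be >= 1");
        }
        // Right-pad the shorter key with zero bytes so both have equal width.
        int width = Math.max(startKey.length, endKey.length);
        BigInteger lo = new BigInteger(1, Arrays.copyOf(startKey, width));
        BigInteger hi = new BigInteger(1, Arrays.copyOf(endKey, width));
        if (lo.compareTo(hi) >= 0) {
            throw new IllegalArgumentException("startKey must sort before endKey");
        }
        BigInteger span = hi.subtract(lo);
        List<byte[]> boundaries = new ArrayList<>();
        boundaries.add(startKey);
        BigInteger previous = lo;
        for (int i = 1; i < parts; i++) {
            // Linear interpolation between the two keys, treated as unsigned integers.
            BigInteger mid = lo.add(
                span.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(parts)));
            // Skip duplicates, which occur when the range is narrower than `parts`.
            if (mid.compareTo(previous) > 0) {
                boundaries.add(toFixedWidth(mid, width));
                previous = mid;
            }
        }
        boundaries.add(endKey);
        return boundaries;
    }

    // Converts a non-negative value to exactly `width` bytes, big-endian.
    static byte[] toFixedWidth(BigInteger value, int width) {
        byte[] raw = value.toByteArray(); // may carry a leading sign byte or be shorter
        byte[] out = new byte[width];
        int copy = Math.min(raw.length, width);
        System.arraycopy(raw, raw.length - copy, out, width - copy, copy);
        return out;
    }

    public static void main(String[] args) {
        // Example: cut the one-byte range [0x00, 0x10) into 4 sub-ranges.
        for (byte[] key : splitBoundaries(new byte[] {0x00}, new byte[] {0x10}, 4)) {
            System.out.println(Arrays.toString(key));
        }
    }
}
```

Each adjacent pair of boundaries would become the start and stop row of one scan. If the row keys are skewed rather than uniform, evenly spaced boundaries will produce uneven map tasks, so sampled or application-specific split points may work better.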
