Posted to user@hbase.apache.org by abhay ratnaparkhi <ab...@gmail.com> on 2011/08/08 13:14:00 UTC

BulkLoading MR tasks

Hello,

I am trying to load data into an HBase table using a MapReduce job.
The input comes from another HBase table, which stores some IDs (around
100,000).

My job reads that table and, for each ID, calls an API with the ID as
input to fetch some documents and metadata.
I am using the IdentityTableReducer class, and the number of reduce tasks
is set to 10.

When I run the job, I see only one map task running. Do I also need to
configure the number of map tasks when running the job?
My input table has only one region. Would having multiple regions in the
input table increase the number of map tasks?
I think the framework is intelligent enough to generate map tasks based on
data locality. Will a region split help in this case?

Thank You!
Abhay

Re: BulkLoading MR tasks

Posted by Stack <st...@duboce.net>.
On Mon, Aug 8, 2011 at 4:14 AM, abhay ratnaparkhi
<ab...@gmail.com> wrote:
> I am trying to load data in HBase table using Map Reduce task.
> I have input from one HBase table which has some id's stored (around
> 100000).
>

The default HBase splitter creates one split per region. If your source
table has only one region, there will be only one map task.
You need a custom splitter if you want more map tasks for the same
single-region input table.
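A minimal Python sketch of the arithmetic a custom splitter might do, under stated assumptions: it treats the row keys as the IDs written as fixed-width, zero-padded decimal strings, and assumes the single region covers IDs 0 to 99,999. `key`, `default_splits` and `subdivide` are hypothetical names for illustration, not HBase API. In an actual job, logic like this would likely go into an overridden `getSplits` of a `TableInputFormat` subclass, working on raw byte keys instead of strings.

```python
# Sketch of input-split arithmetic for a table-backed MapReduce job.
# Assumption (hypothetical): row keys are ids stored as zero-padded decimal
# strings of fixed width, e.g. "000042". HBase itself compares raw bytes.

WIDTH = 6  # assumed key width; enough for ids below 1,000,000


def key(n: int) -> str:
    """Format a numeric id as a fixed-width row key (assumed format)."""
    return str(n).zfill(WIDTH)


def default_splits(regions):
    """Model of the default behaviour: one split, hence one map task, per region.

    `regions` is a list of (start_key, end_key) pairs.
    """
    return list(regions)


def subdivide(start: int, end: int, pieces: int):
    """Cut the id range [start, end) into `pieces` contiguous sub-ranges.

    Each (start_key, stop_key) pair would back one map task in a custom
    splitter. Piece sizes differ by at most one id.
    """
    if pieces < 1 or end <= start:
        raise ValueError("need pieces >= 1 and end > start")
    pieces = min(pieces, end - start)  # avoid emitting empty ranges
    size, extra = divmod(end - start, pieces)
    splits, lo = [], start
    for i in range(pieces):
        hi = lo + size + (1 if i < extra else 0)  # first `extra` pieces get one more id
        splits.append((key(lo), key(hi)))
        lo = hi
    return splits


if __name__ == "__main__":
    # One region assumed to hold ids 0..99,999: the default gives one map task.
    one_region = [(key(0), key(100000))]
    print(len(default_splits(one_region)))  # 1
    # A custom splitter could cut that same region into 10 map tasks.
    for split in subdivide(0, 100000, 10):
        print(split)
```

Each pair is a start row (inclusive) and stop row (exclusive), so adjacent ranges share a boundary key without sharing any rows. Zero-padding keeps lexicographic key order equal to numeric order, which is what makes the string ranges line up with the ID ranges.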

> Will region split help in this case?
>

Or, yes, you could split the source table into more than one region.
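Splitting the table instead requires choosing split points. A minimal sketch, assuming the IDs are the row keys, stored as zero-padded decimal strings and spread roughly evenly over the range: `split_points` is a hypothetical helper that returns evenly spaced boundary keys, which could then be handed to whatever split or pre-split mechanism your HBase version provides. With the default one-split-per-region behaviour, N regions would give N map tasks.

```python
# Hypothetical helper for choosing split points, assuming row keys are ids
# stored as zero-padded decimal strings and spread roughly evenly.


def split_points(lo: int, hi: int, regions: int, width: int = 6):
    """Return regions - 1 boundary keys that divide ids [lo, hi) evenly.

    A table split at these keys has `regions` regions; with the default
    one-split-per-region input format that means `regions` map tasks.
    """
    if regions < 1 or hi <= lo:
        raise ValueError("need regions >= 1 and hi > lo")
    regions = min(regions, hi - lo)  # keep boundaries strictly increasing
    span = hi - lo
    return [str(lo + i * span // regions).zfill(width) for i in range(1, regions)]


if __name__ == "__main__":
    # Ten regions over the ~100,000 ids mentioned in the question.
    print(split_points(0, 100000, 10))
```

Integer arithmetic is used rather than floating point so that the boundaries are exact and never repeat. If the IDs are skewed, evenly spaced keys will give unevenly sized regions.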

St.Ack