Posted to user@hbase.apache.org by Nichole Treadway <kn...@gmail.com> on 2011/03/10 18:56:36 UTC

Region server getting hit with all requests during bulk import

Hi all,

I have a small Hadoop and HBase cluster with 4 nodes, all acting as datanodes
and regionservers, with replication set to 3. I am bulk loading data into
HBase using the importtsv program, writing heavily to one table that
initially had no data in it and only 1 region. I'll call this tableA.

In HBase, I already had a table (tableB) with about 400 regions. These
regions were evenly distributed across the four nodes I have.

Here is the behavior I am observing with my bulk import of data: Initially,
one regionserver was assigned the regions for tableA, so it got all the initial
requests. When the number of regions became unbalanced across the four
nodes, regions of tableB (my old table) were reassigned to the other
regionservers, rather than any regions of my newer table (tableA). This
means that the one node continues to be hit with all requests, which is
slowing down my import.

How does HBase decide which regions to reassign when balancing, or is it
relatively arbitrary? Is there anything I can do at this point to force
regions of tableA to be assigned to other regionservers?

Re: Region server getting hit with all requests during bulk import

Posted by Ted Yu <yu...@gmail.com>.
On a related note:
I have tested my patch for HBASE-3373 in our dev cluster.
It works nicely.

On Thu, Mar 10, 2011 at 10:09 AM, Stack <st...@duboce.net> wrote:

> Nichole:
>
> There is a bug in our load balancer where what we thought random is
> not.  It'll be fixed in 0.90.2.  See
> https://issues.apache.org/jira/browse/HBASE-3586.
>
> Meantime, you'll need to do some manual region moving.  See 'move' in the
> shell.
>
> hbase> help 'move'
>
> Script it if you want to move a bunch of regions.
>
> Yours,
> St.Ack

Re: Region server getting hit with all requests during bulk import

Posted by Stack <st...@duboce.net>.
Nichole:

There is a bug in our load balancer where what we thought was random is
not.  It'll be fixed in 0.90.2.  See
https://issues.apache.org/jira/browse/HBASE-3586.

In the meantime, you'll need to do some manual region moving.  See 'move' in the shell.

hbase> help 'move'

Script it if you want to move a bunch of regions.

Yours,
St.Ack
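
[Editor's sketch, not part of Stack's message] One way to script the moves suggested above is to generate the shell commands from a list of regions. The snippet below is a minimal sketch, assuming the shell's two-argument form, move 'ENCODED_REGIONNAME', 'SERVER_NAME', as described by help 'move'. The function name build_move_commands and all region and server values are hypothetical placeholders; substitute the encoded region names of tableA and the server names (host,port,startcode) from your own cluster.

```python
from itertools import cycle


def build_move_commands(encoded_regions, servers):
    """Return one HBase shell 'move' command per region.

    Target servers are assigned round-robin so the regions end up
    spread as evenly as possible across the given servers.
    """
    if not servers:
        raise ValueError("need at least one target server")
    targets = cycle(servers)
    return ["move '%s', '%s'" % (region, next(targets))
            for region in encoded_regions]


if __name__ == "__main__":
    # Hypothetical placeholders: replace with real encoded region names
    # and server names taken from your own cluster.
    regions = ["ENCODED_REGION_1", "ENCODED_REGION_2", "ENCODED_REGION_3"]
    servers = ["host1.example.com,60020,1111111111111",
               "host2.example.com,60020,2222222222222"]
    for command in build_move_commands(regions, servers):
        print(command)
```

The printed lines can be pasted into the shell, or, assuming the shell reads commands from standard input in your version, piped into it. Check the first few moves by hand before running a long list.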

