Posted to user@hbase.apache.org by Weiwei Xiong <xi...@gmail.com> on 2011/03/14 18:50:08 UTC

Data is always written to one node

Hi,

I recently set up a 2-node Hadoop and HBase cluster and am trying to load
data into my HBase table using the HBase client.

The issue that bothers me is that the data is always written to one node of
the cluster, i.e., all the regions of the HBase table are on one node.

Is there any configuration I need to change to make the load balanced?

Thanks,
-- w

Re: Data is always written to one node

Posted by Weiwei Xiong <xi...@gmail.com>.
On Mon, Mar 14, 2011 at 8:50 PM, Stack <st...@duboce.net> wrote:

> Data balancing on hdfs is different to region balancing across your
> nodes.  Maybe there is a bug in our balancer if there are only two
> nodes involved?
>
> If there is nothing to balance, because it's already balanced, it'll
> output this:
>
> 2011-03-09 00:40:35,537 INFO
> org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing.
> servers=5 regions=1007 average=201.4 mostloaded=202 leastloaded=202
>
> ....else you will see:
>
>
> 2011-03-09 00:45:35,538 INFO
> org.apache.hadoop.hbase.master.LoadBalancer: Calculated a load balance
> in 1ms. Moving 1 regions off of 1 overloaded servers onto 0 less
> loaded servers
> 2011-03-09 00:45:35,538 INFO org.apache.hadoop.hbase.master.HMaster:
> balance
> hri=usertable,user362822713,1299624789204.1720a98e1a0709e9a401a8eb9d8436bc.,
> src=sv4borg230,61020,1299616745209,
> dest=sv4borg234,61020,1299616745224
> 2011-03-09 00:45:35,538 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: Starting
> unassignment of region
> usertable,user362822713,1299624789204.1720a98e1a0709e9a401a8eb9d8436bc.
> (offlining)
> ...
>
>
> or there will be a message that it is skipping balancing because there
> are regions in movement already.
>
> Do you see none of the above?
>
Yes, I did see the latter messages in the master log, so I guess the regions
are balanced across the cluster.
Actually, I was expecting the REAL region data to be balanced as well, so that
I could get better I/O balancing.
Now it seems to me that data rebalancing is done only during major compaction.
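Ryan's explanation in this thread (HDFS writes a file's first replica on the local datanode, then on remote nodes) is why a balancer move does not relocate any data. The toy model below is a hypothetical sketch, not HDFS's real placement code: it assumes every RegionServer runs next to a datanode, always puts the first replica on the writer's node, and fills the remaining replicas from the other nodes in list order (real HDFS placement is rack-aware and randomized). It shows that with r=1 a region's files stay on the old node after a move until a major compaction rewrites them from the new host, and that with r=2 on two nodes every write lands on both.

```python
from collections import defaultdict


def place_replicas(writer, nodes, replication):
    """Toy placement (hypothetical): first replica on the writer's node, then
    the other nodes in list order. Real HDFS is rack-aware and randomized."""
    others = [n for n in nodes if n != writer]
    return [writer] + others[: replication - 1]


class ToyCluster:
    """Tracks which datanodes hold a replica of each hypothetical file."""

    def __init__(self, nodes, replication):
        self.nodes = list(nodes)
        self.replication = replication
        self.replicas = {}  # file name -> nodes holding a replica

    def write_file(self, name, writer):
        self.replicas[name] = place_replicas(writer, self.nodes, self.replication)

    def delete_file(self, name):
        del self.replicas[name]

    def files_per_node(self):
        counts = defaultdict(int)
        for holders in self.replicas.values():
            for node in holders:
                counts[node] += 1
        return {n: counts[n] for n in self.nodes}


def major_compact(cluster, region_files, new_file, host):
    """Toy major compaction: the region's current host rewrites all of its
    files into one new file, then the old files are deleted."""
    cluster.write_file(new_file, writer=host)
    for name in region_files:
        cluster.delete_file(name)
    return [new_file]


# r=1: a region's files were flushed while it was hosted on node "a".
cluster = ToyCluster(["a", "b"], replication=1)
files = ["f1", "f2", "f3"]
for f in files:
    cluster.write_file(f, writer="a")
print(cluster.files_per_node())  # {'a': 3, 'b': 0}

# The balancer reassigns the region to "b"; nothing in cluster.replicas
# changes, so "b" serves the region while every file still lives on "a".
files = major_compact(cluster, files, "f_compacted", host="b")
print(cluster.files_per_node())  # {'a': 0, 'b': 1}

# r=2 on two nodes: every write lands on both nodes.
mirrored = ToyCluster(["a", "b"], replication=2)
mirrored.write_file("g1", writer="a")
print(mirrored.files_per_node())  # {'a': 1, 'b': 1}
```

In this model the move itself costs nothing, and locality only returns once the new host rewrites the files, which is consistent with what Weiwei observed.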


> In the shell you can run the balancer explicitly.
>
> hbase> balance
>
> Watch the master logs while this is happening.  What does it say?
>
Typing 'balance' gives me an "invalid command" error. I am using 0.90.1. Is
this available in a newer release?

>
> St.Ack
>

Re: Data is always written to one node

Posted by Stack <st...@duboce.net>.
Data balancing on hdfs is different to region balancing across your
nodes.  Maybe there is a bug in our balancer if there are only two
nodes involved?

If there is nothing to balance, because it's already balanced, it'll
output this:

2011-03-09 00:40:35,537 INFO
org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing.
servers=5 regions=1007 average=201.4 mostloaded=202 leastloaded=202

....else you will see:


2011-03-09 00:45:35,538 INFO
org.apache.hadoop.hbase.master.LoadBalancer: Calculated a load balance
in 1ms. Moving 1 regions off of 1 overloaded servers onto 0 less
loaded servers
2011-03-09 00:45:35,538 INFO org.apache.hadoop.hbase.master.HMaster:
balance hri=usertable,user362822713,1299624789204.1720a98e1a0709e9a401a8eb9d8436bc.,
src=sv4borg230,61020,1299616745209,
dest=sv4borg234,61020,1299616745224
2011-03-09 00:45:35,538 DEBUG
org.apache.hadoop.hbase.master.AssignmentManager: Starting
unassignment of region
usertable,user362822713,1299624789204.1720a98e1a0709e9a401a8eb9d8436bc.
(offlining)
...


or there will be a message that it is skipping balancing because there
are regions in movement already.

Do you see none of the above?
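The messages Stack lists can also be picked out of the master log mechanically. The sketch below is a hypothetical helper, not part of HBase: it assumes each message sits on a single line, as in the master log file (the email wraps them), and it only extracts the numbers and a move's source and destination. It does not reimplement the balancer's own decision rule.

```python
import re

# key=value pairs at the end of a LoadBalancer "Skipping load balancing." line.
SUMMARY_PAIR = re.compile(
    r"\b(servers|regions|average|mostloaded|leastloaded)=([0-9.]+)")
# HMaster move line: region names and server names both contain commas.
MOVE = re.compile(r"balance hri=(\S+?),\s+src=(\S+?),\s+dest=(\S+)")


def parse_summary(line):
    """Return the five numeric fields of a skip line, or None."""
    if "Skipping load balancing" not in line:
        return None
    fields = {key: float(value) for key, value in SUMMARY_PAIR.findall(line)}
    return fields if len(fields) == 5 else None


def describe(fields):
    servers, regions = int(fields["servers"]), int(fields["regions"])
    spread = fields["mostloaded"] - fields["leastloaded"]
    return (f"{regions} regions on {servers} servers, "
            f"{regions / servers:.1f} per server, spread {spread:g}")


def parse_move(line):
    """Return (region, source server, destination server), or None."""
    match = MOVE.search(line)
    return match.groups() if match else None


skip_line = ("2011-03-09 00:40:35,537 INFO org.apache.hadoop.hbase.master."
             "LoadBalancer: Skipping load balancing. servers=5 regions=1007 "
             "average=201.4 mostloaded=202 leastloaded=202")
move_line = ("2011-03-09 00:45:35,538 INFO org.apache.hadoop.hbase.master.HMaster: "
             "balance hri=usertable,user362822713,1299624789204."
             "1720a98e1a0709e9a401a8eb9d8436bc., "
             "src=sv4borg230,61020,1299616745209, "
             "dest=sv4borg234,61020,1299616745224")

print(describe(parse_summary(skip_line)))
# 1007 regions on 5 servers, 201.4 per server, spread 0
region, src, dest = parse_move(move_line)
print(src.split(",")[0], "->", dest.split(",")[0])  # sv4borg230 -> sv4borg234
```

The non-greedy groups in `MOVE` are needed because the region name and the server names themselves contain commas; only a comma followed by whitespace and the next field name ends a field.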

In the shell you can run the balancer explicitly.

hbase> balance

Watch the master logs while this is happening.  What does it say?


St.Ack

On Mon, Mar 14, 2011 at 6:27 PM, Weiwei Xiong <xi...@gmail.com> wrote:
> On Mon, Mar 14, 2011 at 4:09 PM, Bill Graham <bi...@gmail.com> wrote:
>
>> I hope I'm not hijacking the thread but I'm seeing what I think is a
>> similar issue. About a week ago I loaded a bunch of data into a newly
>> created table. It took about an hour and resulted in 12 regions being
>> created on a single node. (Afterwards I remembered a conversation with
>> JD where he described this behavior and how you could pre-create at
>> least N regions where N is your number of nodes to get better
>> distribution off the bat).
>>
> A follow-up question: do we have to pre-create N regions on different
> nodes to get better distribution? I ask because I also noticed that
> HBase prefers to always store new key-values on one node. I now know
> that we can run major compactions to rebalance the data, but it would
> be better if the data could be stored on less-loaded nodes at the time
> it is inserted. I guess this would make I/O more balanced.
>
>
>
>> Anyway, it's been about a week and all regions for the table are still
>> on 1 node. I see messages like this in the logs every 5 minutes:
>>
>> 2011-03-14 15:59:03,148 INFO
>> org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing.
>> servers=4 regions=62 average=15.5 mostloaded=16 leastloaded=16
>>
>> It seems the total regions are evenly balanced, but individual tables
>> are not. Where should I look to troubleshoot why this table's regions
>> (as well as others) aren't evenly distributed? I'd guess that I can
>> major compact all tables to fix it, but I'd like to figure out why it
>> hasn't happened automatically.
>>
>> HBase 0.90.0
>> CDH3b2
>>
>> thanks,
>> Bill
>>

Re: Data is always written to one node

Posted by Weiwei Xiong <xi...@gmail.com>.
On Mon, Mar 14, 2011 at 4:09 PM, Bill Graham <bi...@gmail.com> wrote:

> I hope I'm not hijacking the thread but I'm seeing what I think is a
> similar issue. About a week ago I loaded a bunch of data into a newly
> created table. It took about an hour and resulted in 12 regions being
> created on a single node. (Afterwards I remembered a conversation with
> JD where he described this behavior and how you could pre-create at
> least N regions where N is your number of nodes to get better
> distribution off the bat).
>
A follow-up question: do we have to pre-create N regions on different
nodes to get better distribution? I ask because I also noticed that
HBase prefers to always store new key-values on one node. I now know
that we can run major compactions to rebalance the data, but it would
be better if the data could be stored on less-loaded nodes at the time
it is inserted. I guess this would make I/O more balanced.



> Anyway, it's been about a week and all regions for the table are still
> on 1 node. I see messages like this in the logs every 5 minutes:
>
> 2011-03-14 15:59:03,148 INFO
> org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing.
> servers=4 regions=62 average=15.5 mostloaded=16 leastloaded=16
>
> It seems the total regions are evenly balanced, but individual tables
> are not. Where should I look to troubleshoot why this table's regions
> (as well as others) aren't evenly distributed? I'd guess that I can
> major compact all tables to fix it, but I'd like to figure out why it
> hasn't happened automatically.
>
> HBase 0.90.0
> CDH3b2
>
> thanks,
> Bill
>

Re: Data is always written to one node

Posted by Bill Graham <bi...@gmail.com>.
On Mon, Mar 14, 2011 at 8:54 PM, Stack <st...@duboce.net> wrote:
> On Mon, Mar 14, 2011 at 4:09 PM, Bill Graham <bi...@gmail.com> wrote:
>> Anyway, it's been about a week and all regions for the table are still
>> on 1 node. I see messages like this in the logs every 5 minutes:
>>
>> 2011-03-14 15:59:03,148 INFO
>> org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing.
>> servers=4 regions=62 average=15.5 mostloaded=16 leastloaded=16
>>
>
> That says that you have 4 servers and a total of 62 regions.  The
> regions are spread around the cluster at about 16 regions per server.
> Is this not the case?

Yes, that's the case. I was just including that to point out that the
cluster is balanced w.r.t. regions per RS.

>
>
>> It seems the total regions are evenly balanced, but individual tables
>> are not.
>
> We don't pay attention to which table a region comes from, just the
> regions themselves.
>
> HBASE-3586, which will make a showing in 0.90.2, should help.  It does
> a bit of footwork to make sure we're more random than we currently
> are.

Thanks, that's exactly the issue I'm seeing.

>
> St.Ack
>

Re: Data is always written to one node

Posted by Stack <st...@duboce.net>.
On Mon, Mar 14, 2011 at 4:09 PM, Bill Graham <bi...@gmail.com> wrote:
> Anyway, it's been about a week and all regions for the table are still
> on 1 node. I see messages like this in the logs every 5 minutes:
>
> 2011-03-14 15:59:03,148 INFO
> org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing.
> servers=4 regions=62 average=15.5 mostloaded=16 leastloaded=16
>

That says that you have 4 servers and a total of 62 regions.  The
regions are spread around the cluster at about 16 regions per server.
Is this not the case?


> It seems the total regions are evenly balanced, but individual tables
> are not.

We don't pay attention to which table a region comes from, just the
regions themselves.

HBASE-3586, which will make a showing in 0.90.2, should help.  It does
a bit of footwork to make sure we're more random than we currently
are.
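Stack's point is that the balancer counts regions per server without looking at which table each region belongs to, so a cluster can pass the check while one table sits entirely on one server, which matches Bill's log. The toy model below uses hypothetical server and table names and a made-up layout with the same totals as Bill's log (4 servers, 62 regions), plus one 12-region table on a single server. It only measures the two kinds of spread; it is not HBase's balancer and does not model HBASE-3586.

```python
from collections import Counter

# Hypothetical layout: server -> table name of each region it hosts.
assignment = {
    "rs1": ["new_table"] * 12 + ["other"] * 4,
    "rs2": ["other"] * 16,
    "rs3": ["other"] * 15,
    "rs4": ["other"] * 15,
}


def regions_per_server(layout):
    """What a table-agnostic balancer compares: total regions per server."""
    return {server: len(tables) for server, tables in layout.items()}


def table_regions_per_server(layout, table):
    """How the regions of one table are spread over the servers."""
    return {server: Counter(tables)[table] for server, tables in layout.items()}


def spread(counts):
    """Difference between the most- and least-loaded server."""
    return max(counts.values()) - min(counts.values())


totals = regions_per_server(assignment)
new_table = table_regions_per_server(assignment, "new_table")
print(sum(totals.values()), totals)
# 62 {'rs1': 16, 'rs2': 16, 'rs3': 15, 'rs4': 15}
print(new_table)
# {'rs1': 12, 'rs2': 0, 'rs3': 0, 'rs4': 0}
print("total spread:", spread(totals), "| new_table spread:", spread(new_table))
# total spread: 1 | new_table spread: 12
```

A table-aware check would compute `spread(table_regions_per_server(...))` for every table; the per-server totals alone cannot reveal the skew Bill ran into.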

St.Ack

Re: Data is always written to one node

Posted by Bill Graham <bi...@gmail.com>.
I hope I'm not hijacking the thread but I'm seeing what I think is a
similar issue. About a week ago I loaded a bunch of data into a newly
created table. It took about an hour and resulted in 12 regions being
created on a single node. (Afterwards I remembered a conversation with
JD where he described this behavior and how you could pre-create at
least N regions where N is your number of nodes to get better
distribution off the bat).
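Pre-creating regions means choosing split keys when the table is created. The helper below is a hypothetical sketch: it assumes row keys start with a fixed-width lowercase hex prefix (here, hypothetically, part of an MD5 digest of the natural key), so that evenly spaced hex boundaries give regions of roughly equal size. Keys such as `user362822713` would not spread this way on their own. In the Java client, byte-array split keys like these can be passed to an `HBaseAdmin.createTable` overload that takes `byte[][] splitKeys`; check that your HBase version has it.

```python
import bisect
import hashlib


def hex_split_keys(num_regions, width=8):
    """Return num_regions - 1 split keys dividing the space of width-digit
    lowercase hex prefixes into num_regions nearly equal ranges."""
    space = 16 ** width
    if not 1 <= num_regions <= space:
        raise ValueError("num_regions must be between 1 and 16**width")
    return [format(i * space // num_regions, f"0{width}x")
            for i in range(1, num_regions)]


def hashed_row_key(natural_key, width=8):
    """Hypothetical key design: prefix the natural key with part of its MD5
    hex digest so that new rows spread over all pre-split regions."""
    prefix = hashlib.md5(natural_key.encode("utf-8")).hexdigest()[:width]
    return f"{prefix}-{natural_key}"


def region_index(row_key, split_keys):
    """Region i holds keys in [split_keys[i-1], split_keys[i])."""
    return bisect.bisect_right(split_keys, row_key)


# One region per node on a 4-node cluster, with 2-digit prefixes for brevity.
splits = hex_split_keys(4, width=2)
print(splits)  # ['40', '80', 'c0']
row = hashed_row_key("user362822713", width=2)
print(row, "-> region", region_index(row, splits))
split_bytes = [k.encode("ascii") for k in splits]  # byte keys for a client API
```

Hashed prefixes spread writes across the pre-split regions, at the cost of losing range scans in the natural key order.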

Anyway, it's been about a week and all regions for the table are still
on 1 node. I see messages like this in the logs every 5 minutes:

2011-03-14 15:59:03,148 INFO
org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing.
servers=4 regions=62 average=15.5 mostloaded=16 leastloaded=16

It seems the total regions are evenly balanced, but individual tables
are not. Where should I look to troubleshoot why this table's regions
(as well as others) aren't evenly distributed? I'd guess that I can
major compact all tables to fix it, but I'd like to figure out why it
hasn't happened automatically.

HBase 0.90.0
CDH3b2

thanks,
Bill

On Mon, Mar 14, 2011 at 3:31 PM, Weiwei Xiong <xi...@gmail.com> wrote:
> I see.  Thanks Ryan.
>
> -- Weiwei
>
> On Mon, Mar 14, 2011 at 3:28 PM, Ryan Rawson <ry...@gmail.com> wrote:
>
>> by default runs 1x/day. you can do it manually in the hbase shell by
>> typing:
>>
>> hbase(main):001:0> major_compact "table_name"
>>
>> -ryan
>>
>>
>> On Mon, Mar 14, 2011 at 3:25 PM, Weiwei Xiong <xi...@gmail.com> wrote:
>> > Thanks for your info Ryan.
>> > Does HBase do major compaction regularly, or do I need to do this
>> > manually?
>> > If it's automatic, how frequently is it performed?
>> > I am running with a replication factor of 1.
>> > Thanks,
>> > -- Weiwei
>> >
>> > On Mon, Mar 14, 2011 at 3:18 PM, Ryan Rawson <ry...@gmail.com> wrote:
>> >>
>> >> HDFS does the data rebalancing: over time, as major compactions run and
>> >> new data comes in, files are written first to the local node and then
>> >> to remote nodes.
>> >>
>> >> What's the replication factor you are running?  HDFS on 2 nodes is
>> >> tricky, since you can either choose r=1 (no data protection) or r=2
>> >> (all writes go to both nodes).
>> >>
>> >> The sweet spot is above 6 nodes, alas.
>> >>
>> >> -ryan
>> >>
>> >> On Mon, Mar 14, 2011 at 3:12 PM, Weiwei Xiong <xi...@gmail.com> wrote:
>> >> > Sorry, I forgot to mention: I am using HBase 0.90.1 over HDFS
>> >> > 0.20.append.
>> >> > Thanks,
>> >> > -- Weiwei
>> >> >
>> >> > On Mon, Mar 14, 2011 at 3:10 PM, Weiwei Xiong <xi...@gmail.com> wrote:
>> >> >>
>> >> >> Thanks very much for your replies.
>> >> >> Something was unclear in my previous emails. I had one node started
>> >> >> first, and another was added later. There were already some regions
>> >> >> created on the first node. Then I started to import more data into
>> >> >> the same table and found that it's always the first node that keeps
>> >> >> serving the data writes.
>> >> >> Actually, I was expecting that the region data would be rebalanced to
>> >> >> the other data node. I did see in the master log that the HBase master
>> >> >> was trying to unassign some regions from the overloaded node and
>> >> >> reassign them to the less-loaded node, but the real data was never
>> >> >> migrated.
>> >> >> I think what I observed in the master log was the region index and
>> >> >> cache being rebalanced (correct me if I'm wrong). Does anyone know how
>> >> >> frequently this happens?
>> >> >> Another question: does HBase support data and I/O rebalancing, or
>> >> >> should I rely on HDFS to do data rebalancing? I guess HBase should
>> >> >> also support data rebalancing; otherwise, every time I restart HBase
>> >> >> the regions will have to be rebalanced again. Can someone tell me how
>> >> >> to configure or program HBase to do data rebalancing?
>> >> >> Thanks,
>> >> >> -- Weiwei
>> >> >> On Mon, Mar 14, 2011 at 2:43 PM, Ryan Rawson <ry...@gmail.com>
>> >> >> wrote:
>> >> >>>
>> >> >>> What version of HBase are you testing?
>> >> >>>
>> >> >>> Is it literally 0 vs N assignments?
>> >> >>>
>> >> >>> On Mon, Mar 14, 2011 at 1:18 PM, Weiwei Xiong <xi...@gmail.com>
>> >> >>> wrote:
>> >> >>> > Thanks!
>> >> >>> >
>> >> >>> > I checked the master log and found some info like this:
>> >> >>> > " timestamp ***, INFO org.apache.hadoop.hbase.master.HMaster:
>> >> >>> > balance
>> >> >>> > hri=***, src=***, dst=*** "
>> >> >>> >
>> >> >>> > So I assume the balancer is running. There's no failing info
>> there,
>> >> >>> > but
>> >> >>> > I
>> >> >>> > didn't see the regions were actually balanced as the log states.
>> >> >>> >
>> >> >>> > Is it possible that I have been keeping dumping data into the
>> table
>> >> >>> > thus the
>> >> >>> > balancing won't work?
>> >> >>> >
>> >> >>> > Thanks,
>> >> >>> > -- Weiwei
>> >> >>> >
>> >> >>> > On Mon, Mar 14, 2011 at 12:15 PM, Stack <st...@duboce.net> wrote:
>> >> >>> >
>> >> >>> >> Check the master log.  See if the load balancer is running or
>> not.
>> >> >>> >>  It
>> >> >>> >> usually runs every 5 minutes by default.  It may not run if
>> regions
>> >> >>> >> are transitioning.  It'll log regardless.
>> >> >>> >>
>> >> >>> >> St.Ack
>> >> >>> >>
>> >> >>> >> On Mon, Mar 14, 2011 at 10:50 AM, Weiwei Xiong <
>> xiongww@gmail.com>
>> >> >>> >> wrote:
>> >> >>> >> > Hi,
>> >> >>> >> >
>> >> >>> >> > I recently set up a 2-node Hadoop and HBase cluster and am
>> trying
>> >> >>> >> > to
>> >> >>> >> > load
>> >> >>> >> > data into my HBase table using HBase client.
>> >> >>> >> >
>> >> >>> >> > The issue bothers me is that the data are always written into
>> one
>> >> >>> >> > node of
>> >> >>> >> > the cluster, i.e., all the regions of the hbase table are on
>> one
>> >> >>> >> > node.
>> >> >>> >> >
>> >> >>> >> > Is there any configuration I need to change for make the load
>> >> >>> >> > balanced?
>> >> >>> >> >
>> >> >>> >> > Thanks,
>> >> >>> >> > -- w
>> >> >>> >> >
>> >> >>> >>
>> >> >>> >
>> >> >>
>> >> >
>> >> >
>> >
>> >
>>
>

Re: Data is always written to one node

Posted by Weiwei Xiong <xi...@gmail.com>.
I see.  Thanks Ryan.

-- Weiwei

On Mon, Mar 14, 2011 at 3:28 PM, Ryan Rawson <ry...@gmail.com> wrote:

> by default runs 1x/day. you can do it manually in the hbase shell by
> typing:
>
> hbase(main):001:0> major_compact "table_name"
>
> -ryan
>

Re: Data is always written to one node

Posted by Ryan Rawson <ry...@gmail.com>.
Major compaction runs once a day by default. You can also trigger it manually from the HBase shell:

hbase(main):001:0> major_compact "table_name"

-ryan


On Mon, Mar 14, 2011 at 3:25 PM, Weiwei Xiong <xi...@gmail.com> wrote:
> Thanks for your info Ryan.
> Does HBase do major compaction regularly or do I need to manually do this?
> If it's automatic, how frequently is it performed?
> I am running 1 replication.
> Thanks,
> -- Weiwei
>
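A rough way to see why a major compaction "moves" a region's data after the balancer has reassigned the region is the toy model below. It assumes a replication factor of 1, that each store file lives entirely on the DataNode that wrote it, and no deletes or multiple versions. `StoreFile`, `major_compact` and `locality` are hypothetical illustrations, not HBase APIs. In 0.90 the automatic interval is, to my knowledge, set by `hbase.hregion.majorcompaction` (86400000 ms, i.e. one day, by default).

```python
from dataclasses import dataclass, field


@dataclass
class StoreFile:
    """Hypothetical, simplified store file: row key -> value, plus the
    DataNode holding its single replica (replication factor 1 assumed)."""
    entries: dict = field(default_factory=dict)
    node: str = ""


def major_compact(files, serving_node):
    """Merge all of a region's store files into one new file.

    Files are assumed ordered oldest to newest, so newer values win on
    duplicate keys. The new file is written by the region server that now
    hosts the region, so its replica lands on that server's DataNode.
    """
    merged = {}
    for f in files:
        merged.update(f.entries)
    return [StoreFile(merged, serving_node)]


def locality(files, node):
    """Fraction of the region's entries stored on `node`."""
    total = sum(len(f.entries) for f in files)
    local = sum(len(f.entries) for f in files if f.node == node)
    return local / total if total else 1.0


# A region written while hosted on node1, then reassigned to node2.
files = [StoreFile({"a": 1, "b": 2}, "node1"), StoreFile({"b": 3, "c": 4}, "node1")]
print(locality(files, "node2"))      # 0.0: the region moved, its data did not
compacted = major_compact(files, "node2")
print(locality(compacted, "node2"))  # 1.0
print(compacted[0].entries)          # {'a': 1, 'b': 3, 'c': 4}
```

Until that compaction runs, node2 serves the region but reads its files remotely from node1, which matches what the original poster observed.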

Re: Data is always written to one node

Posted by Weiwei Xiong <xi...@gmail.com>.
Thanks for your info, Ryan.

Does HBase do major compactions regularly, or do I need to run them manually?
If it's automatic, how frequently is it performed?

I am running with a replication factor of 1.

Thanks,
-- Weiwei

On Mon, Mar 14, 2011 at 3:18 PM, Ryan Rawson <ry...@gmail.com> wrote:

> HDFS does the data rebalancing, over time as major compactions and new
> data comes in, files are written first to the local node then to
> remote nodes.
>
> Whats the replication factor you are running?  HDFS on 2 nodes is
> tricky, since you can either choose r=1 (no data protection) or r=2
> (all writes go to both nodes).
>
> The sweet spot is above 6 nodes alas.
>
> -ryan
>

Re: Data is always written to one node

Posted by Ryan Rawson <ry...@gmail.com>.
HDFS does the data rebalancing: over time, as major compactions run and new
data comes in, each new file's first replica is written to the local node
and any further replicas to remote nodes.

What's the replication factor you are running?  HDFS on 2 nodes is
tricky, since you can either choose r=1 (no data protection) or r=2
(all writes go to both nodes).

The sweet spot is above 6 nodes, alas.

-ryan

On Mon, Mar 14, 2011 at 3:12 PM, Weiwei Xiong <xi...@gmail.com> wrote:
> Sorry I forgot to mention. I am using HBase 0.90.1 over HDFS 0.20.append
> Thanks,
> -- Weiwei
>
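The placement behavior described in the reply above (first copy on the local node, further copies on remote nodes) and the r=1 versus r=2 trade-off on two nodes can be sketched as a toy model. This is a simplification, not HDFS's actual placement code: it ignores racks and free space, takes the remaining replicas in sorted order where real HDFS chooses them randomly, and assumes each region server host also runs a DataNode. `place_replicas` is a hypothetical helper.

```python
def place_replicas(writer, datanodes, replication):
    """Toy model of HDFS block placement (racks ignored).

    The first replica goes to the writer's own node if it runs a DataNode;
    the rest go to other, distinct nodes. Real HDFS picks those randomly;
    here they are taken in sorted order to keep the output deterministic.
    The result is capped at the number of DataNodes (HDFS would report
    the block as under-replicated instead).
    """
    targets = [writer] if writer in datanodes else []
    others = sorted(n for n in datanodes if n != writer)
    targets += others[: max(replication - len(targets), 0)]
    return targets


nodes = ["node1", "node2"]
print(place_replicas("node1", nodes, 1))  # ['node1']: no copy anywhere else
print(place_replicas("node1", nodes, 2))  # ['node1', 'node2']: every write hits both nodes
print(place_replicas("node2", nodes, 2))  # ['node2', 'node1']
```

With r=1, a region's data stays on whichever node wrote it until something rewrites it; with r=2 on two nodes, each node holds a full copy, so there is nothing left to rebalance, but every write costs both nodes.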

Re: Data is always written to one node

Posted by Weiwei Xiong <xi...@gmail.com>.
Sorry, I forgot to mention: I am using HBase 0.90.1 over HDFS 0.20-append.

Thanks,
-- Weiwei

On Mon, Mar 14, 2011 at 3:10 PM, Weiwei Xiong <xi...@gmail.com> wrote:

> Thanks very much for your replies.
>
> Something was unclear in my previous emails. I had one node started first
> and another was added in later. And there're already some regions created in
> the first started node. Then I started to import more data into the same
> table and found that it's always the first node that keeps serving the data
> writes.
>
> Actually I was expecting that the region data would be re-balanced to
> another data node. And I did see in the master log that HBase master is
> trying to unassigning some regions from the overloaded node and re-assign
> them to the less-loaded node. But the real data was never migrated.
>
> I think I observed the region index and cache rebalancing from the master
> log (correct me if I were wrong).  Does anyone know how frequently this
> happens?
>
> Another question is, does HBase support data and I/O rebalancing? Or I
> should rely on HDFS to do data rebalancing? I guess HBase should also
> support data rebalancing otherwise every time I restart HBase the regions
> will have to be rebalanced again. Will someone tell me how to configure or
> program HBase to do data rebalancing?
>
> Thanks,
> -- Weiwei
>

Re: Data is always written to one node

Posted by Weiwei Xiong <xi...@gmail.com>.
Thanks very much for your replies.

Something was unclear in my previous emails. I started one node first and
added the second one later, so some regions had already been created on the
first node. Then I started importing more data into the same table and found
that the first node always keeps serving the writes.

I was expecting the region data to be rebalanced onto the other data node.
I did see in the master log that the HBase master was unassigning some
regions from the overloaded node and reassigning them to the less-loaded
node, but the actual data was never migrated.

I think what I observed in the master log was the region index and cache
being rebalanced (correct me if I'm wrong). Does anyone know how frequently
this happens?

Another question: does HBase support data and I/O rebalancing, or should I
rely on HDFS to rebalance the data? I'd guess HBase should also support data
rebalancing, since otherwise the regions would have to be rebalanced again
every time I restart HBase. Could someone tell me how to configure or
program HBase to do data rebalancing?

Thanks,
-- Weiwei

On Mon, Mar 14, 2011 at 2:43 PM, Ryan Rawson <ry...@gmail.com> wrote:

> What version of HBase are you testing?
>
> Is it literally 0 vs N assignments?
>

Re: Data is always written to one node

Posted by Ryan Rawson <ry...@gmail.com>.
What version of HBase are you testing?

Is it literally 0 vs N assignments, i.e. no regions on one server and all of
them on the other?

On Mon, Mar 14, 2011 at 1:18 PM, Weiwei Xiong <xi...@gmail.com> wrote:
> Thanks!
>
> I checked the master log and found some info like this:
> " timestamp ***, INFO org.apache.hadoop.hbase.master.HMaster: balance
> hri=***, src=***, dst=*** "
>
> So I assume the balancer is running. There's no failing info there, but I
> didn't see the regions were actually balanced as the log states.
>
> Is it possible that I have been keeping dumping data into the table thus the
> balancing won't work?
>
> Thanks,
> -- Weiwei
>

Re: Data is always written to one node

Posted by Weiwei Xiong <xi...@gmail.com>.
Thanks!

I checked the master log and found some info like this:
" timestamp ***, INFO org.apache.hadoop.hbase.master.HMaster: balance
hri=***, src=***, dst=*** "

So I assume the balancer is running. There's no failure message there, but I
don't see the regions actually being balanced as the log says.

Is it possible that the balancing doesn't work because I have been
continuously dumping data into the table?

Thanks,
-- Weiwei

On Mon, Mar 14, 2011 at 12:15 PM, Stack <st...@duboce.net> wrote:

> Check the master log.  See if the load balancer is running or not.  It
> usually runs every 5 minutes by default.  It may not run if regions
> are transitioning.  It'll log regardless.
>
> St.Ack
>
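One way to check whether the `balance hri=..., src=..., dst=...` entries in the master log actually move regions between the two servers is to tally them by source and destination host. A minimal sketch, assuming the line format matches the master log excerpts quoted in this thread (which show `dest=`; both `dest=` and `dst=` are accepted here) and that server names have the `host,port,startcode` form seen there. `tally_moves` is a hypothetical helper, and the sample line is taken from the log excerpt quoted in this thread.

```python
import re
from collections import Counter

# Assumed format (from the log excerpts in this thread):
#   ... HMaster: balance hri=<region>, src=<host,port,startcode>, dest=<host,port,startcode>
BALANCE_RE = re.compile(
    r"balance hri=(?P<hri>\S+), src=(?P<src>\S+), (?:dest|dst)=(?P<dest>\S+)"
)


def tally_moves(lines):
    """Count balancer moves per (source host, destination host) pair."""
    moves = Counter()
    for line in lines:
        m = BALANCE_RE.search(line)
        if m:
            # Keep only the host part of "host,port,startcode".
            src_host = m.group("src").split(",")[0]
            dest_host = m.group("dest").split(",")[0]
            moves[(src_host, dest_host)] += 1
    return moves


sample = (
    "2011-03-09 00:45:35,538 INFO org.apache.hadoop.hbase.master.HMaster: balance "
    "hri=usertable,user362822713,1299624789204.1720a98e1a0709e9a401a8eb9d8436bc., "
    "src=sv4borg230,61020,1299616745209, dest=sv4borg234,61020,1299616745224"
)
print(tally_moves([sample]))
# Counter({('sv4borg230', 'sv4borg234'): 1})
```

With a real log, pass an open file object instead of the list; an empty `Counter` means no balance moves were logged in that file.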

Re: Data is always written to one node

Posted by Stack <st...@duboce.net>.
Check the master log.  See if the load balancer is running or not.  It
usually runs every 5 minutes by default.  It may not run if regions
are transitioning.  It'll log regardless.

St.Ack

On Mon, Mar 14, 2011 at 10:50 AM, Weiwei Xiong <xi...@gmail.com> wrote:
> Hi,
>
> I recently set up a 2-node Hadoop and HBase cluster and am trying to load
> data into my HBase table using HBase client.
>
> The issue bothers me is that the data are always written into one node of
> the cluster, i.e., all the regions of the hbase table are on one node.
>
> Is there any configuration I need to change for make the load balanced?
>
> Thanks,
> -- w
>
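The skip-or-move decision described in the reply above can be approximated as follows. This is a simplified sketch loosely modeled on the `Skipping load balancing ... average=... mostloaded=... leastloaded=...` log line, not the actual 0.90 `LoadBalancer` code. It assumes the balancer skips a run while regions are in transition, and also when every server already hosts between floor(average) and ceil(average) regions; otherwise it moves one region at a time from the most-loaded to the least-loaded server. `plan_balance` is hypothetical. The run interval itself is, as far as I know, `hbase.balancer.period` (300000 ms, i.e. 5 minutes) in 0.90.

```python
import math


def plan_balance(load, regions_in_transition=0):
    """Return a list of (src, dest) region moves, or None if the run is skipped.

    `load` maps server name -> number of regions it hosts. Simplified model:
    skip while regions are in transition, or when every server is already
    within [floor(avg), ceil(avg)] regions; otherwise move one region at a
    time from the most-loaded to the least-loaded server.
    """
    if regions_in_transition > 0 or not load:
        return None
    avg = sum(load.values()) / len(load)
    low, high = math.floor(avg), math.ceil(avg)
    if max(load.values()) <= high and min(load.values()) >= low:
        return None  # roughly the "Skipping load balancing" case
    counts = dict(load)
    moves = []
    while True:
        src = max(counts, key=counts.get)
        dest = min(counts, key=counts.get)
        if counts[src] <= high and counts[dest] >= low:
            break
        counts[src] -= 1
        counts[dest] += 1
        moves.append((src, dest))
    return moves


moves = plan_balance({"node1": 10, "node2": 0})  # the "0 vs N" case
print(len(moves), moves[0])  # 5 ('node1', 'node2')
print(plan_balance({"node1": 6, "node2": 5}))  # None: already balanced
print(plan_balance({"node1": 10, "node2": 0}, regions_in_transition=1))  # None
```

Such moves only change which server hosts a region; the region's files stay wherever HDFS placed them until a compaction rewrites them.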