Posted to user@hbase.apache.org by Tao Xiao <xi...@gmail.com> on 2014/04/15 13:40:06 UTC

All regions stay on two nodes out of 18 nodes

I am using HDP 2.0.6 on a cluster of 18 nodes (region servers). One of my
HBase tables has 50 regions, but all 50 regions sit on just two nodes
instead of being spread evenly across the 18 nodes. I did not pre-create
splits, so this table gradually split into 50 regions on its own.

I'd like to know why all the regions stay on just two nodes rather than on
all 18 nodes of the cluster, and how to spread the regions evenly across all
the region servers. Thanks.

Re: All regions stay on two nodes out of 18 nodes

Posted by Ted Yu <yu...@gmail.com>.

The message cited is from
OpenRegionHandler#tryTransitionFromOpeningToFailedOpen()

'version 1' means the OpenRegionHandler instance was expecting version 1 in
the corresponding znode.
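
For readers unfamiliar with versioned znode updates, the check behind
"expecting version 1" can be pictured roughly as below. This is a toy
in-memory sketch, not HBase or ZooKeeper code: ToyZnode, BadVersion and
set_data are hypothetical names made up for illustration. The assumption is
that every write bumps the node's version, and a conditional write succeeds
only when the caller's expected version matches the current one.

```python
class BadVersion(Exception):
    """Raised when the caller's expected version is stale (hypothetical name)."""


class ToyZnode:
    """In-memory stand-in for a znode: data plus a version counter."""

    def __init__(self, data):
        self.data = data
        self.version = 0  # a freshly created node starts at version 0

    def set_data(self, data, expected_version):
        # Conditional write: refuse if the node changed since the caller read it.
        if expected_version != self.version:
            raise BadVersion(
                f"expected version {expected_version}, found {self.version}"
            )
        self.data = data
        self.version += 1
        return self.version


node = ToyZnode("OFFLINE")
v = node.set_data("OPENING", expected_version=0)      # version becomes 1
v = node.set_data("FAILED_OPEN", expected_version=v)  # expects 1, becomes 2
```

In this picture, a handler that moved the node to OPENING holds version 1,
so its next transition is attempted "expecting version 1"; if anything else
had touched the node in between, the write would be rejected.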

Cheers


On Wed, Apr 16, 2014 at 10:29 PM, Tao Xiao <xi...@gmail.com> wrote:

> BTW, the region server reported:
>
> 2014-04-16 11:30:31,890 INFO  [RS_OPEN_REGION-b05:60020-0]
> handler.OpenRegionHandler: Opening of region {ENCODED =>
> 6886ac98a71a47dc78a9e0ab5b3f07cd, NAME =>
> 'E_MP_DAY_READ_20140315,,1396363260513.6886ac98a71a47dc78a9e0ab5b3f07cd.',
> STARTKEY => '', ENDKEY => '170000346762_20140315'} failed, transitioning
> from OPENING to FAILED_OPEN in ZK, expecting version 1
>
> Here what does "expecting version 1" indicate?
>
>
> 2014-04-17 13:27 GMT+08:00 Tao Xiao <xi...@gmail.com>:
>
> > Take the region
> > E_MP_DAY_READ_20140315,,1396363260513.6886ac98a71a47dc78a9e0ab5b3f07cd for
> > example.
> >
> > I checked the master's log and the region server (*b05.jsepc.com
> > <http://b05.jsepc.com>*) log, and found that in the master log there are
> > just 4 logging lines about that region and the logging time was as early as
> > 2014-04-02.
> >
> > In the region server's log, there are more logging lines about that
> > region, but the logging time is quite recent, say 2014-04-16. It seems that
> > the master has lost control of that region for a long time, but the region
> > server is still managing that region although it cannot open it.
> >
> > The master log is here <http://pastebin.com/6J6v9tSg>, and the region
> > server log is here <http://pastebin.com/fbuu0RpC>.
> >
> >
> > 2014-04-17 9:34 GMT+08:00 Ted Yu <yu...@gmail.com>:
> >
> > > You can pick a region which is stuck in transition, find which region
> > > server is hosting it and search region server log on that server.
> > >
> > > By correlating events from master and region server logs, you should see
> > > what is happening.
> > >
> > >
> > > On Wed, Apr 16, 2014 at 6:24 PM, Tao Xiao <xi...@gmail.com> wrote:
> > >
> > > > Actually, open that link and then click on the picture, it will zoom in
> > > > and become quite clear.
> > > >
> > > > I checked the HMaster UI just now and I am sure that these regions are
> > > > always in transition,  I suppose there would be some exceptions happening.
> > > > How to prevent regions from being in transition for a long time ?
> > > >
> > > >
> > > > 2014-04-17 9:00 GMT+08:00 Ted Yu <yu...@gmail.com>:
> > > >
> > > > > The picture is not very clear.
> > > > > I don't see E_MP_DAY_READ having regions in transition.
> > > > >
> > > > > Anyway, as long as there is region in transition, balancer would not run.
> > > > >
> > > > > Cheers
> > > > >
> > > > >
> > > > > On Wed, Apr 16, 2014 at 5:52 PM, Tao Xiao <xiaotao.cs.nju@gmail.com> wrote:
> > > > >
> > > > > > Ted,
> > > > > >
> > > > > > I can see some regions of other tables in transition now , but I'm not
> > > > > > sure how long have them been in transition and I will check the HBase
> > > > > > master UI later. Here is the screenshot
> > > > > > <http://picpaste.com/Regions_in_Transition_-_2014-04-17_08-38-qyf5anz8.png>.
> > > > > > From the screenshot, there is a region with state of FAILED_OPEN, which
> > > > > > is in red, and there are 9 regions in transition for more than 60 seconds.
> > > > > >
> > > > > > Note that the table whose regions all stay in 2 nodes is E_MP_DAY_READ,
> > > > > > while the other tables shown in the screenshot are named as
> > > > > > E_MP_DAY_READ_20140315, E_MP_DAY_READ_20140322, E_MP_DAY_READ_20140324,
> > > > > > and so on.
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > >
> > > > > > 2014-04-16 23:10 GMT+08:00 Ted Yu <yu...@gmail.com>:
> > > > > >
> > > > > > > bq. found some regions of other tables in transition, not of this table.
> > > > > > >
> > > > > > > That can explain why "balancer" command returned false.
> > > > > > > Are those regions stuck in transition ?
> > > > > > >
> > > > > > > Cheers
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Apr 15, 2014 at 10:47 PM, Tao Xiao <xiaotao.cs.nju@gmail.com> wrote:
> > > > > > >
> > > > > > > > The command "balance_switch true" returns true, but the command
> > > > > > > > "balancer" returns false. I checked the HMaster UI and found some
> > > > > > > > regions of other tables in transition, not of this table.
> > > > > > > >
> > > > > > > > This table's name is E_MP_DAY_READ, I did grep it in the master log
> > > > > > > > and found only the following lines:
> > > > > > > >
> > > > > > > > 2014-04-15 15:50:59,925 INFO  [MASTER_SERVER_OPERATIONS-b03:60000-1]
> > > > > > > > handler.ServerShutdownHandler: Skip assigning region
> > > > > > > > E_MP_DAY_READ,160001123745_2014-01-25:00:00:00,1395753408476.ba5c8291f8dad37d5b9621b7334c17b0.
> > > > > > > > because it has been opened in a04.jsepc.com,60020,1397548219084
> > > > > > > > 2014-04-15 15:50:59,926 INFO  [MASTER_SERVER_OPERATIONS-b03:60000-1]
> > > > > > > > handler.ServerShutdownHandler: Skip assigning region
> > > > > > > > E_MP_DAY_READ,300007915618_2014-03-13:00:00:00,1395994146202.ec4e397baffd1cc40bdc18ce0ab2f28a.
> > > > > > > > because it has been opened in a04.jsepc.com,60020,1397548219084
> > > > > > > > 2014-04-15 15:50:59,926 INFO  [MASTER_SERVER_OPERATIONS-b03:60000-1]
> > > > > > > > handler.ServerShutdownHandler: Skip assigning region
> > > > > > > > E_MP_DAY_READ,300013608840_2014-02-21:00:00:00,1395749573711.744bab52befec279a7ee97497801e10f.
> > > > > > > > because it has been opened in a04.jsepc.com,60020,1397548219084
> > > > > > > > 2014-04-15 15:50:59,937 INFO  [MASTER_SERVER_OPERATIONS-b03:60000-2]
> > > > > > > > handler.ServerShutdownHandler: Skip assigning region
> > > > > > > > E_MP_DAY_READ,300000497780_2014-01-23:00:00:00,1395746363941.79b831e698053b1005f7a97c9f2a6ddc.
> > > > > > > > because it has been opened in a04.jsepc.com,60020,1397548219084
> > > > > > > > 2014-04-15 15:50:59,938 INFO  [MASTER_SERVER_OPERATIONS-b03:60000-2]
> > > > > > > > handler.ServerShutdownHandler: Skip assigning region
> > > > > > > > E_MP_DAY_READ,300008188567_2014-03-04:00:00:00,1395756104426.eb1806c2dc5833152b6b5e7b5e4a88b8.
> > > > > > > > because it has been opened in a04.jsepc.com,60020,1397548219084
> > > > > > > > 2014-04-15 15:50:59,940 INFO  [MASTER_SERVER_OPERATIONS-b03:60000-2]
> > > > > > > > handler.ServerShutdownHandler: Skip assigning region
> > > > > > > > E_MP_DAY_READ,300016987143_2014-01-21:00:00:00,1395986789897.e4d143865d354bdc2a427c1f00df6ad7.
> > > > > > > > because it has been opened in a04.jsepc.com,60020,1397548219084
> > > > > > > >
> > > > > > > > so few logging lines about it, looks strange ?
> > > > > > > >
> > > > > > > >
> > > > > > > > BTW, I can spread the regions of this table evenly across the whole
> > > > > > > > cluster after I shutdown the two region servers where the regions of
> > > > > > > > this table resided originally.
> > > > > > > >
> > > > > > > >
> > > > > > > > 2014-04-15 19:47 GMT+08:00 Ted Yu <yu...@gmail.com>:
> > > > > > > >
> > > > > > > > > Is load balancer enabled ?
> > > > > > > > >
> > > > > > > > > Can you grep this table in master log and pastebin what you found ?
> > > > > > > > >
> > > > > > > > > Cheers
> > > > > > > > >
> > > > > > > > > On Apr 15, 2014, at 4:40 AM, Tao Xiao <xiaotao.cs.nju@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > I am using HDP 2.0.6, which has 18 nodes(region servers). One of my
> > > > > > > > > > HBase tables has 50 regions but I found that the 50 regions all stay
> > > > > > > > > > in just two nodes, not spread evenly in the 18 nodes. I did not
> > > > > > > > > > pre-create splits so this table was gradually split into 50 regions
> > > > > > > > > > itself.
> > > > > > > > > >
> > > > > > > > > > I'd like to know why all the regions stay in just two nodes, not the
> > > > > > > > > > 18 nodes of the cluster, and how to spread the regions evenly across
> > > > > > > > > > all the region servers. Thanks.

Re: All regions stay on two nodes out of 18 nodes

Posted by Tao Xiao <xi...@gmail.com>.

BTW, the region server reported:

2014-04-16 11:30:31,890 INFO  [RS_OPEN_REGION-b05:60020-0]
handler.OpenRegionHandler: Opening of region {ENCODED =>
6886ac98a71a47dc78a9e0ab5b3f07cd, NAME =>
'E_MP_DAY_READ_20140315,,1396363260513.6886ac98a71a47dc78a9e0ab5b3f07cd.',
STARTKEY => '', ENDKEY => '170000346762_20140315'} failed, transitioning
from OPENING to FAILED_OPEN in ZK, expecting version 1

What does "expecting version 1" indicate here?



Re: All regions stay on two nodes out of 18 nodes

Posted by Ted Yu <yu...@gmail.com>.

Looking at this exception, can you check the namenode log to see when the
file was removed, and by whom?


   1. Caused by: java.io.IOException: java.io.FileNotFoundException: File
   does not exist:
   /apps/hbase/data/data/default/E_MP_DAY_READ_20140315/603cefeda4cfc679c41d1896ceb30518/info/028214f3248f4989b731e268102ff72e
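
One way to run that check is to scan the HDFS audit log for operations on
the missing file. The sketch below is only an illustration under
assumptions: it assumes audit logging is enabled and that entries carry
tab-separated ugi=, cmd= and src= fields, so verify the layout against your
own log first. find_removals, audit_line and the sample entries are made up
for this example. It looks for rename as well as delete, since a store file
may have been moved away rather than deleted outright.

```python
import re

# Assumed shape of an HDFS audit log entry; check it against your own logs.
AUDIT_RE = re.compile(r"ugi=(?P<ugi>\S+).*?cmd=(?P<cmd>\S+)\s+src=(?P<src>\S+)")


def find_removals(lines, path_fragment):
    """Return (timestamp, user, cmd, src) for delete/rename entries on the path."""
    hits = []
    for line in lines:
        m = AUDIT_RE.search(line)
        if not m or path_fragment not in m.group("src"):
            continue
        if m.group("cmd") in ("delete", "rename"):
            timestamp = line[:23]  # leading "YYYY-MM-DD HH:MM:SS,mmm"
            hits.append((timestamp, m.group("ugi"), m.group("cmd"), m.group("src")))
    return hits


def audit_line(ts, cmd, src):
    """Build a made-up audit entry for the demo below (not real log data)."""
    return (f"{ts} INFO FSNamesystem.audit: allowed=true\tugi=hbase (auth:SIMPLE)"
            f"\tip=/10.0.0.5\tcmd={cmd}\tsrc={src}\tdst=null\tperm=null")


sample = [
    audit_line("2014-04-10 02:15:07,123", "open", "/apps/hbase/data/x/info/aaa"),
    audit_line("2014-04-10 02:16:40,456", "delete", "/apps/hbase/data/x/info/aaa"),
]

for hit in find_removals(sample, "/info/aaa"):
    print(hit)
```

On a real cluster, pass the lines of the audit log and the store file name
from the exception as path_fragment.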




Re: All regions stay on two nodes out of 18 nodes

Posted by Tao Xiao <xi...@gmail.com>.

Take the region
E_MP_DAY_READ_20140315,,1396363260513.6886ac98a71a47dc78a9e0ab5b3f07cd for
example.

I checked the master's log and the region server (b05.jsepc.com) log. The
master log has just 4 lines about that region, and they date back to as
early as 2014-04-02.

The region server's log has more lines about that region, but they are quite
recent, around 2014-04-16. It seems that the master lost control of that
region a long time ago, while the region server is still managing it
although it cannot open it.

The master log is here <http://pastebin.com/6J6v9tSg>, and the region
server log is here <http://pastebin.com/fbuu0RpC>.



Re: All regions stay on two nodes out of 18 nodes

Posted by Ted Yu <yu...@gmail.com>.

You can pick a region that is stuck in transition, find which region
server is hosting it, and search the region server log on that server.

By correlating events from the master and region server logs, you should
see what is happening.
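
As a rough illustration of that correlation step, here is a minimal Python
sketch. It assumes each log entry is a single line that starts with a
timestamp of the form "2014-04-16 11:30:31,890" (as in the excerpts quoted
in this thread) and that the entry contains the region's encoded name. The
function names and the "master"/"rs" labels are made up for this example;
this is not an HBase tool.

```python
from datetime import datetime

# Timestamp layout assumed from the log excerpts quoted in this thread.
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
TS_LENGTH = len("2014-04-16 11:30:31,890")


def parse_timestamp(line):
    """Return the leading timestamp of a log line, or None if it has none."""
    try:
        return datetime.strptime(line[:TS_LENGTH], TS_FORMAT)
    except ValueError:
        return None


def correlate(master_lines, rs_lines, encoded_name):
    """Merge entries that mention encoded_name from both logs, oldest first."""
    events = []
    for source, lines in (("master", master_lines), ("rs", rs_lines)):
        for line in lines:
            if encoded_name not in line:
                continue
            ts = parse_timestamp(line)
            if ts is not None:  # skip continuation lines without a timestamp
                events.append((ts, source, line.rstrip("\n")))
    events.sort(key=lambda event: event[0])
    return events
```

To use it, read the lines of the master log and of the log from the region
server that hosts the region, then pass them in together with the encoded
name (the hex string between the last two dots of the region name).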


On Wed, Apr 16, 2014 at 6:24 PM, Tao Xiao <xi...@gmail.com> wrote:

> Actually, open that link and then click on the picture, it will zoom in and
> become quite clear.
>
> I checked the HMaster UI just now and I am sure that these regions are
> always in transition,  I suppose there would be some exceptions happening.
> How to prevent regions from being in transition for a long time ?
>
>
> 2014-04-17 9:00 GMT+08:00 Ted Yu <yu...@gmail.com>:
>
> > The picture is not very clear.
> > I don't see E_MP_DAY_READ having regions in transition.
> >
> > Anyway, as long as there is region in transition, balancer would not run.
> >
> > Cheers
> >
> >
> > On Wed, Apr 16, 2014 at 5:52 PM, Tao Xiao <xi...@gmail.com>
> > wrote:
> >
> > > Ted,
> > >
> > > I can see some regions of other tables in transition now , but I'm not
> > sure
> > > how long have them been in transition and I will check the HBase master
> > UI
> > > later. Here is the
> > > screenshot<
> > >
> >
> http://picpaste.com/Regions_in_Transition_-_2014-04-17_08-38-qyf5anz8.png
> > > >.
> > > From the screenshot, there is a region with state of FAILED_OPEN, which
> > is
> > > in red, and there are 9 regions in transition for more than 60 seconds.
> > >
> > > Note that the table whose regions all stay in 2 nodes is E_MP_DAY_READ,
> > > while the other tables shown in the screenshot are named as
> > > E_MP_DAY_READ_20140315, E_MP_DAY_READ_20140322, E_MP_DAY_READ_20140324,
> > and
> > > so on.
> > >
> > > Thanks.
> > >
> > >
> > > 2014-04-16 23:10 GMT+08:00 Ted Yu <yu...@gmail.com>:
> > >
> > > > bq. found some regions of other tables in transition, not of this
> > table.
> > > >
> > > > That can explain why "balancer" command returned false.
> > > > Are those regions stuck in transition ?
> > > >
> > > > Cheers
> > > >
> > > >
> > > > On Tue, Apr 15, 2014 at 10:47 PM, Tao Xiao <xiaotao.cs.nju@gmail.com
> >
> > > > wrote:
> > > >
> > > > > The command "balance_switch true" returns true, but the command
> > > > "balancer"
> > > > > returns false. I checked the HMaster UI and found some regions of
> > other
> > > > > tables in transition, not of this table.
> > > > >
> > > > > This table's name is E_MP_DAY_READ, I did grep it in the master log
> > and
> > > > > found only the following lines:
> > > > >
> > > > > 2014-04-15 15:50:59,925 INFO
>  [MASTER_SERVER_OPERATIONS-b03:60000-1]
> > > > > handler.ServerShutdownHandler: Skip assigning region
> > > > >
> > > > >
> > > >
> > >
> >
> E_MP_DAY_READ,160001123745_2014-01-25:00:00:00,1395753408476.ba5c8291f8dad37d5b9621b7334c17b0.
> > > > > because it has been opened in a04.jsepc.com,60020,1397548219084
> > > > > 2014-04-15 15:50:59,926 INFO
>  [MASTER_SERVER_OPERATIONS-b03:60000-1]
> > > > > handler.ServerShutdownHandler: Skip assigning region
> > > > >
> > > > >
> > > >
> > >
> >
> E_MP_DAY_READ,300007915618_2014-03-13:00:00:00,1395994146202.ec4e397baffd1cc40bdc18ce0ab2f28a.
> > > > > because it has been opened in a04.jsepc.com,60020,1397548219084
> > > > > 2014-04-15 15:50:59,926 INFO
>  [MASTER_SERVER_OPERATIONS-b03:60000-1]
> > > > > handler.ServerShutdownHandler: Skip assigning region
> > > > >
> > > > >
> > > >
> > >
> >
> E_MP_DAY_READ,300013608840_2014-02-21:00:00:00,1395749573711.744bab52befec279a7ee97497801e10f.
> > > > > because it has been opened in a04.jsepc.com,60020,1397548219084
> > > > > 2014-04-15 15:50:59,937 INFO
>  [MASTER_SERVER_OPERATIONS-b03:60000-2]
> > > > > handler.ServerShutdownHandler: Skip assigning region
> > > > >
> > > > >
> > > >
> > >
> >
> E_MP_DAY_READ,300000497780_2014-01-23:00:00:00,1395746363941.79b831e698053b1005f7a97c9f2a6ddc.
> > > > > because it has been opened in a04.jsepc.com,60020,1397548219084
> > > > > 2014-04-15 15:50:59,938 INFO
>  [MASTER_SERVER_OPERATIONS-b03:60000-2]
> > > > > handler.ServerShutdownHandler: Skip assigning region
> > > > >
> > > > >
> > > >
> > >
> >
> E_MP_DAY_READ,300008188567_2014-03-04:00:00:00,1395756104426.eb1806c2dc5833152b6b5e7b5e4a88b8.
> > > > > because it has been opened in a04.jsepc.com,60020,1397548219084
> > > > > 2014-04-15 15:50:59,940 INFO
>  [MASTER_SERVER_OPERATIONS-b03:60000-2]
> > > > > handler.ServerShutdownHandler: Skip assigning region
> > > > >
> > > > >
> > > >
> > >
> >
> E_MP_DAY_READ,300016987143_2014-01-21:00:00:00,1395986789897.e4d143865d354bdc2a427c1f00df6ad7.
> > > > > because it has been opened in a04.jsepc.com,60020,1397548219084
> > > > >
> > > > > so few logging lines about it, looks strange ?
> > > > >
> > > > >
> > > > > BTW, I can spread the regions of this table evenly across the whole
> > > > cluster
> > > > > after I shutdown the two region servers where the regions of this
> > table
> > > > > resided originally.
> > > > >
> > > > >
> > > > > 2014-04-15 19:47 GMT+08:00 Ted Yu <yu...@gmail.com>:
> > > > >
> > > > > > Is load balancer enabled ?
> > > > > >
> > > > > > Can you grep this table in master log and pastebin what you
> found ?
> > > > > >
> > > > > > Cheers
> > > > > >
> > > > > > On Apr 15, 2014, at 4:40 AM, Tao Xiao <xi...@gmail.com>
> > > > wrote:
> > > > > >
> > > > > > > I am using HDP 2.0.6, which has 18 nodes(region servers). One
> of
> > my
> > > > > HBase
> > > > > > > tables has 50 regions but I found that the 50 regions all stay
> in
> > > > just
> > > > > > two
> > > > > > > nodes, not spread evenly in the 18 nodes. I did not pre-create
> > > splits
> > > > > so
> > > > > > > this table was gradually split into 50 regions itself.
> > > > > > >
> > > > > > > I'd like to know why all the regions stay in just two nodes,
> not
> > > the
> > > > 18
> > > > > > > nodes of the cluster, and how to spread the regions evenly
> across
> > > all
> > > > > the
> > > > > > > region servers. Thanks.
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: All regions stay on two nodes out of 18 nodes

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.

Oh, I see your point, Ted. You're right. There are regions of the
E_MP_DAY_READ_XXX tables in transition, but none of E_MP_DAY_READ in that
screenshot...


2014-04-16 21:24 GMT-04:00 Tao Xiao <xi...@gmail.com>:

> Actually, open that link and then click on the picture, it will zoom in and
> become quite clear.
>
> I checked the HMaster UI just now and I am sure that these regions are
> always in transition,  I suppose there would be some exceptions happening.
> How to prevent regions from being in transition for a long time ?
>
>
> 2014-04-17 9:00 GMT+08:00 Ted Yu <yu...@gmail.com>:
>
> > The picture is not very clear.
> > I don't see E_MP_DAY_READ having regions in transition.
> >
> > Anyway, as long as there is region in transition, balancer would not run.
> >
> > Cheers
> >
> >
> > On Wed, Apr 16, 2014 at 5:52 PM, Tao Xiao <xi...@gmail.com>
> > wrote:
> >
> > > Ted,
> > >
> > > I can see some regions of other tables in transition now , but I'm not
> > sure
> > > how long have them been in transition and I will check the HBase master
> > UI
> > > later. Here is the
> > > screenshot<
> > >
> >
> http://picpaste.com/Regions_in_Transition_-_2014-04-17_08-38-qyf5anz8.png
> > > >.
> > > From the screenshot, there is a region with state of FAILED_OPEN, which
> > is
> > > in red, and there are 9 regions in transition for more than 60 seconds.
> > >
> > > Note that the table whose regions all stay in 2 nodes is E_MP_DAY_READ,
> > > while the other tables shown in the screenshot are named as
> > > E_MP_DAY_READ_20140315, E_MP_DAY_READ_20140322, E_MP_DAY_READ_20140324,
> > and
> > > so on.
> > >
> > > Thanks.
> > >
> > >
> > > 2014-04-16 23:10 GMT+08:00 Ted Yu <yu...@gmail.com>:
> > >
> > > > bq. found some regions of other tables in transition, not of this
> > table.
> > > >
> > > > That can explain why "balancer" command returned false.
> > > > Are those regions stuck in transition ?
> > > >
> > > > Cheers
> > > >
> > > >
> > > > On Tue, Apr 15, 2014 at 10:47 PM, Tao Xiao <xiaotao.cs.nju@gmail.com
> >
> > > > wrote:
> > > >
> > > > > The command "balance_switch true" returns true, but the command
> > > > "balancer"
> > > > > returns false. I checked the HMaster UI and found some regions of
> > other
> > > > > tables in transition, not of this table.
> > > > >
> > > > > This table's name is E_MP_DAY_READ, I did grep it in the master log
> > and
> > > > > found only the following lines:
> > > > >
> > > > > 2014-04-15 15:50:59,925 INFO
>  [MASTER_SERVER_OPERATIONS-b03:60000-1]
> > > > > handler.ServerShutdownHandler: Skip assigning region
> > > > >
> > > > >
> > > >
> > >
> >
> E_MP_DAY_READ,160001123745_2014-01-25:00:00:00,1395753408476.ba5c8291f8dad37d5b9621b7334c17b0.
> > > > > because it has been opened in a04.jsepc.com,60020,1397548219084
> > > > > 2014-04-15 15:50:59,926 INFO
>  [MASTER_SERVER_OPERATIONS-b03:60000-1]
> > > > > handler.ServerShutdownHandler: Skip assigning region
> > > > >
> > > > >
> > > >
> > >
> >
> E_MP_DAY_READ,300007915618_2014-03-13:00:00:00,1395994146202.ec4e397baffd1cc40bdc18ce0ab2f28a.
> > > > > because it has been opened in a04.jsepc.com,60020,1397548219084
> > > > > 2014-04-15 15:50:59,926 INFO
>  [MASTER_SERVER_OPERATIONS-b03:60000-1]
> > > > > handler.ServerShutdownHandler: Skip assigning region
> > > > >
> > > > >
> > > >
> > >
> >
> E_MP_DAY_READ,300013608840_2014-02-21:00:00:00,1395749573711.744bab52befec279a7ee97497801e10f.
> > > > > because it has been opened in a04.jsepc.com,60020,1397548219084
> > > > > 2014-04-15 15:50:59,937 INFO
>  [MASTER_SERVER_OPERATIONS-b03:60000-2]
> > > > > handler.ServerShutdownHandler: Skip assigning region
> > > > >
> > > > >
> > > >
> > >
> >
> E_MP_DAY_READ,300000497780_2014-01-23:00:00:00,1395746363941.79b831e698053b1005f7a97c9f2a6ddc.
> > > > > because it has been opened in a04.jsepc.com,60020,1397548219084
> > > > > 2014-04-15 15:50:59,938 INFO
>  [MASTER_SERVER_OPERATIONS-b03:60000-2]
> > > > > handler.ServerShutdownHandler: Skip assigning region
> > > > >
> > > > >
> > > >
> > >
> >
> E_MP_DAY_READ,300008188567_2014-03-04:00:00:00,1395756104426.eb1806c2dc5833152b6b5e7b5e4a88b8.
> > > > > because it has been opened in a04.jsepc.com,60020,1397548219084
> > > > > 2014-04-15 15:50:59,940 INFO
>  [MASTER_SERVER_OPERATIONS-b03:60000-2]
> > > > > handler.ServerShutdownHandler: Skip assigning region
> > > > >
> > > > >
> > > >
> > >
> >
> E_MP_DAY_READ,300016987143_2014-01-21:00:00:00,1395986789897.e4d143865d354bdc2a427c1f00df6ad7.
> > > > > because it has been opened in a04.jsepc.com,60020,1397548219084
> > > > >
> > > > > so few logging lines about it, looks strange ?
> > > > >
> > > > >
> > > > > BTW, I can spread the regions of this table evenly across the whole
> > > > cluster
> > > > > after I shutdown the two region servers where the regions of this
> > table
> > > > > resided originally.
> > > > >
> > > > >
> > > > > 2014-04-15 19:47 GMT+08:00 Ted Yu <yu...@gmail.com>:
> > > > >
> > > > > > Is load balancer enabled ?
> > > > > >
> > > > > > Can you grep this table in master log and pastebin what you
> found ?
> > > > > >
> > > > > > Cheers
> > > > > >
> > > > > > On Apr 15, 2014, at 4:40 AM, Tao Xiao <xi...@gmail.com>
> > > > wrote:
> > > > > >
> > > > > > > I am using HDP 2.0.6, which has 18 nodes(region servers). One
> of
> > my
> > > > > HBase
> > > > > > > tables has 50 regions but I found that the 50 regions all stay
> in
> > > > just
> > > > > > two
> > > > > > > nodes, not spread evenly in the 18 nodes. I did not pre-create
> > > splits
> > > > > so
> > > > > > > this table was gradually split into 50 regions itself.
> > > > > > >
> > > > > > > I'd like to know why all the regions stay in just two nodes,
> not
> > > the
> > > > 18
> > > > > > > nodes of the cluster, and how to spread the regions evenly
> across
> > > all
> > > > > the
> > > > > > > region servers. Thanks.
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: All regions stay on two nodes out of 18 nodes

Posted by Tao Xiao <xi...@gmail.com>.

Actually, if you open that link and then click on the picture, it will zoom
in and become quite clear.

I checked the HMaster UI just now and I am sure that these regions are
always in transition. I suppose some exceptions are happening. How can I
prevent regions from being in transition for a long time?


2014-04-17 9:00 GMT+08:00 Ted Yu <yu...@gmail.com>:

> The picture is not very clear.
> I don't see E_MP_DAY_READ having regions in transition.
>
> Anyway, as long as there is region in transition, balancer would not run.
>
> Cheers
>
>
> On Wed, Apr 16, 2014 at 5:52 PM, Tao Xiao <xi...@gmail.com>
> wrote:
>
> > Ted,
> >
> > I can see some regions of other tables in transition now , but I'm not
> sure
> > how long have them been in transition and I will check the HBase master
> UI
> > later. Here is the
> > screenshot<
> >
> http://picpaste.com/Regions_in_Transition_-_2014-04-17_08-38-qyf5anz8.png
> > >.
> > From the screenshot, there is a region with state of FAILED_OPEN, which
> is
> > in red, and there are 9 regions in transition for more than 60 seconds.
> >
> > Note that the table whose regions all stay in 2 nodes is E_MP_DAY_READ,
> > while the other tables shown in the screenshot are named as
> > E_MP_DAY_READ_20140315, E_MP_DAY_READ_20140322, E_MP_DAY_READ_20140324,
> and
> > so on.
> >
> > Thanks.
> >
> >
> > 2014-04-16 23:10 GMT+08:00 Ted Yu <yu...@gmail.com>:
> >
> > > bq. found some regions of other tables in transition, not of this
> table.
> > >
> > > That can explain why "balancer" command returned false.
> > > Are those regions stuck in transition ?
> > >
> > > Cheers
> > >
> > >
> > > On Tue, Apr 15, 2014 at 10:47 PM, Tao Xiao <xi...@gmail.com>
> > > wrote:
> > >
> > > > The command "balance_switch true" returns true, but the command
> > > "balancer"
> > > > returns false. I checked the HMaster UI and found some regions of
> other
> > > > tables in transition, not of this table.
> > > >
> > > > This table's name is E_MP_DAY_READ, I did grep it in the master log
> and
> > > > found only the following lines:
> > > >
> > > > 2014-04-15 15:50:59,925 INFO  [MASTER_SERVER_OPERATIONS-b03:60000-1]
> > > > handler.ServerShutdownHandler: Skip assigning region
> > > >
> > > >
> > >
> >
> E_MP_DAY_READ,160001123745_2014-01-25:00:00:00,1395753408476.ba5c8291f8dad37d5b9621b7334c17b0.
> > > > because it has been opened in a04.jsepc.com,60020,1397548219084
> > > > 2014-04-15 15:50:59,926 INFO  [MASTER_SERVER_OPERATIONS-b03:60000-1]
> > > > handler.ServerShutdownHandler: Skip assigning region
> > > >
> > > >
> > >
> >
> E_MP_DAY_READ,300007915618_2014-03-13:00:00:00,1395994146202.ec4e397baffd1cc40bdc18ce0ab2f28a.
> > > > because it has been opened in a04.jsepc.com,60020,1397548219084
> > > > 2014-04-15 15:50:59,926 INFO  [MASTER_SERVER_OPERATIONS-b03:60000-1]
> > > > handler.ServerShutdownHandler: Skip assigning region
> > > >
> > > >
> > >
> >
> E_MP_DAY_READ,300013608840_2014-02-21:00:00:00,1395749573711.744bab52befec279a7ee97497801e10f.
> > > > because it has been opened in a04.jsepc.com,60020,1397548219084
> > > > 2014-04-15 15:50:59,937 INFO  [MASTER_SERVER_OPERATIONS-b03:60000-2]
> > > > handler.ServerShutdownHandler: Skip assigning region
> > > >
> > > >
> > >
> >
> E_MP_DAY_READ,300000497780_2014-01-23:00:00:00,1395746363941.79b831e698053b1005f7a97c9f2a6ddc.
> > > > because it has been opened in a04.jsepc.com,60020,1397548219084
> > > > 2014-04-15 15:50:59,938 INFO  [MASTER_SERVER_OPERATIONS-b03:60000-2]
> > > > handler.ServerShutdownHandler: Skip assigning region
> > > >
> > > >
> > >
> >
> E_MP_DAY_READ,300008188567_2014-03-04:00:00:00,1395756104426.eb1806c2dc5833152b6b5e7b5e4a88b8.
> > > > because it has been opened in a04.jsepc.com,60020,1397548219084
> > > > 2014-04-15 15:50:59,940 INFO  [MASTER_SERVER_OPERATIONS-b03:60000-2]
> > > > handler.ServerShutdownHandler: Skip assigning region
> > > >
> > > >
> > >
> >
> E_MP_DAY_READ,300016987143_2014-01-21:00:00:00,1395986789897.e4d143865d354bdc2a427c1f00df6ad7.
> > > > because it has been opened in a04.jsepc.com,60020,1397548219084
> > > >
> > > > so few logging lines about it, looks strange ?
> > > >
> > > >
> > > > BTW, I can spread the regions of this table evenly across the whole
> > > cluster
> > > > after I shutdown the two region servers where the regions of this
> table
> > > > resided originally.
> > > >
> > > >
> > > > 2014-04-15 19:47 GMT+08:00 Ted Yu <yu...@gmail.com>:
> > > >
> > > > > Is load balancer enabled ?
> > > > >
> > > > > Can you grep this table in master log and pastebin what you found ?
> > > > >
> > > > > Cheers
> > > > >
> > > > > On Apr 15, 2014, at 4:40 AM, Tao Xiao <xi...@gmail.com>
> > > wrote:
> > > > >
> > > > > > I am using HDP 2.0.6, which has 18 nodes(region servers). One of
> my
> > > > HBase
> > > > > > tables has 50 regions but I found that the 50 regions all stay in
> > > just
> > > > > two
> > > > > > nodes, not spread evenly in the 18 nodes. I did not pre-create
> > splits
> > > > so
> > > > > > this table was gradually split into 50 regions itself.
> > > > > >
> > > > > > I'd like to know why all the regions stay in just two nodes, not
> > the
> > > 18
> > > > > > nodes of the cluster, and how to spread the regions evenly across
> > all
> > > > the
> > > > > > region servers. Thanks.
> > > > >
> > > >
> > >
> >
>

Re: All regions stay on two nodes out of 18 nodes

Posted by Ted Yu <yu...@gmail.com>.

The tables whose regions were in transition are named E_MP_DAY_READ_XXX.

According to the picture, no region from E_MP_DAY_READ was in transition.
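
The distinction is easy to miss by eye because one table name is a prefix
of the others. Here is a small Python sketch of an exact-match check. It
assumes region names have the form
"<table>,<start key>,<timestamp>.<encoded name>." as in the log excerpts in
this thread; the helper names are hypothetical.

```python
from collections import Counter


def table_of(region_name):
    """Return the table part of a region name (the text before the first comma)."""
    return region_name.split(",", 1)[0]


def count_by_table(region_names):
    """Count regions per exact table name, so prefixes are not conflated."""
    return Counter(table_of(name) for name in region_names)
```

Feeding it the region names listed under "Regions in Transition" would show
whether any entry belongs to E_MP_DAY_READ itself rather than to a table
such as E_MP_DAY_READ_20140315.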


On Wed, Apr 16, 2014 at 6:16 PM, Jean-Marc Spaggiari <
jean-marc@spaggiari.org> wrote:

> Click on the picture twice...
>
> Yes E_MP_DAY_READ has a lot of regions in transition.
>
> JM
>
>
> 2014-04-16 21:00 GMT-04:00 Ted Yu <yu...@gmail.com>:
>
> > The picture is not very clear.
> > I don't see E_MP_DAY_READ having regions in transition.
> >
> > Anyway, as long as there is region in transition, balancer would not run.
> >
> > Cheers
> >
> >
> > On Wed, Apr 16, 2014 at 5:52 PM, Tao Xiao <xi...@gmail.com>
> > wrote:
> >
> > > Ted,
> > >
> > > I can see some regions of other tables in transition now , but I'm not
> > sure
> > > how long have them been in transition and I will check the HBase master
> > UI
> > > later. Here is the
> > > screenshot<
> > >
> >
> http://picpaste.com/Regions_in_Transition_-_2014-04-17_08-38-qyf5anz8.png
> > > >.
> > > From the screenshot, there is a region with state of FAILED_OPEN, which
> > is
> > > in red, and there are 9 regions in transition for more than 60 seconds.
> > >
> > > Note that the table whose regions all stay in 2 nodes is E_MP_DAY_READ,
> > > while the other tables shown in the screenshot are named as
> > > E_MP_DAY_READ_20140315, E_MP_DAY_READ_20140322, E_MP_DAY_READ_20140324,
> > and
> > > so on.
> > >
> > > Thanks.
> > >
> > >
> > > 2014-04-16 23:10 GMT+08:00 Ted Yu <yu...@gmail.com>:
> > >
> > > > bq. found some regions of other tables in transition, not of this
> > table.
> > > >
> > > > That can explain why "balancer" command returned false.
> > > > Are those regions stuck in transition ?
> > > >
> > > > Cheers
> > > >
> > > >
> > > > On Tue, Apr 15, 2014 at 10:47 PM, Tao Xiao <xiaotao.cs.nju@gmail.com
> >
> > > > wrote:
> > > >
> > > > > The command "balance_switch true" returns true, but the command
> > > > "balancer"
> > > > > returns false. I checked the HMaster UI and found some regions of
> > other
> > > > > tables in transition, not of this table.
> > > > >
> > > > > This table's name is E_MP_DAY_READ, I did grep it in the master log
> > and
> > > > > found only the following lines:
> > > > >
> > > > > 2014-04-15 15:50:59,925 INFO
>  [MASTER_SERVER_OPERATIONS-b03:60000-1]
> > > > > handler.ServerShutdownHandler: Skip assigning region
> > > > >
> > > > >
> > > >
> > >
> >
> E_MP_DAY_READ,160001123745_2014-01-25:00:00:00,1395753408476.ba5c8291f8dad37d5b9621b7334c17b0.
> > > > > because it has been opened in a04.jsepc.com,60020,1397548219084
> > > > > 2014-04-15 15:50:59,926 INFO
>  [MASTER_SERVER_OPERATIONS-b03:60000-1]
> > > > > handler.ServerShutdownHandler: Skip assigning region
> > > > >
> > > > >
> > > >
> > >
> >
> E_MP_DAY_READ,300007915618_2014-03-13:00:00:00,1395994146202.ec4e397baffd1cc40bdc18ce0ab2f28a.
> > > > > because it has been opened in a04.jsepc.com,60020,1397548219084
> > > > > 2014-04-15 15:50:59,926 INFO
>  [MASTER_SERVER_OPERATIONS-b03:60000-1]
> > > > > handler.ServerShutdownHandler: Skip assigning region
> > > > >
> > > > >
> > > >
> > >
> >
> E_MP_DAY_READ,300013608840_2014-02-21:00:00:00,1395749573711.744bab52befec279a7ee97497801e10f.
> > > > > because it has been opened in a04.jsepc.com,60020,1397548219084
> > > > > 2014-04-15 15:50:59,937 INFO
>  [MASTER_SERVER_OPERATIONS-b03:60000-2]
> > > > > handler.ServerShutdownHandler: Skip assigning region
> > > > >
> > > > >
> > > >
> > >
> >
> E_MP_DAY_READ,300000497780_2014-01-23:00:00:00,1395746363941.79b831e698053b1005f7a97c9f2a6ddc.
> > > > > because it has been opened in a04.jsepc.com,60020,1397548219084
> > > > > 2014-04-15 15:50:59,938 INFO
>  [MASTER_SERVER_OPERATIONS-b03:60000-2]
> > > > > handler.ServerShutdownHandler: Skip assigning region
> > > > >
> > > > >
> > > >
> > >
> >
> E_MP_DAY_READ,300008188567_2014-03-04:00:00:00,1395756104426.eb1806c2dc5833152b6b5e7b5e4a88b8.
> > > > > because it has been opened in a04.jsepc.com,60020,1397548219084
> > > > > 2014-04-15 15:50:59,940 INFO
>  [MASTER_SERVER_OPERATIONS-b03:60000-2]
> > > > > handler.ServerShutdownHandler: Skip assigning region
> > > > >
> > > > >
> > > >
> > >
> >
> E_MP_DAY_READ,300016987143_2014-01-21:00:00:00,1395986789897.e4d143865d354bdc2a427c1f00df6ad7.
> > > > > because it has been opened in a04.jsepc.com,60020,1397548219084
> > > > >
> > > > > so few logging lines about it, looks strange ?
> > > > >
> > > > >
> > > > > BTW, I can spread the regions of this table evenly across the whole
> > > > cluster
> > > > > after I shutdown the two region servers where the regions of this
> > table
> > > > > resided originally.
> > > > >
> > > > >
> > > > > 2014-04-15 19:47 GMT+08:00 Ted Yu <yu...@gmail.com>:
> > > > >
> > > > > > Is load balancer enabled ?
> > > > > >
> > > > > > Can you grep this table in master log and pastebin what you
> found ?
> > > > > >
> > > > > > Cheers
> > > > > >
> > > > > > On Apr 15, 2014, at 4:40 AM, Tao Xiao <xi...@gmail.com>
> > > > wrote:
> > > > > >
> > > > > > > I am using HDP 2.0.6, which has 18 nodes(region servers). One
> of
> > my
> > > > > HBase
> > > > > > > tables has 50 regions but I found that the 50 regions all stay
> in
> > > > just
> > > > > > two
> > > > > > > nodes, not spread evenly in the 18 nodes. I did not pre-create
> > > splits
> > > > > so
> > > > > > > this table was gradually split into 50 regions itself.
> > > > > > >
> > > > > > > I'd like to know why all the regions stay in just two nodes,
> not
> > > the
> > > > 18
> > > > > > > nodes of the cluster, and how to spread the regions evenly
> across
> > > all
> > > > > the
> > > > > > > region servers. Thanks.
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: All regions stay on two nodes out of 18 nodes

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.

Click on the picture twice...

Yes E_MP_DAY_READ has a lot of regions in transition.

JM


2014-04-16 21:00 GMT-04:00 Ted Yu <yu...@gmail.com>:

> The picture is not very clear.
> I don't see E_MP_DAY_READ having regions in transition.
>
> Anyway, as long as there is region in transition, balancer would not run.
>
> Cheers
>
>
> On Wed, Apr 16, 2014 at 5:52 PM, Tao Xiao <xi...@gmail.com>
> wrote:
>
> > Ted,
> >
> > I can see some regions of other tables in transition now , but I'm not
> sure
> > how long have them been in transition and I will check the HBase master
> UI
> > later. Here is the
> > screenshot<
> >
> http://picpaste.com/Regions_in_Transition_-_2014-04-17_08-38-qyf5anz8.png
> > >.
> > From the screenshot, there is a region with state of FAILED_OPEN, which
> is
> > in red, and there are 9 regions in transition for more than 60 seconds.
> >
> > Note that the table whose regions all stay in 2 nodes is E_MP_DAY_READ,
> > while the other tables shown in the screenshot are named as
> > E_MP_DAY_READ_20140315, E_MP_DAY_READ_20140322, E_MP_DAY_READ_20140324,
> and
> > so on.
> >
> > Thanks.
> >
> >
> > 2014-04-16 23:10 GMT+08:00 Ted Yu <yu...@gmail.com>:
> >
> > > bq. found some regions of other tables in transition, not of this
> table.
> > >
> > > That can explain why "balancer" command returned false.
> > > Are those regions stuck in transition ?
> > >
> > > Cheers
> > >
> > >
> > > On Tue, Apr 15, 2014 at 10:47 PM, Tao Xiao <xi...@gmail.com>
> > > wrote:
> > >
> > > > The command "balance_switch true" returns true, but the command
> > > "balancer"
> > > > returns false. I checked the HMaster UI and found some regions of
> other
> > > > tables in transition, not of this table.
> > > >
> > > > This table's name is E_MP_DAY_READ, I did grep it in the master log
> and
> > > > found only the following lines:
> > > >
> > > > 2014-04-15 15:50:59,925 INFO  [MASTER_SERVER_OPERATIONS-b03:60000-1]
> > > > handler.ServerShutdownHandler: Skip assigning region
> > > >
> > > >
> > >
> >
> E_MP_DAY_READ,160001123745_2014-01-25:00:00:00,1395753408476.ba5c8291f8dad37d5b9621b7334c17b0.
> > > > because it has been opened in a04.jsepc.com,60020,1397548219084
> > > > 2014-04-15 15:50:59,926 INFO  [MASTER_SERVER_OPERATIONS-b03:60000-1]
> > > > handler.ServerShutdownHandler: Skip assigning region
> > > >
> > > >
> > >
> >
> E_MP_DAY_READ,300007915618_2014-03-13:00:00:00,1395994146202.ec4e397baffd1cc40bdc18ce0ab2f28a.
> > > > because it has been opened in a04.jsepc.com,60020,1397548219084
> > > > 2014-04-15 15:50:59,926 INFO  [MASTER_SERVER_OPERATIONS-b03:60000-1]
> > > > handler.ServerShutdownHandler: Skip assigning region
> > > >
> > > >
> > >
> >
> E_MP_DAY_READ,300013608840_2014-02-21:00:00:00,1395749573711.744bab52befec279a7ee97497801e10f.
> > > > because it has been opened in a04.jsepc.com,60020,1397548219084
> > > > 2014-04-15 15:50:59,937 INFO  [MASTER_SERVER_OPERATIONS-b03:60000-2]
> > > > handler.ServerShutdownHandler: Skip assigning region
> > > >
> > > >
> > >
> >
> E_MP_DAY_READ,300000497780_2014-01-23:00:00:00,1395746363941.79b831e698053b1005f7a97c9f2a6ddc.
> > > > because it has been opened in a04.jsepc.com,60020,1397548219084
> > > > 2014-04-15 15:50:59,938 INFO  [MASTER_SERVER_OPERATIONS-b03:60000-2]
> > > > handler.ServerShutdownHandler: Skip assigning region
> > > >
> > > >
> > >
> >
> E_MP_DAY_READ,300008188567_2014-03-04:00:00:00,1395756104426.eb1806c2dc5833152b6b5e7b5e4a88b8.
> > > > because it has been opened in a04.jsepc.com,60020,1397548219084
> > > > 2014-04-15 15:50:59,940 INFO  [MASTER_SERVER_OPERATIONS-b03:60000-2]
> > > > handler.ServerShutdownHandler: Skip assigning region
> > > >
> > > >
> > >
> >
> E_MP_DAY_READ,300016987143_2014-01-21:00:00:00,1395986789897.e4d143865d354bdc2a427c1f00df6ad7.
> > > > because it has been opened in a04.jsepc.com,60020,1397548219084
> > > >
> > > > so few logging lines about it, looks strange ?
> > > >
> > > >
> > > > BTW, I can spread the regions of this table evenly across the whole
> > > cluster
> > > > after I shutdown the two region servers where the regions of this
> table
> > > > resided originally.
> > > >
> > > >
> > > > 2014-04-15 19:47 GMT+08:00 Ted Yu <yu...@gmail.com>:
> > > >
> > > > > Is load balancer enabled ?
> > > > >
> > > > > Can you grep this table in master log and pastebin what you found ?
> > > > >
> > > > > Cheers
> > > > >
> > > > > On Apr 15, 2014, at 4:40 AM, Tao Xiao <xi...@gmail.com>
> > > wrote:
> > > > >
> > > > > > I am using HDP 2.0.6, which has 18 nodes(region servers). One of
> my
> > > > HBase
> > > > > > tables has 50 regions but I found that the 50 regions all stay in
> > > just
> > > > > two
> > > > > > nodes, not spread evenly in the 18 nodes. I did not pre-create
> > splits
> > > > so
> > > > > > this table was gradually split into 50 regions itself.
> > > > > >
> > > > > > I'd like to know why all the regions stay in just two nodes, not
> > the
> > > 18
> > > > > > nodes of the cluster, and how to spread the regions evenly across
> > all
> > > > the
> > > > > > region servers. Thanks.
> > > > >
> > > >
> > >
> >
>

Re: All regions stay on two nodes out of 18 nodes

Posted by Ted Yu <yu...@gmail.com>.

The picture is not very clear.
I don't see E_MP_DAY_READ having regions in transition.

Anyway, as long as there is a region in transition, the balancer will not
run.

Cheers


On Wed, Apr 16, 2014 at 5:52 PM, Tao Xiao <xi...@gmail.com> wrote:

> Ted,
>
> I can see some regions of other tables in transition now , but I'm not sure
> how long have them been in transition and I will check the HBase master UI
> later. Here is the
> screenshot<
> http://picpaste.com/Regions_in_Transition_-_2014-04-17_08-38-qyf5anz8.png
> >.
> From the screenshot, there is a region with state of FAILED_OPEN, which is
> in red, and there are 9 regions in transition for more than 60 seconds.
>
> Note that the table whose regions all stay in 2 nodes is E_MP_DAY_READ,
> while the other tables shown in the screenshot are named as
> E_MP_DAY_READ_20140315, E_MP_DAY_READ_20140322, E_MP_DAY_READ_20140324, and
> so on.
>
> Thanks.
>
>
> 2014-04-16 23:10 GMT+08:00 Ted Yu <yu...@gmail.com>:
>
> > bq. found some regions of other tables in transition, not of this table.
> >
> > That can explain why "balancer" command returned false.
> > Are those regions stuck in transition ?
> >
> > Cheers
> >
> >
> > On Tue, Apr 15, 2014 at 10:47 PM, Tao Xiao <xi...@gmail.com>
> > wrote:
> >
> > > The command "balance_switch true" returns true, but the command "balancer"
> > > returns false. I checked the HMaster UI and found some regions of other
> > > tables in transition, not of this table.
> > >
> > > This table's name is E_MP_DAY_READ, I did grep it in the master log and
> > > found only the following lines:
> > >
> > > 2014-04-15 15:50:59,925 INFO  [MASTER_SERVER_OPERATIONS-b03:60000-1]
> > > handler.ServerShutdownHandler: Skip assigning region
> > > E_MP_DAY_READ,160001123745_2014-01-25:00:00:00,1395753408476.ba5c8291f8dad37d5b9621b7334c17b0.
> > > because it has been opened in a04.jsepc.com,60020,1397548219084
> > > 2014-04-15 15:50:59,926 INFO  [MASTER_SERVER_OPERATIONS-b03:60000-1]
> > > handler.ServerShutdownHandler: Skip assigning region
> > > E_MP_DAY_READ,300007915618_2014-03-13:00:00:00,1395994146202.ec4e397baffd1cc40bdc18ce0ab2f28a.
> > > because it has been opened in a04.jsepc.com,60020,1397548219084
> > > 2014-04-15 15:50:59,926 INFO  [MASTER_SERVER_OPERATIONS-b03:60000-1]
> > > handler.ServerShutdownHandler: Skip assigning region
> > > E_MP_DAY_READ,300013608840_2014-02-21:00:00:00,1395749573711.744bab52befec279a7ee97497801e10f.
> > > because it has been opened in a04.jsepc.com,60020,1397548219084
> > > 2014-04-15 15:50:59,937 INFO  [MASTER_SERVER_OPERATIONS-b03:60000-2]
> > > handler.ServerShutdownHandler: Skip assigning region
> > > E_MP_DAY_READ,300000497780_2014-01-23:00:00:00,1395746363941.79b831e698053b1005f7a97c9f2a6ddc.
> > > because it has been opened in a04.jsepc.com,60020,1397548219084
> > > 2014-04-15 15:50:59,938 INFO  [MASTER_SERVER_OPERATIONS-b03:60000-2]
> > > handler.ServerShutdownHandler: Skip assigning region
> > > E_MP_DAY_READ,300008188567_2014-03-04:00:00:00,1395756104426.eb1806c2dc5833152b6b5e7b5e4a88b8.
> > > because it has been opened in a04.jsepc.com,60020,1397548219084
> > > 2014-04-15 15:50:59,940 INFO  [MASTER_SERVER_OPERATIONS-b03:60000-2]
> > > handler.ServerShutdownHandler: Skip assigning region
> > > E_MP_DAY_READ,300016987143_2014-01-21:00:00:00,1395986789897.e4d143865d354bdc2a427c1f00df6ad7.
> > > because it has been opened in a04.jsepc.com,60020,1397548219084
> > >
> > > so few logging lines about it, looks strange ?
> > >
> > >
> > > BTW, I can spread the regions of this table evenly across the whole cluster
> > > after I shutdown the two region servers where the regions of this table
> > > resided originally.
> > >
> > >
> > > 2014-04-15 19:47 GMT+08:00 Ted Yu <yu...@gmail.com>:
> > >
> > > > Is load balancer enabled ?
> > > >
> > > > Can you grep this table in master log and pastebin what you found ?
> > > >
> > > > Cheers
> > > >
> > > > On Apr 15, 2014, at 4:40 AM, Tao Xiao <xi...@gmail.com> wrote:
> > > >
> > > > > I am using HDP 2.0.6, which has 18 nodes(region servers). One of my HBase
> > > > > tables has 50 regions but I found that the 50 regions all stay in just two
> > > > > nodes, not spread evenly in the 18 nodes. I did not pre-create splits so
> > > > > this table was gradually split into 50 regions itself.
> > > > >
> > > > > I'd like to know why all the regions stay in just two nodes, not the 18
> > > > > nodes of the cluster, and how to spread the regions evenly across all the
> > > > > region servers. Thanks.
> > > >
> > >
> >
>

Re: All regions stay on two nodes out of 18 nodes

Posted by Tao Xiao <xi...@gmail.com>.

Ted,

I can see some regions of other tables in transition now, but I'm not sure
how long they have been in transition; I will check the HBase master UI
later. Here is the
screenshot<http://picpaste.com/Regions_in_Transition_-_2014-04-17_08-38-qyf5anz8.png>.
From the screenshot, there is one region in the FAILED_OPEN state, shown in
red, and there are 9 regions that have been in transition for more than 60
seconds.

Note that the table whose regions all stay on 2 nodes is E_MP_DAY_READ,
while the other tables shown in the screenshot are named
E_MP_DAY_READ_20140315, E_MP_DAY_READ_20140322, E_MP_DAY_READ_20140324, and
so on.

Thanks.


2014-04-16 23:10 GMT+08:00 Ted Yu <yu...@gmail.com>:

> bq. found some regions of other tables in transition, not of this table.
>
> That can explain why "balancer" command returned false.
> Are those regions stuck in transition ?
>
> Cheers
>
>
> On Tue, Apr 15, 2014 at 10:47 PM, Tao Xiao <xi...@gmail.com>
> wrote:
>
> > The command "balance_switch true" returns true, but the command "balancer"
> > returns false. I checked the HMaster UI and found some regions of other
> > tables in transition, not of this table.
> >
> > This table's name is E_MP_DAY_READ, I did grep it in the master log and
> > found only the following lines:
> >
> > 2014-04-15 15:50:59,925 INFO  [MASTER_SERVER_OPERATIONS-b03:60000-1]
> > handler.ServerShutdownHandler: Skip assigning region
> > E_MP_DAY_READ,160001123745_2014-01-25:00:00:00,1395753408476.ba5c8291f8dad37d5b9621b7334c17b0.
> > because it has been opened in a04.jsepc.com,60020,1397548219084
> > 2014-04-15 15:50:59,926 INFO  [MASTER_SERVER_OPERATIONS-b03:60000-1]
> > handler.ServerShutdownHandler: Skip assigning region
> > E_MP_DAY_READ,300007915618_2014-03-13:00:00:00,1395994146202.ec4e397baffd1cc40bdc18ce0ab2f28a.
> > because it has been opened in a04.jsepc.com,60020,1397548219084
> > 2014-04-15 15:50:59,926 INFO  [MASTER_SERVER_OPERATIONS-b03:60000-1]
> > handler.ServerShutdownHandler: Skip assigning region
> > E_MP_DAY_READ,300013608840_2014-02-21:00:00:00,1395749573711.744bab52befec279a7ee97497801e10f.
> > because it has been opened in a04.jsepc.com,60020,1397548219084
> > 2014-04-15 15:50:59,937 INFO  [MASTER_SERVER_OPERATIONS-b03:60000-2]
> > handler.ServerShutdownHandler: Skip assigning region
> > E_MP_DAY_READ,300000497780_2014-01-23:00:00:00,1395746363941.79b831e698053b1005f7a97c9f2a6ddc.
> > because it has been opened in a04.jsepc.com,60020,1397548219084
> > 2014-04-15 15:50:59,938 INFO  [MASTER_SERVER_OPERATIONS-b03:60000-2]
> > handler.ServerShutdownHandler: Skip assigning region
> > E_MP_DAY_READ,300008188567_2014-03-04:00:00:00,1395756104426.eb1806c2dc5833152b6b5e7b5e4a88b8.
> > because it has been opened in a04.jsepc.com,60020,1397548219084
> > 2014-04-15 15:50:59,940 INFO  [MASTER_SERVER_OPERATIONS-b03:60000-2]
> > handler.ServerShutdownHandler: Skip assigning region
> > E_MP_DAY_READ,300016987143_2014-01-21:00:00:00,1395986789897.e4d143865d354bdc2a427c1f00df6ad7.
> > because it has been opened in a04.jsepc.com,60020,1397548219084
> >
> > so few logging lines about it, looks strange ?
> >
> >
> > BTW, I can spread the regions of this table evenly across the whole cluster
> > after I shutdown the two region servers where the regions of this table
> > resided originally.
> >
> >
> > 2014-04-15 19:47 GMT+08:00 Ted Yu <yu...@gmail.com>:
> >
> > > Is load balancer enabled ?
> > >
> > > Can you grep this table in master log and pastebin what you found ?
> > >
> > > Cheers
> > >
> > > On Apr 15, 2014, at 4:40 AM, Tao Xiao <xi...@gmail.com> wrote:
> > >
> > > > I am using HDP 2.0.6, which has 18 nodes(region servers). One of my HBase
> > > > tables has 50 regions but I found that the 50 regions all stay in just two
> > > > nodes, not spread evenly in the 18 nodes. I did not pre-create splits so
> > > > this table was gradually split into 50 regions itself.
> > > >
> > > > I'd like to know why all the regions stay in just two nodes, not the 18
> > > > nodes of the cluster, and how to spread the regions evenly across all the
> > > > region servers. Thanks.
> > >
> >
>

Re: All regions stay on two nodes out of 18 nodes

Posted by Ted Yu <yu...@gmail.com>.

bq. found some regions of other tables in transition, not of this table.

That can explain why the "balancer" command returned false.
Are those regions stuck in transition?

Cheers


On Tue, Apr 15, 2014 at 10:47 PM, Tao Xiao <xi...@gmail.com> wrote:

> The command "balance_switch true" returns true, but the command "balancer"
> returns false. I checked the HMaster UI and found some regions of other
> tables in transition, not of this table.
>
> This table's name is E_MP_DAY_READ, I did grep it in the master log and
> found only the following lines:
>
> 2014-04-15 15:50:59,925 INFO  [MASTER_SERVER_OPERATIONS-b03:60000-1]
> handler.ServerShutdownHandler: Skip assigning region
> E_MP_DAY_READ,160001123745_2014-01-25:00:00:00,1395753408476.ba5c8291f8dad37d5b9621b7334c17b0.
> because it has been opened in a04.jsepc.com,60020,1397548219084
> 2014-04-15 15:50:59,926 INFO  [MASTER_SERVER_OPERATIONS-b03:60000-1]
> handler.ServerShutdownHandler: Skip assigning region
> E_MP_DAY_READ,300007915618_2014-03-13:00:00:00,1395994146202.ec4e397baffd1cc40bdc18ce0ab2f28a.
> because it has been opened in a04.jsepc.com,60020,1397548219084
> 2014-04-15 15:50:59,926 INFO  [MASTER_SERVER_OPERATIONS-b03:60000-1]
> handler.ServerShutdownHandler: Skip assigning region
> E_MP_DAY_READ,300013608840_2014-02-21:00:00:00,1395749573711.744bab52befec279a7ee97497801e10f.
> because it has been opened in a04.jsepc.com,60020,1397548219084
> 2014-04-15 15:50:59,937 INFO  [MASTER_SERVER_OPERATIONS-b03:60000-2]
> handler.ServerShutdownHandler: Skip assigning region
> E_MP_DAY_READ,300000497780_2014-01-23:00:00:00,1395746363941.79b831e698053b1005f7a97c9f2a6ddc.
> because it has been opened in a04.jsepc.com,60020,1397548219084
> 2014-04-15 15:50:59,938 INFO  [MASTER_SERVER_OPERATIONS-b03:60000-2]
> handler.ServerShutdownHandler: Skip assigning region
> E_MP_DAY_READ,300008188567_2014-03-04:00:00:00,1395756104426.eb1806c2dc5833152b6b5e7b5e4a88b8.
> because it has been opened in a04.jsepc.com,60020,1397548219084
> 2014-04-15 15:50:59,940 INFO  [MASTER_SERVER_OPERATIONS-b03:60000-2]
> handler.ServerShutdownHandler: Skip assigning region
> E_MP_DAY_READ,300016987143_2014-01-21:00:00:00,1395986789897.e4d143865d354bdc2a427c1f00df6ad7.
> because it has been opened in a04.jsepc.com,60020,1397548219084
>
> so few logging lines about it, looks strange ?
>
>
> BTW, I can spread the regions of this table evenly across the whole cluster
> after I shutdown the two region servers where the regions of this table
> resided originally.
>
>
> 2014-04-15 19:47 GMT+08:00 Ted Yu <yu...@gmail.com>:
>
> > Is load balancer enabled ?
> >
> > Can you grep this table in master log and pastebin what you found ?
> >
> > Cheers
> >
> > On Apr 15, 2014, at 4:40 AM, Tao Xiao <xi...@gmail.com> wrote:
> >
> > > I am using HDP 2.0.6, which has 18 nodes(region servers). One of my HBase
> > > tables has 50 regions but I found that the 50 regions all stay in just two
> > > nodes, not spread evenly in the 18 nodes. I did not pre-create splits so
> > > this table was gradually split into 50 regions itself.
> > >
> > > I'd like to know why all the regions stay in just two nodes, not the 18
> > > nodes of the cluster, and how to spread the regions evenly across all the
> > > region servers. Thanks.
> >
>

Re: All regions stay on two nodes out of 18 nodes

Posted by Tao Xiao <xi...@gmail.com>.

The command "balance_switch true" returns true, but the command "balancer"
returns false. I checked the HMaster UI and found some regions of other
tables in transition, but none of this table.

This table's name is E_MP_DAY_READ. I grepped for it in the master log and
found only the following lines:

2014-04-15 15:50:59,925 INFO  [MASTER_SERVER_OPERATIONS-b03:60000-1]
handler.ServerShutdownHandler: Skip assigning region
E_MP_DAY_READ,160001123745_2014-01-25:00:00:00,1395753408476.ba5c8291f8dad37d5b9621b7334c17b0.
because it has been opened in a04.jsepc.com,60020,1397548219084
2014-04-15 15:50:59,926 INFO  [MASTER_SERVER_OPERATIONS-b03:60000-1]
handler.ServerShutdownHandler: Skip assigning region
E_MP_DAY_READ,300007915618_2014-03-13:00:00:00,1395994146202.ec4e397baffd1cc40bdc18ce0ab2f28a.
because it has been opened in a04.jsepc.com,60020,1397548219084
2014-04-15 15:50:59,926 INFO  [MASTER_SERVER_OPERATIONS-b03:60000-1]
handler.ServerShutdownHandler: Skip assigning region
E_MP_DAY_READ,300013608840_2014-02-21:00:00:00,1395749573711.744bab52befec279a7ee97497801e10f.
because it has been opened in a04.jsepc.com,60020,1397548219084
2014-04-15 15:50:59,937 INFO  [MASTER_SERVER_OPERATIONS-b03:60000-2]
handler.ServerShutdownHandler: Skip assigning region
E_MP_DAY_READ,300000497780_2014-01-23:00:00:00,1395746363941.79b831e698053b1005f7a97c9f2a6ddc.
because it has been opened in a04.jsepc.com,60020,1397548219084
2014-04-15 15:50:59,938 INFO  [MASTER_SERVER_OPERATIONS-b03:60000-2]
handler.ServerShutdownHandler: Skip assigning region
E_MP_DAY_READ,300008188567_2014-03-04:00:00:00,1395756104426.eb1806c2dc5833152b6b5e7b5e4a88b8.
because it has been opened in a04.jsepc.com,60020,1397548219084
2014-04-15 15:50:59,940 INFO  [MASTER_SERVER_OPERATIONS-b03:60000-2]
handler.ServerShutdownHandler: Skip assigning region
E_MP_DAY_READ,300016987143_2014-01-21:00:00:00,1395986789897.e4d143865d354bdc2a427c1f00df6ad7.
because it has been opened in a04.jsepc.com,60020,1397548219084

There are so few log lines about it. Doesn't that look strange?

BTW, I was able to spread the regions of this table evenly across the whole
cluster after I shut down the two region servers where the regions of this
table originally resided.
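
As a rough cross-check of the lines above, here is a small Python sketch (not
part of the original post) that tallies how many regions of a table the master
log reports per region server, and shows what an even spread of 50 regions
over 18 servers would look like. The regular expression is inferred only from
the log lines quoted in this message, the function names regions_per_server
and even_spread are made up for illustration, and it assumes region names
contain no whitespace.

```python
import re
from collections import Counter

# Pattern inferred from the ServerShutdownHandler lines quoted in this thread;
# other HBase versions may word the message differently (assumption).
SKIP_RE = re.compile(
    r"Skip assigning region\s+(?P<region>\S+)\s+"
    r"because it has been opened in\s+(?P<server>\S+)"
)


def regions_per_server(log_text, table):
    """Count regions of `table` per region server in 'Skip assigning' entries."""
    # Collapse all whitespace so entries wrapped over several lines match.
    joined = " ".join(log_text.split())
    counts = Counter()
    for match in SKIP_RE.finditer(joined):
        # Region names start with "<table>," so compare the first field
        # exactly; this keeps E_MP_DAY_READ_20140315 out of E_MP_DAY_READ.
        if match.group("region").split(",", 1)[0] == table:
            counts[match.group("server")] += 1
    return counts


def even_spread(num_regions, num_servers):
    """Map regions-per-server to the number of servers holding that many."""
    base, extra = divmod(num_regions, num_servers)
    return {base + 1: extra, base: num_servers - extra}


# Two of the entries quoted above, with the mail client's wrapping left in.
sample = """
2014-04-15 15:50:59,925 INFO  [MASTER_SERVER_OPERATIONS-b03:60000-1]
handler.ServerShutdownHandler: Skip assigning region
E_MP_DAY_READ,160001123745_2014-01-25:00:00:00,1395753408476.ba5c8291f8dad37d5b9621b7334c17b0.
because it has been opened in a04.jsepc.com,60020,1397548219084
2014-04-15 15:50:59,926 INFO  [MASTER_SERVER_OPERATIONS-b03:60000-1]
handler.ServerShutdownHandler: Skip assigning region
E_MP_DAY_READ,300007915618_2014-03-13:00:00:00,1395994146202.ec4e397baffd1cc40bdc18ce0ab2f28a.
because it has been opened in a04.jsepc.com,60020,1397548219084
"""

print(regions_per_server(sample, "E_MP_DAY_READ"))
print(even_spread(50, 18))
```

With 50 regions and 18 region servers, divmod(50, 18) gives 2 remainder 14, so
an even spread would put 3 regions on 14 servers and 2 regions on the other 4.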

2014-04-15 19:47 GMT+08:00 Ted Yu <yu...@gmail.com>:

> Is load balancer enabled ?
>
> Can you grep this table in master log and pastebin what you found ?
>
> Cheers
>
> On Apr 15, 2014, at 4:40 AM, Tao Xiao <xi...@gmail.com> wrote:
>
> > I am using HDP 2.0.6, which has 18 nodes(region servers). One of my HBase
> > tables has 50 regions but I found that the 50 regions all stay in just two
> > nodes, not spread evenly in the 18 nodes. I did not pre-create splits so
> > this table was gradually split into 50 regions itself.
> >
> > I'd like to know why all the regions stay in just two nodes, not the 18
> > nodes of the cluster, and how to spread the regions evenly across all the
> > region servers. Thanks.
>

Re: All regions stay on two nodes out of 18 nodes

Posted by Ted Yu <yu...@gmail.com>.

Is the load balancer enabled?

Can you grep for this table in the master log and pastebin what you find?

Cheers

On Apr 15, 2014, at 4:40 AM, Tao Xiao <xi...@gmail.com> wrote:

> I am using HDP 2.0.6, which has 18 nodes(region servers). One of my HBase
> tables has 50 regions but I found that the 50 regions all stay in just two
> nodes, not spread evenly in the 18 nodes. I did not pre-create splits so
> this table was gradually split into 50 regions itself.
> 
> I'd like to know why all the regions stay in just two nodes, not the 18
> nodes of the cluster, and how to spread the regions evenly across all the
> region servers. Thanks.

Re: All regions stay on two nodes out of 18 nodes

Posted by divye sheth <di...@gmail.com>.

Check whether the HBase balancer is on. The following enables it and returns
the previous balancer state:
$hbase_shell> balance_switch true

Then run the balancer from the HBase shell:

$hbase_shell> balancer

If the above command returns false, check for any regions in transition on
the HMaster UI, or check the HMaster logs.

Thanks
Divye Sheth

On Tue, Apr 15, 2014 at 5:10 PM, Tao Xiao <xi...@gmail.com> wrote:

> I am using HDP 2.0.6, which has 18 nodes(region servers). One of my HBase
> tables has 50 regions but I found that the 50 regions all stay in just two
> nodes, not spread evenly in the 18 nodes. I did not pre-create splits so
> this table was gradually split into 50 regions itself.
>
> I'd like to know why all the regions stay in just two nodes, not the 18
> nodes of the cluster, and how to spread the regions evenly across all the
> region servers. Thanks.
>