Posted to dev@hbase.apache.org by Julian Zhou <ju...@me.com> on 2013/08/02 16:20:30 UTC

Long waiting loop for " Waiting for region servers count to settle" when doing hmaster failover

Hi Community,

I ran into this issue while testing on 0.94.3.

The cluster has 1 active HMaster, 1 backup HMaster, and 4 region servers.
I ran a YCSB workload on it to load data. While the workload was running,
I manually killed the active HMaster with kill -9. The backup master seemed
to take over the active role quickly, but then kept looping on

"
INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region
servers count to settle; currently checked in 0, slept for 0 ms,
expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms,
interval of 1500 ms.
INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region
servers count to settle; currently checked in 0, slept for xxx ms,
expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms,
interval of 1500 ms.
INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region
servers count to settle; currently checked in 0, slept for xxx ms,
expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms,
interval of 1500 ms.
...
...
...
<this log message keeps repeating for about 5 - 7 mins>
...

INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region
servers count to settle; currently checked in 1, slept for 0 ms,
expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms,
interval of 1500 ms.

INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region
servers count to settle; currently checked in 2, slept for 0 ms,
expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms,
interval of 1500 ms.
INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region
servers count to settle; currently checked in 3, slept for 0 ms,
expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms,
interval of 1500 ms.
INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region
servers count to settle; currently checked in 4, slept for 0 ms,
expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms,
interval of 1500 ms.

"
There always seems to be a 5 - 7 minute loop of the above waiting
message before the region servers check in to the new active master. Then,
after this long wait, all 4 region servers suddenly check in
successfully.
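
To pin down exactly when the first region server checks in, the messages can
be pulled out of the master log with a small script. This is only a sketch:
the pattern is taken from the message text quoted above, and the names
SETTLE_RE, parse_settle_messages and first_check_in are made up for this
example (they are not part of HBase). It assumes real numeric values where
the excerpt above shows "xxx".

```python
import re

# Pattern copied from the message text quoted in this mail (an assumption:
# other HBase versions may word the message differently).
SETTLE_RE = re.compile(
    r"Waiting for region servers count to settle; "
    r"currently checked in (?P<checked_in>\d+), "
    r"slept for (?P<slept_ms>\d+) ms, "
    r"expecting minimum of (?P<minimum>\d+), "
    r"maximum of (?P<maximum>\d+), "
    r"timeout of (?P<timeout_ms>\d+) ms, "
    r"interval of (?P<interval_ms>\d+) ms"
)


def parse_settle_messages(log_text):
    """Return one dict of integer fields per settle message found."""
    # Collapse all whitespace so messages wrapped across lines still match.
    flat = " ".join(log_text.split())
    return [
        {name: int(value) for name, value in match.groupdict().items()}
        for match in SETTLE_RE.finditer(flat)
    ]


def first_check_in(messages):
    """Index of the first message reporting a checked-in server, or None."""
    for index, message in enumerate(messages):
        if message["checked_in"] > 0:
            return index
    return None


# Two messages copied from the excerpt above, wrapped as in the mail.
sample = """
INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region
servers count to settle; currently checked in 0, slept for 0 ms,
expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms,
interval of 1500 ms.
INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region
servers count to settle; currently checked in 1, slept for 0 ms,
expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms,
interval of 1500 ms.
"""

messages = parse_settle_messages(sample)
print(len(messages), first_check_in(messages))
```

Running this against the full master log and comparing the log timestamps of
the first message with the first one where "checked in" is greater than 0
would give the length of the stall.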

Any idea what causes this waiting loop? Thanks a lot for any advice.


-- Best Regards, Julian

Re: Long waiting loop for " Waiting for region servers count to settle" when doing hmaster failover

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi Julian,

0.94.3 is a pretty old version. There have been thousands of fixes between
that version and the latest release, 0.94.10.

Will you be able to upgrade to a more recent version and retest?

JM
