Posted to dev@hbase.apache.org by Julian Zhou <ju...@me.com> on 2013/08/02 16:20:30 UTC
Long waiting loop for " Waiting for region servers count to settle"
when doing hmaster failover
Hi Community,
While testing, I hit this issue on 0.94.3.
The cluster has 1 active HMaster, 1 backup HMaster, and 4 region servers.
I ran a YCSB workload on it to load data. While the workload was running,
I manually killed the active HMaster with kill -9. The backup master took
over the active role quickly, but then looped on
"
INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region
servers count to settle; currently checked in 0, slept for 0 ms,
expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms,
interval of 1500 ms.
INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region
servers count to settle; currently checked in 0, slept for xxx ms,
expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms,
interval of 1500 ms.
INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region
servers count to settle; currently checked in 0, slept for xxx ms,
expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms,
interval of 1500 ms.
...
...
...
<for about 5 - 7 mins looping on this log message>
...
INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region
servers count to settle; currently checked in 1, slept for 0 ms,
expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms,
interval of 1500 ms.
INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region
servers count to settle; currently checked in 2, slept for 0 ms,
expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms,
interval of 1500 ms.
INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region
servers count to settle; currently checked in 3, slept for 0 ms,
expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms,
interval of 1500 ms.
INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region
servers count to settle; currently checked in 4, slept for 0 ms,
expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms,
interval of 1500 ms.
"
There is always a loop of 5 - 7 minutes on the above message while the new
active master waits for region servers to check in. Then, after this long
wait, all 4 region servers suddenly check in successfully.
Any idea what causes this waiting loop? Thanks a lot for the advice~
-- Best Regards, Julian
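The fields in the log message above (checked in, slept, minimum, maximum, timeout, interval) suggest a wait loop of roughly the following shape. This is only a sketch inferred from the log output, not the HBase ServerManager source: the class name, method name, parameters, and the 500 ms sleep step are all hypothetical. It assumes the loop keeps waiting while too few servers have checked in, even after the timeout has passed, which would be consistent with a 4500 ms timeout and a 5 - 7 minute wait at "checked in 0".

```java
class SettleLoopSketch {

    // Simulates the wait on a virtual clock (no real sleeping).
    // countAtTick[i] is the number of region servers checked in after i sleeps;
    // the last value is reused once the array is exhausted.
    // Returns the total virtual time slept, in ms, before the loop exits.
    static long simulate(int[] countAtTick, int minToStart, int maxToStart,
                         long timeoutMs, long intervalMs, long sleepMs) {
        if (countAtTick[countAtTick.length - 1] < minToStart) {
            // Under the assumed condition the loop would never exit.
            throw new IllegalArgumentException("final count below minimum");
        }
        long slept = 0;
        long now = 0;
        long lastCountChange = 0;
        int tick = 0;
        int count = countAtTick[0];
        // Assumed condition: keep waiting while the count is still changing,
        // the timeout has not elapsed, or fewer than minToStart checked in.
        while (count < maxToStart
                && (lastCountChange + intervalMs > now
                    || slept < timeoutMs
                    || count < minToStart)) {
            now += sleepMs;
            slept += sleepMs;
            tick++;
            int newCount = countAtTick[Math.min(tick, countAtTick.length - 1)];
            if (newCount != count) {
                count = newCount;
                lastCountChange = now;
            }
        }
        return slept;
    }

    public static void main(String[] args) {
        // All 4 servers are already checked in: the loop exits at the timeout.
        long quick = simulate(new int[] {4}, 1, Integer.MAX_VALUE, 4500, 1500, 500);
        System.out.println("servers present from the start: slept " + quick + " ms");

        // No server checks in for 5 minutes (600 sleeps of 500 ms), then all 4.
        int[] late = new int[601];
        late[600] = 4;
        long slow = simulate(late, 1, Integer.MAX_VALUE, 4500, 1500, 500);
        System.out.println("servers arrive after 5 minutes: slept " + slow + " ms");
    }
}
```

Under these assumptions the first case exits after 4500 ms and the second after 301500 ms. In this model the timeout alone does not end the wait. The master keeps logging until the minimum number of region servers has reported in and the count has been stable for one interval.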
Re: Long waiting loop for " Waiting for region servers count to
settle" when doing hmaster failover
Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi Julian,
0.94.3 is a pretty old version. There have been thousands of fixes between
that version and the latest, 0.94.10.
Would you be able to upgrade to a more recent version and retest?
JM