Posted to issues@hbase.apache.org by "Jean-Daniel Cryans (Assigned) (JIRA)" <ji...@apache.org> on 2012/03/27 20:57:31 UTC
[jira] [Assigned] (HBASE-5639) The logic used in waiting for region servers during startup is broken
[ https://issues.apache.org/jira/browse/HBASE-5639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jean-Daniel Cryans reassigned HBASE-5639:
-----------------------------------------
Assignee: Jean-Daniel Cryans (was: nkeywal)
Here's what I see now with the patch:
{noformat}
2012-03-27 18:53:07,644 INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region servers count to settle; currently checked in 0, slept for 0 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
2012-03-27 18:53:08,638 INFO org.apache.hadoop.hbase.master.ServerManager: Registering server=sv4r29s44,62023,1332874388301
2012-03-27 18:53:08,638 INFO org.apache.hadoop.hbase.master.ServerManager: Registering server=sv4r27s44,62023,1332874388324
2012-03-27 18:53:08,649 INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region servers count to settle; currently checked in 2, slept for 1005 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
2012-03-27 18:53:08,656 INFO org.apache.hadoop.hbase.master.ServerManager: Registering server=sv4r5s38,62023,1332874388319
2012-03-27 18:53:08,657 INFO org.apache.hadoop.hbase.master.ServerManager: Registering server=sv4r6s38,62023,1332874388364
2012-03-27 18:53:08,662 INFO org.apache.hadoop.hbase.master.ServerManager: Registering server=sv4r8s38,62023,1332874388371
2012-03-27 18:53:08,699 INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region servers count to settle; currently checked in 5, slept for 1055 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
2012-03-27 18:53:08,897 INFO org.apache.hadoop.hbase.master.ServerManager: Registering server=sv4r31s44,62023,1332874388453
2012-03-27 18:53:08,900 INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region servers count to settle; currently checked in 6, slept for 1256 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
2012-03-27 18:53:09,602 INFO org.apache.hadoop.hbase.master.ServerManager: Registering server=sv4r30s44,62023,1332874388969
2012-03-27 18:53:09,603 INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region servers count to settle; currently checked in 7, slept for 1959 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
2012-03-27 18:53:11,110 INFO org.apache.hadoop.hbase.master.ServerManager: Finished waiting for region servers count to settle; checked in 7, slept for 3466 ms, expecting minimum of 1, maximum of 2147483647, master is running.
{noformat}
This confirms the patch did the right thing. Go wild, Lars :)
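One way to read the log above: the last count change (7 servers) is logged at "slept for 1959 ms", and the master finishes waiting at "slept for 3466 ms". The gap between the two should be at least the 1500 ms interval, and every earlier check-in arrived less than 1500 ms after the previous one. The sketch below only redoes that arithmetic; the class and method names are hypothetical and are not part of HBase.

```java
public class SettleCheck {
    // Milliseconds that passed with no new check-in, computed from the
    // "slept for N ms" values of two log lines.
    static long quietMillis(long sleptAtLastChange, long sleptAtFinish) {
        return sleptAtFinish - sleptAtLastChange;
    }

    public static void main(String[] args) {
        long intervalMs = 1500L;                 // "interval of 1500 ms" in the log
        long quiet = quietMillis(1959L, 3466L);  // values taken from the log above
        System.out.println("quiet period: " + quiet + " ms, settled: " + (quiet >= intervalMs));
    }
}
```

The wall-clock timestamps give the same gap: 18:53:09,603 to 18:53:11,110.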
> The logic used in waiting for region servers during startup is broken
> ---------------------------------------------------------------------
>
> Key: HBASE-5639
> URL: https://issues.apache.org/jira/browse/HBASE-5639
> Project: HBase
> Issue Type: Bug
> Reporter: Jean-Daniel Cryans
> Assignee: Jean-Daniel Cryans
> Priority: Blocker
> Fix For: 0.94.0
>
> Attachments: HBASE-5639.patch
>
>
> See the tail of HBASE-4993, which I'll report here:
> Me:
> {quote}
> I think a bug was introduced here. Here's the new waiting logic in waitForRegionServers:
> the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
> no new region server has checked in for
> 'hbase.master.wait.on.regionservers.interval' time
> And the code that verifies that:
> !(lastCountChange+interval > now && count >= minToStart)
> {quote}
> Nic:
> {quote}
> It seems that changing the code to
> (count < minToStart ||
> lastCountChange+interval > now)
> would make the code work as documented.
> If you have 0 region servers that checked in and you are under the interval, you wait: (true or true) = true.
> If you have 0 region servers but you are above the interval, you wait: (true or false) = true.
> If you have 1 or more region servers that checked in and you are under the interval, you wait: (false or true) = true.
> {quote}
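To make the difference between the two quoted expressions concrete, here is a minimal sketch that evaluates both as "keep waiting" predicates over the four possible cases. The class name, method names, and parameter list are hypothetical; only the two boolean expressions come from the discussion above. The sample values use a minimum of 1 and an interval of 1500 ms, as in the log.

```java
public class WaitCondition {
    // Expression quoted in the report as the existing check.
    static boolean brokenShouldWait(int count, int minToStart,
                                    long lastCountChange, long interval, long now) {
        return !(lastCountChange + interval > now && count >= minToStart);
    }

    // Expression proposed by Nic: wait while below the minimum OR while
    // the count changed less than one interval ago.
    static boolean fixedShouldWait(int count, int minToStart,
                                   long lastCountChange, long interval, long now) {
        return count < minToStart || lastCountChange + interval > now;
    }

    public static void main(String[] args) {
        int minToStart = 1;
        long interval = 1500L;
        long lastCountChange = 0L;
        int[] counts = {0, 0, 1, 1};
        long[] nows = {1000L, 2000L, 1000L, 2000L}; // under, above, under, above the interval
        for (int i = 0; i < counts.length; i++) {
            System.out.printf("count=%d now=%d broken=%b fixed=%b%n",
                counts[i], nows[i],
                brokenShouldWait(counts[i], minToStart, lastCountChange, interval, nows[i]),
                fixedShouldWait(counts[i], minToStart, lastCountChange, interval, nows[i]));
        }
    }
}
```

With zero servers both expressions keep waiting. They disagree once the minimum is reached: the original expression stops waiting while servers are still checking in and keeps waiting after the count has settled, while the proposed one does the opposite. The fourth case, not listed above, is 1 or more servers and above the interval: (false or false) = false, so the wait ends.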