Posted to dev@hbase.apache.org by "Pankaj Kumar (JIRA)" <ji...@apache.org> on 2016/10/11 03:20:20 UTC

[jira] [Created] (HBASE-16805) HMaster may send reportForDuty himself while shutting down

Pankaj Kumar created HBASE-16805:
------------------------------------

             Summary: HMaster may send reportForDuty himself while shutting down
                 Key: HBASE-16805
                 URL: https://issues.apache.org/jira/browse/HBASE-16805
             Project: HBase
          Issue Type: Bug
          Components: master
            Reporter: Pankaj Kumar
            Assignee: Pankaj Kumar
            Priority: Minor


We hit an interesting scenario in which the HMaster sent reportForDuty to itself while shutting down.

Initially the HMaster registered itself as the active master, but it could not finish its initialization because the namespace table was not assigned within the configured timeout (300000 ms in the log below):
{noformat}
2016-07-30 19:36:52,161 | FATAL | hadoopc1h2:21300.activeMasterManager | Failed to become active master | org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1610)
java.io.IOException: Timedout 300000ms waiting for namespace table to be assigned
	at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:102)
	at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:977)
	at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:763)
	at org.apache.hadoop.hbase.master.HMaster.access$600(HMaster.java:171)
	at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1606)
	at java.lang.Thread.run(Thread.java:745)
2016-07-30 19:36:52,162 | FATAL | hadoopc1h2:21300.activeMasterManager | Master server abort: loaded coprocessors are: [org.apache.hadoop.hbase.security.access.AccessController, org.apache.hadoop.hbase.index.coprocessor.master.IndexMasterObserver, org.apache.hadoop.hbase.JMXListener] | org.apache.hadoop.hbase.master.HMaster.abort(HMaster.java:1981)
2016-07-30 19:36:52,162 | FATAL | hadoopc1h2:21300.activeMasterManager | Unhandled exception. Starting shutdown. | org.apache.hadoop.hbase.master.HMaster.abort(HMaster.java:1984)
java.io.IOException: Timedout 300000ms waiting for namespace table to be assigned
	at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:102)
	at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:977)
	at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:763)
	at org.apache.hadoop.hbase.master.HMaster.access$600(HMaster.java:171)
	at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1606)
	at java.lang.Thread.run(Thread.java:745)
2016-07-30 19:36:52,187 | INFO  | master/hadoopc1h2/172.16.19.51:21300 | reportForDuty to master=hadoopc1h2,21300,1469877905979 with port=21300, startcode=1469877905979 | org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2271)
2016-07-30 19:36:52,198 | INFO  | hadoopc1h2:21300.activeMasterManager | ConnectorServer stopped! | org.apache.hadoop.hbase.JMXListener.stopConnectorServer(JMXListener.java:160)
{noformat}
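In the second-to-last log line above, the HMaster sent reportForDuty to itself: the target master=hadoopc1h2,21300,1469877905979 has the same host, port (21300) and startcode (1469877905979) as the reporting server.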
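HBase identifies a server as "hostname,port,startcode". The following minimal sketch uses a hypothetical helper, isSelf() (plain string handling, not HBase's ServerName API), to check the log line above: it compares the reportForDuty target with the reporting server's own host, port and startcode.

```java
/**
 * Hypothetical helper (not HBase API): compares the master named in a
 * reportForDuty log line with the reporting server's own identity, using
 * the "hostname,port,startcode" server-name format.
 */
final class SelfReportCheck {
  private SelfReportCheck() {}

  /** Returns true if the target master name equals host,port,startcode. */
  static boolean isSelf(String masterServerName, String host, int port, long startcode) {
    return masterServerName.equals(host + "," + port + "," + startcode);
  }

  public static void main(String[] args) {
    // Values copied from the reportForDuty log line above.
    boolean self = isSelf("hadoopc1h2,21300,1469877905979",
        "hadoopc1h2", 21300, 1469877905979L);
    System.out.println("reported to itself: " + self);
  }
}
```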


Background:
1) During master startup, HMasterCommandLine constructs the HMaster, whose constructor starts a separate thread that waits to become the active master:
{code}
	startActiveMasterManager(infoPort);
{code}
 
2) Meanwhile, right after constructing the HMaster, HMasterCommandLine starts the HMaster thread:
{code}
        HMaster master = HMaster.constructMaster(masterClass, conf, csm);
        if (master.isStopped()) {
          LOG.info("Won't bring the Master up as a shutdown is requested");
          return 1;
        }
        master.start();
        master.join();
{code}
which then blocks in the following call chain (HMaster extends HRegionServer):
{noformat}
HRegionServer
  run()
    preRegistrationInitialization()
      initializeZooKeeper()
        waitForMasterActive()
{noformat}

3) In HMaster, waitForMasterActive() is implemented as:
{code}
  protected void waitForMasterActive(){
    boolean tablesOnMaster = BaseLoadBalancer.tablesOnMaster(conf);
    while (!(tablesOnMaster && isActiveMaster)
        && !isStopped() && !isAborted()) {
      sleeper.sleep();
    }
  }
{code}
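Because "hbase.balancer.tablesOnMaster" is not configured, tablesOnMaster is false, so !(tablesOnMaster && isActiveMaster) is always true and the HMaster thread waits here until the master is stopped or aborted, even after it has become the active master.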
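The loop guard above can be checked in isolation. This minimal sketch copies only the boolean condition of waitForMasterActive() into a hypothetical helper, keepsWaiting() (not HBase code), so its value can be evaluated for each combination of flags.

```java
/**
 * Hypothetical helper (not part of HBase) that reproduces only the loop
 * condition of HMaster.waitForMasterActive().
 */
final class WaitCondition {
  private WaitCondition() {}

  /** Returns true while the waiting thread would keep sleeping. */
  static boolean keepsWaiting(boolean tablesOnMaster, boolean isActiveMaster,
                              boolean stopped, boolean aborted) {
    return !(tablesOnMaster && isActiveMaster) && !stopped && !aborted;
  }

  public static void main(String[] args) {
    // tablesOnMaster == false (property not configured): becoming active
    // does not end the wait; only stop or abort does.
    System.out.println(keepsWaiting(false, true, false, false));
    System.out.println(keepsWaiting(false, true, false, true));
    // tablesOnMaster == true: the wait ends once the master is active.
    System.out.println(keepsWaiting(true, true, false, false));
  }
}
```

With the property unset, the first argument is always false, so the helper only returns false once stopped or aborted is true; in other words, the thread leaves this loop precisely when the master is going down.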


When the HMaster fails to complete its initialization (because the namespace table was not assigned), it aborts:
{noformat}
	abort("Unhandled exception. Starting shutdown.", t);
{noformat}

So once the HMaster aborts, the step-2 thread stops waiting in waitForMasterActive() and continues its startup, reaching the loop below, where it reports for duty to the active master, which is still this HMaster itself:
{code}
      // Try and register with the Master; tell it we are here.  Break if
      // server is stopped or the clusterup flag is down or hdfs went wacky.
      while (keepLooping()) {
        RegionServerStartupResponse w = reportForDuty();
        if (w == null) {
          LOG.warn("reportForDuty failed; sleeping and then retrying.");
          this.sleeper.sleep();
        } else {
          handleReportForDutyResponse(w);
          break;
        }
      }
{code}
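The race can be reproduced with a simplified two-thread model. This is a sketch under stated assumptions, not HBase code: the class SelfReportRace, its methods, and the recheck guard are all hypothetical, and the guard is only an illustration, not the fix committed for HBASE-16805. One object plays both roles, as HMaster does: a "region server" thread waits like waitForMasterActive() and then reports for duty, while the caller aborts the master as the active master manager does after the namespace-table timeout.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Simplified model of the race (not HBase code; all names are hypothetical).
 */
final class SelfReportRace {
  private final String selfName;
  private volatile boolean aborted = false;
  private final List<String> reports = Collections.synchronizedList(new ArrayList<>());

  SelfReportRace(String selfName) {
    this.selfName = selfName;
  }

  void abort() {
    aborted = true;
  }

  /** Same shape as waitForMasterActive() with tablesOnMaster == false. */
  private void waitForMasterActive() throws InterruptedException {
    while (!aborted) {
      Thread.sleep(1);
    }
  }

  /** Region-server startup path: wait, then report to the recorded active master. */
  private void regionServerStartup(boolean recheckAfterWait) throws InterruptedException {
    waitForMasterActive();
    if (recheckAfterWait && aborted) {
      return; // hypothetical guard: skip registration while shutting down
    }
    // The recorded active master is still this process, so it reports to itself.
    reports.add(selfName);
  }

  /** Runs one startup/abort cycle and returns the masters that were reported to. */
  static List<String> run(boolean recheckAfterWait) {
    SelfReportRace master = new SelfReportRace("hadoopc1h2,21300,1469877905979");
    Thread rs = new Thread(() -> {
      try {
        master.regionServerStartup(recheckAfterWait);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    rs.start();
    master.abort(); // initialization failed, e.g. namespace table not assigned in time
    try {
      rs.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for startup thread", e);
    }
    synchronized (master.reports) {
      return new ArrayList<>(master.reports);
    }
  }

  public static void main(String[] args) {
    System.out.println("without recheck: " + run(false));
    System.out.println("with recheck:    " + run(true));
  }
}
```

Without the recheck, the model records a report addressed to its own server name; with it, nothing is reported. Because the model's wait loop can only exit on abort, the recheck always fires here; in the real code the flags can still change between the check and the RPC, so a recheck alone narrows the window rather than closing it.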




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)