Posted to commits@hbase.apache.org by st...@apache.org on 2010/08/18 09:16:16 UTC

svn commit: r986584 - in /hbase/branches/0.90_master_rewrite: ./ src/main/java/org/apache/hadoop/hbase/ src/main/java/org/apache/hadoop/hbase/master/ src/main/java/org/apache/hadoop/hbase/master/handler/ src/main/java/org/apache/hadoop/hbase/regionserv...

Author: stack
Date: Wed Aug 18 07:16:15 2010
New Revision: 986584

URL: http://svn.apache.org/viewvc?rev=986584&view=rev
Log:

Main change is a fixup of ClusterStatusTracker, and making master and
regionserver use it during startup and shutdown.  Cleaned up the master
constructor and run methods so that services are created and started in the
same location.  Did the same over in regionserver.

Changed my mind and removed executorservice from Server interface; it's only
really used master-side.
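
A rough sketch of that change, for illustration only: the accessor comes off
the shared interface and the master-side components take the executor through
their constructors. Every name here (SketchServer, SketchManager,
ExecutorInjectionSketch) is hypothetical, and java.util.concurrent's
ExecutorService stands in for HBase's own
org.apache.hadoop.hbase.executor.ExecutorService, which has a different API.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for the Server interface after this change:
// it no longer exposes a getExecutorService() accessor.
interface SketchServer {
  String getServerName();
}

// Hypothetical master-side component. It is handed its executor at
// construction time instead of asking the server for one.
final class SketchManager {
  private final SketchServer master;
  private final ExecutorService executorService;

  SketchManager(SketchServer master, ExecutorService service) {
    this.master = master;
    this.executorService = service;
  }

  // Submit a handler-like task to the injected executor.
  Future<String> handleClosed(final String regionName) {
    Callable<String> task =
        () -> master.getServerName() + " handled CLOSED for " + regionName;
    return this.executorService.submit(task);
  }
}

final class ExecutorInjectionSketch {
  // The owner creates the executor, passes it to the component that needs
  // it, and is the one that shuts it down again.
  static String demo() {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      SketchManager manager = new SketchManager(() -> "master-1", pool);
      return manager.handleClosed("region-a").get();
    } catch (Exception e) {
      throw new RuntimeException(e);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

The point of the shape is that a regionserver-side implementation of the
interface carries no executor it does not use, while the owner keeps
responsibility for creating and shutting down the pool.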

M BRANCH_TODO.txt
  Added some notes on what's been done.
M src/test/java/org/apache/hadoop/hbase/MiniHBaseCluster.java
  Renamed HRS#init as HRS#handleReportForDutyResponse.
M src/main/java/org/apache/hadoop/hbase/zookeeper/ClusterStatusTracker.java
  Changed the node we track to be /hbase/shutdown rather than root location.
  Added a start that registers this as a listener.
M src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java
  Tried to improve the log message.
M src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
  Removed duplicate create load code.  Made more small methods rather than
  have long constructor and then initialize... same for run method.
M src/main/java/org/apache/hadoop/hbase/Server.java
  Removed getExecutorService.
M src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
M src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
  Pass in the ExecutorService via the constructor rather than calling
  getExecutorService on the master.
  Fixed up some logging.
M src/main/java/org/apache/hadoop/hbase/master/HMaster.java
  Cleaned up constructor creating and starting services in same location.
  Moved all the special stuff we do when this master is starting the
  cluster into a single method.
M src/main/java/org/apache/hadoop/hbase/master/handler/OpenedRegionHandler.java
  Fixed up log message (was confusing why we were deleting a good OPEN).
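
The ClusterStatusTracker behaviour described above (track one cluster-state
node, register as a listener in start, let the regionserver ask whether the
cluster is up) might be pictured with the sketch below. It is not the HBase
implementation: FakeWatcher is a hypothetical in-memory stand-in for the
ZooKeeper watcher, SketchClusterStatusTracker and setClusterDown are names
assumed for illustration, and nothing here is thread-safe.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory stand-in for a ZooKeeper watcher: it stores node
// paths and notifies registered listeners on create and delete.
final class FakeWatcher {
  interface Listener {
    void nodeCreated(String path);
    void nodeDeleted(String path);
  }

  private final Set<String> nodes = new HashSet<>();
  private final List<Listener> listeners = new ArrayList<>();

  void registerListener(Listener l) { listeners.add(l); }

  boolean exists(String path) { return nodes.contains(path); }

  void create(String path) {
    if (nodes.add(path)) {
      for (Listener l : listeners) l.nodeCreated(path);
    }
  }

  void delete(String path) {
    if (nodes.remove(path)) {
      for (Listener l : listeners) l.nodeDeleted(path);
    }
  }
}

// Sketch of a tracker homed on the cluster-state node named in the log.
final class SketchClusterStatusTracker implements FakeWatcher.Listener {
  static final String CLUSTER_STATE_NODE = "/hbase/shutdown";

  private final FakeWatcher watcher;
  private boolean clusterUp = false;

  SketchClusterStatusTracker(FakeWatcher watcher) { this.watcher = watcher; }

  // Register as a listener, then pick up whatever state already exists.
  void start() {
    watcher.registerListener(this);
    clusterUp = watcher.exists(CLUSTER_STATE_NODE);
  }

  // Master side: mark the cluster as up by creating the node.
  void setClusterUp() { watcher.create(CLUSTER_STATE_NODE); }

  // Assumed counterpart: mark the cluster as down by deleting the node.
  void setClusterDown() { watcher.delete(CLUSTER_STATE_NODE); }

  boolean isClusterUp() { return clusterUp; }

  @Override
  public void nodeCreated(String path) {
    if (CLUSTER_STATE_NODE.equals(path)) clusterUp = true;
  }

  @Override
  public void nodeDeleted(String path) {
    if (CLUSTER_STATE_NODE.equals(path)) clusterUp = false;
  }
}
```

In this sketch a master-side tracker would call setClusterUp and then start,
while a regionserver-side tracker only calls start and polls isClusterUp in
its run loop, beginning its shutdown once the node goes away.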

Modified:
    hbase/branches/0.90_master_rewrite/BRANCH_TODO.txt
    hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/Server.java
    hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
    hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
    hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
    hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/master/handler/OpenedRegionHandler.java
    hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
    hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/zookeeper/ClusterStatusTracker.java
    hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java
    hbase/branches/0.90_master_rewrite/src/test/java/org/apache/hadoop/hbase/MiniHBaseCluster.java

Modified: hbase/branches/0.90_master_rewrite/BRANCH_TODO.txt
URL: http://svn.apache.org/viewvc/hbase/branches/0.90_master_rewrite/BRANCH_TODO.txt?rev=986584&r1=986583&r2=986584&view=diff
==============================================================================
--- hbase/branches/0.90_master_rewrite/BRANCH_TODO.txt (original)
+++ hbase/branches/0.90_master_rewrite/BRANCH_TODO.txt Wed Aug 18 07:16:15 2010
@@ -9,6 +9,9 @@ harder stuff
   if there are operations that should just retry indefinitely, they need to
   resubmit themselves to their executor service.
 
+  -- Should never timeout IMO and we changed executors so root and meta are
+  done separately so this should be ok? -- St.Ack 20100815
+
 * move splits to RS side, integrate new patch from stack on trunk
   might need a new CREATED unassigned now, or new rpc, but get rid of sending
   split notification on heartbeat?
@@ -18,15 +21,23 @@ harder stuff
   we should use cluster flag in zk to signal RS to
   start rather than master availability and between master up and cluster up
   the master can do bulk of it's initialization.
+  
+  -- Yes.  CST is currently a little off in that its homed on root location
+  rather than the up/down status.  Also shutdown.  RS now watches /hbase/shutdown
+  and starts shutdown when this goes down -- St.Ack 20100815
 
 * figure what to do with client table admin ops (flush, split, compact)
   (direct to RS rpc calls are in place, need to update client)
+  
+  -- And then remove this stuff from HMsg -- St.Ack 20100815
 
 * on region open (and wherever split children notify master) should check if
   if the table is disabled and should close the regions... maybe.
 
 * in RootEditor there is a race condition between delete and watch?
 
+  -- Didn't you say that this was a pigment of your emancipation? -- St.Ack 20100815
+
 * review FileSystemManager calls
 
 * there are some races with master wanting to connect for rpc
@@ -50,6 +61,9 @@ somewhat easier stuff
 
 * regionserver exit and expiration need to be finished in ServerManager
 
+  -- Mostly done.  Need to also implement server shutdown again -- St.Ack 20100815
+
+
 * jsp pages borked
 
 * make sync calls for enable/disable (check and verify methods?)
@@ -142,6 +156,8 @@ Later:
 
 * renaming master file manager?  MasterFS/MasterFileSystem
 
+  -- I renamed this stuff -- St.Ack 20100815
+
 * ServerStatus/MasterStatus
 
   + We now have:  Abortable as the base class (M, RS, and Client implement abort())
@@ -156,6 +172,12 @@ Later:
     the server status booleans and abort() methods (like closed, closing,
     abortRequested)
 
+  -- Done.  I removed MasterStatus/MasterController.  Not necessary.
+  The RSController was renamed RegionServer.  Not the best but until 
+  something better.  I got rid of a few of the calls it was doing
+  as they didn't seem needed (e.g. call openHRegion, a static, rather
+  than do an open on the passed regionservercontroller) -- St.Ack 20100815
+
 * HBaseEventHandler/HBaseEventType/HBaseExecutorService
 
   X (done) After ZK changes, renamed to EventHandler/EventType

Modified: hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/Server.java
URL: http://svn.apache.org/viewvc/hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/Server.java?rev=986584&r1=986583&r2=986584&view=diff
==============================================================================
--- hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/Server.java (original)
+++ hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/Server.java Wed Aug 18 07:16:15 2010
@@ -20,7 +20,6 @@
 package org.apache.hadoop.hbase;
 
 import org.apache.hadoop.conf.Configuration;
-import org.apache.hadoop.hbase.executor.ExecutorService;
 import org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher;
 
 /**
@@ -46,9 +45,4 @@ public interface Server extends Abortabl
    * @return unique server name
    */
   public String getServerName();
-
-  /**
-   * @return This servers executor service.
-   */
-  public ExecutorService getExecutorService();
 }
\ No newline at end of file

Modified: hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
URL: http://svn.apache.org/viewvc/hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java?rev=986584&r1=986583&r2=986584&view=diff
==============================================================================
--- hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java (original)
+++ hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java Wed Aug 18 07:16:15 2010
@@ -49,6 +49,7 @@ import org.apache.hadoop.hbase.catalog.C
 import org.apache.hadoop.hbase.catalog.MetaReader;
 import org.apache.hadoop.hbase.catalog.RootLocationEditor;
 import org.apache.hadoop.hbase.client.MetaScanner;
+import org.apache.hadoop.hbase.executor.ExecutorService;
 import org.apache.hadoop.hbase.executor.RegionTransitionData;
 import org.apache.hadoop.hbase.master.LoadBalancer.RegionPlan;
 import org.apache.hadoop.hbase.master.handler.ClosedRegionHandler;
@@ -111,6 +112,8 @@ public class AssignmentManager extends Z
 
   private final ReentrantLock assignLock = new ReentrantLock();
 
+  private final ExecutorService executorService;
+
   /**
    * Constructs a new assignment manager.
    *
@@ -119,13 +122,15 @@ public class AssignmentManager extends Z
    * @param status master status
    * @param serverManager
    * @param catalogTracker
+   * @param service
    */
   public AssignmentManager(Server master, ServerManager serverManager,
-      CatalogTracker catalogTracker) {
+      CatalogTracker catalogTracker, final ExecutorService service) {
     super(master.getZooKeeper());
     this.master = master;
     this.serverManager = serverManager;
     this.catalogTracker = catalogTracker;
+    this.executorService = service;
     Configuration conf = master.getConfiguration();
     this.timeoutMonitor = new TimeoutMonitor(
         conf.getInt("hbase.master.assignment.timeoutmonitor.period", 30000),
@@ -265,7 +270,7 @@ public class AssignmentManager extends Z
             return;
           }
           // Handle CLOSED by assigning elsewhere or stopping if a disable
-          this.master.getExecutorService().submit(new ClosedRegionHandler(master,
+          this.executorService.submit(new ClosedRegionHandler(master,
             this, data, regionState.getRegion()));
           break;
 
@@ -296,16 +301,8 @@ public class AssignmentManager extends Z
                 "in expected PENDING_OPEN or OPENING states");
             return;
           }
-          // If this is a catalog table, update catalog manager accordingly
-          // Moving root and meta editing over to RS who does the opening
-          LOG.debug("Processing OPENED for region " +
-            regionState.getRegion().getRegionNameAsString());
-
-          // Used to have updating of root/meta locations here but it's
-          // automatic in CatalogTracker now
-
           // Handle OPENED by removing from transition and deleted zk node
-          this.master.getExecutorService().submit(
+          this.executorService.submit(
             new OpenedRegionHandler(master, this, data, regionState.getRegion(),
               this.serverManager.getServerInfo(data.getServerName())));
           break;

Modified: hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
URL: http://svn.apache.org/viewvc/hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/master/HMaster.java?rev=986584&r1=986583&r2=986584&view=diff
==============================================================================
--- hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/master/HMaster.java (original)
+++ hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/master/HMaster.java Wed Aug 18 07:16:15 2010
@@ -156,7 +156,7 @@ implements HMasterInterface, HMasterRegi
   private volatile boolean abort = false;
 
   // Instance of the hbase executor service.
-  private ExecutorService service;
+  private ExecutorService executorService;
 
   /**
    * Initializes the HMaster. The steps are as follows:
@@ -165,15 +165,14 @@ implements HMasterInterface, HMasterRegi
    * <li>Initialize HMaster RPC and address
    * <li>Connect to ZooKeeper and figure out if this is a fresh cluster start or
    *     a failed over master
+   * <li>Block until becoming active master
    * <li>Initialize master components - server manager, region manager, metrics,
    *     region server queue, file system manager, etc
-   * <li>Block until becoming active master
    * </ol>
    * @throws InterruptedException 
    */
-  public HMaster(Configuration conf)
+  public HMaster(final Configuration conf)
   throws IOException, KeeperException, InterruptedException {
-    // initialize some variables
     this.conf = conf;
 
     /*
@@ -191,7 +190,7 @@ implements HMasterInterface, HMasterRegi
 
     /*
      * 2. Determine if this is a fresh cluster startup or failed over master.
-     *  This is done by checking for the existence of any ephemeral
+     * This is done by checking for the existence of any ephemeral
      * RegionServer nodes in ZooKeeper.  These nodes are created by RSs on
      * their initialization but only after they find the primary master.  As
      * long as this check is done before we write our address into ZK, this
@@ -199,29 +198,13 @@ implements HMasterInterface, HMasterRegi
      * startup (none have become active master yet), which is why there is an
      * additional check if this master does not become primary on its first attempt.
      */
-    zooKeeper = new ZooKeeperWatcher(conf, MASTER + "-" + getHServerAddress(), this);
-    clusterStarter = 0 == ZKUtil.getNumberOfChildren(zooKeeper, zooKeeper.rsZNode);
-
-    /*
-     * 3. Initialize master components.
-     * This includes the filesystem manager, server manager, region manager,
-     * metrics, queues, sleeper, etc...
-     */
-    // TODO: Do this using Dependency Injection, using PicoContainer or Spring.
-    this.connection = ServerConnectionManager.getConnection(conf);
-    this.metrics = new MasterMetrics(this.getName());
-    fileSystemManager = new MasterFileSystem(this);
-    serverManager = new ServerManager(this, this.connection, metrics,
-      fileSystemManager);
-    regionServerTracker = new RegionServerTracker(zooKeeper, this,
-      serverManager);
-    catalogTracker = new CatalogTracker(zooKeeper, connection, this,
-      conf.getInt("hbase.master.catalog.timeout", -1));
-    assignmentManager = new AssignmentManager(this, serverManager, catalogTracker);
-    clusterStatusTracker = new ClusterStatusTracker(getZooKeeper(), this);
+    this.zooKeeper =
+      new ZooKeeperWatcher(conf, MASTER + "-" + getMasterAddress(), this);
+    this.clusterStarter = 0 ==
+      ZKUtil.getNumberOfChildren(zooKeeper, zooKeeper.rsZNode);
 
     /*
-     * 4. Block on becoming the active master.
+     * 3. Block on becoming the active master.
      * We race with other masters to write our address into ZooKeeper.  If we
      * succeed, we are the primary/active master and finish initialization.
      *
@@ -231,20 +214,41 @@ implements HMasterInterface, HMasterRegi
      */
     activeMasterManager = new ActiveMasterManager(zooKeeper, address, this);
     zooKeeper.registerListener(activeMasterManager);
-    zooKeeper.registerListener(assignmentManager);
+
     // Wait here until we are the active master
     clusterStarter = activeMasterManager.blockUntilBecomingActiveMaster();
 
-    // TODO: We should start everything here instead of before we become
-    //       active master and some after.  Requires change to RS side to not
-    //       start until clusterStatus is up rather than master is available.
+    /**
+     * 4. We are active master now... go initialize components we need to run.
+     */
+    // TODO: Do this using Dependency Injection, using PicoContainer or Spring.
+    this.metrics = new MasterMetrics(this.getName());
+    this.fileSystemManager = new MasterFileSystem(this);
+    this.connection = ServerConnectionManager.getConnection(conf);
+    this.executorService = new ExecutorService(getServerName());
+
+    this.serverManager = new ServerManager(this, this.connection, metrics,
+      fileSystemManager, this.executorService);
+
+    this.catalogTracker = new CatalogTracker(zooKeeper, connection, this,
+      conf.getInt("hbase.master.catalog.timeout", -1));
+    this.catalogTracker.start();
 
-    // We are the active master now.
+    this.assignmentManager = new AssignmentManager(this, serverManager,
+      this.catalogTracker, this.executorService);
+    zooKeeper.registerListener(assignmentManager);
+
+    this.regionServerTracker = new RegionServerTracker(zooKeeper, this,
+      this.serverManager);
     regionServerTracker.start();
-    catalogTracker.start();
-    clusterStatusTracker.setClusterUp();
 
-    LOG.info("Server active/primary master; " + this.address);
+    // Set the cluster as up.
+    this.clusterStatusTracker = new ClusterStatusTracker(getZooKeeper(), this);
+    this.clusterStatusTracker.setClusterUp();
+    this.clusterStatusTracker.start();
+
+    LOG.info("Server active/primary master; " + this.address +
+      "; clusterStarter=" + this.clusterStarter);
   }
 
   /**
@@ -258,48 +262,19 @@ implements HMasterInterface, HMasterRegi
   @Override
   public void run() {
     try {
-      if (this.clusterStarter) {
-        // This master is starting the cluster (its not a preexisting cluster
-        // that this master is joining).
-        // Initialize the filesystem, which does the following:
-        //   - Creates the root hbase directory in the FS if DNE
-        //   - If fresh start, create first ROOT and META regions (bootstrap)
-        //   - Checks the FS to make sure the root directory is readable
-        //   - Creates the archive directory for logs
-        fileSystemManager.initialize();
-        // Do any log splitting necessary
-        // TODO: Should do this in background rather than block master startup
-        // TODO: Do we want to do this before/while/after RSs check in?
-        //       It seems that this method looks at active RSs but happens
-        //       concurrently with when we expect them to be checking in
-        fileSystemManager.splitLogAfterStartup(serverManager.getOnlineServers());
-      }
       // start up all service threads.
       startServiceThreads();
       // wait for minimum number of region servers to be up
-      serverManager.waitForMinServers();
-
+      this.serverManager.waitForMinServers();
       // start assignment of user regions, startup or failure
       if (this.clusterStarter) {
-        // Clean out current state of unassigned
-        assignmentManager.cleanoutUnassigned();
-        // assign the root region
-        assignmentManager.assignRoot();
-        catalogTracker.waitForRoot();
-        // assign the meta region
-        assignmentManager.assignMeta();
-        catalogTracker.waitForMeta();
-        // above check waits for general meta availability but this does not
-        // guarantee that the transition has completed
-        assignmentManager.waitForAssignment(HRegionInfo.FIRST_META_REGIONINFO);
-        assignmentManager.assignAllUserRegions();
+        clusterStarterInitializations(this.fileSystemManager,
+            this.serverManager, this.catalogTracker, this.assignmentManager);
       } else {
         // Process existing unassigned nodes in ZK, read all regions from META,
         // rebuild in-memory state.
-        assignmentManager.processFailover();
+        this.assignmentManager.processFailover();
       }
-      LOG.info("HMaster started on " + this.address.toString() +
-        "; clusterStarter=" + this.clusterStarter);
       // Check if we should stop every second.
       Sleeper sleeper = new Sleeper(1000, this);
       while (!this.stopped  && !this.abort) {
@@ -327,12 +302,45 @@ implements HMasterInterface, HMasterRegi
     this.rpcServer.stop();
     this.activeMasterManager.stop();
     this.zooKeeper.close();
-    this.service.shutdown();
+    this.executorService.shutdown();
     LOG.info("HMaster main thread exiting");
   }
 
-  public HServerAddress getHServerAddress() {
-    return address;
+  /*
+   * Initializations we need to do if we are cluster starter.
+   * @param starter
+   * @param mfs
+   * @throws IOException 
+   */
+  private static void clusterStarterInitializations(final MasterFileSystem mfs,
+    final ServerManager sm, final CatalogTracker ct, final AssignmentManager am)
+  throws IOException, InterruptedException, KeeperException {
+      // This master is starting the cluster (its not a preexisting cluster
+      // that this master is joining).
+      // Initialize the filesystem, which does the following:
+      //   - Creates the root hbase directory in the FS if DNE
+      //   - If fresh start, create first ROOT and META regions (bootstrap)
+      //   - Checks the FS to make sure the root directory is readable
+      //   - Creates the archive directory for logs
+      mfs.initialize();
+      // Do any log splitting necessary
+      // TODO: Should do this in background rather than block master startup
+      // TODO: Do we want to do this before/while/after RSs check in?
+      //       It seems that this method looks at active RSs but happens
+      //       concurrently with when we expect them to be checking in
+      mfs.splitLogAfterStartup(sm.getOnlineServers());
+      // Clean out current state of unassigned
+      am.cleanoutUnassigned();
+      // assign the root region
+      am.assignRoot();
+      ct.waitForRoot();
+      // assign the meta region
+      am.assignMeta();
+      ct.waitForMeta();
+      // above check waits for general meta availability but this does not
+      // guarantee that the transition has completed
+      am.waitForAssignment(HRegionInfo.FIRST_META_REGIONINFO);
+      am.assignAllUserRegions();
   }
 
   /*
@@ -351,7 +359,7 @@ implements HMasterInterface, HMasterRegi
 
   /** @return HServerAddress of the master server */
   public HServerAddress getMasterAddress() {
-    return getHServerAddress();
+    return this.address;
   }
 
   public long getProtocolVersion(String protocol, long clientVersion) {
@@ -380,11 +388,6 @@ implements HMasterInterface, HMasterRegi
     return this.zooKeeper;
   }
 
-  @Override
-  public ExecutorService getExecutorService() {
-    return this.service;
-  }
-
   /*
    * Start up all services. If any of these threads gets an unhandled exception
    * then they just die with a logged message.  This should be fine because
@@ -395,14 +398,13 @@ implements HMasterInterface, HMasterRegi
   private void startServiceThreads() {
     try {
       // Start the executor service pools
-      this.service = new ExecutorService(getServerName());
-      this.service.startExecutorService(ExecutorType.MASTER_OPEN_REGION,
+      this.executorService.startExecutorService(ExecutorType.MASTER_OPEN_REGION,
         conf.getInt("hbase.master.executor.openregion.threads", 5));
-      this.service.startExecutorService(ExecutorType.MASTER_CLOSE_REGION,
+      this.executorService.startExecutorService(ExecutorType.MASTER_CLOSE_REGION,
         conf.getInt("hbase.master.executor.closeregion.threads", 5));
-      this.service.startExecutorService(ExecutorType.MASTER_SERVER_OPERATIONS,
+      this.executorService.startExecutorService(ExecutorType.MASTER_SERVER_OPERATIONS,
         conf.getInt("hbase.master.executor.serverops.threads", 5));
-      this.service.startExecutorService(ExecutorType.MASTER_TABLE_OPERATIONS,
+      this.executorService.startExecutorService(ExecutorType.MASTER_TABLE_OPERATIONS,
         conf.getInt("hbase.master.executor.tableops.threads", 5));
 
       // Put up info server.
@@ -413,7 +415,7 @@ implements HMasterInterface, HMasterRegi
         this.infoServer.setAttribute(MASTER, this);
         this.infoServer.start();
       }
-      // Start the server so everything else is running before we start
+      // Start the server last so everything else is running before we start
       // receiving requests.
       this.rpcServer.start();
       if (LOG.isDebugEnabled()) {
@@ -671,7 +673,7 @@ implements HMasterInterface, HMasterRegi
   public void modifyTable(final byte[] tableName, HTableDescriptor htd)
   throws IOException {
     LOG.info("modifyTable(SET_HTD): " + htd);
-    this.service.submit(new ModifyTableHandler(tableName, this, catalogTracker,
+    this.executorService.submit(new ModifyTableHandler(tableName, this, catalogTracker,
       fileSystemManager));
   }
 

Modified: hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
URL: http://svn.apache.org/viewvc/hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java?rev=986584&r1=986583&r2=986584&view=diff
==============================================================================
--- hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java (original)
+++ hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java Wed Aug 18 07:16:15 2010
@@ -44,6 +44,7 @@ import org.apache.hadoop.hbase.Server;
 import org.apache.hadoop.hbase.Stoppable;
 import org.apache.hadoop.hbase.YouAreDeadException;
 import org.apache.hadoop.hbase.client.ServerConnection;
+import org.apache.hadoop.hbase.executor.ExecutorService;
 import org.apache.hadoop.hbase.ipc.HRegionInterface;
 import org.apache.hadoop.hbase.master.handler.ServerShutdownHandler;
 import org.apache.hadoop.hbase.master.metrics.MasterMetrics;
@@ -103,6 +104,8 @@ public class ServerManager {
 
   private final ServerConnection connection;
 
+  private final ExecutorService executorService;
+
   /**
    * Dumps into log current stats on dead servers and number of servers
    * TODO: Make this a metric; dump metrics into log.
@@ -144,14 +147,17 @@ public class ServerManager {
    * @param master
    * @param masterMetrics If null, we won't pass metrics.
    * @param masterFileSystem
+   * @param service ExecutorService instance.
    */
   public ServerManager(Server master,
       final ServerConnection connection,
       MasterMetrics masterMetrics,
-      MasterFileSystem masterFileSystem) {
+      MasterFileSystem masterFileSystem,
+      ExecutorService service) {
     this.master = master;
     this.masterMetrics = masterMetrics;
     this.connection = connection;
+    this.executorService = service;
     Configuration c = master.getConfiguration();
     int metaRescanInterval = c.getInt("hbase.master.meta.thread.rescanfrequency",
       60 * 1000);
@@ -483,7 +489,7 @@ public class ServerManager {
     }
     // Add to dead servers and queue a shutdown processing.
     this.deadServers.add(serverName);
-    this.master.getExecutorService().submit(new ServerShutdownHandler(master));
+    this.executorService.submit(new ServerShutdownHandler(master));
     LOG.debug("Added=" + serverName +
       " to dead servers, submitted shutdown handler to be executed");
   }

Modified: hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/master/handler/OpenedRegionHandler.java
URL: http://svn.apache.org/viewvc/hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/master/handler/OpenedRegionHandler.java?rev=986584&r1=986583&r2=986584&view=diff
==============================================================================
--- hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/master/handler/OpenedRegionHandler.java (original)
+++ hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/master/handler/OpenedRegionHandler.java Wed Aug 18 07:16:15 2010
@@ -80,7 +80,7 @@ public class OpenedRegionHandler extends
 
   @Override
   public void process() {
-    LOG.debug("Handling OPENED event with data: " + data);
+    LOG.debug("Handling OPENED event; deleting unassigned node with data: " + data);
     // TODO: should we check if this table was disabled and get it closed?
     // Remove region from in-memory transition and unassigned node from ZK
     try {
@@ -90,6 +90,6 @@ public class OpenedRegionHandler extends
       server.abort("Error deleting OPENED node in ZK", e);
     }
     assignmentManager.regionOnline(regionInfo, serverInfo);
-    LOG.debug("Opened region " + regionInfo);
+    LOG.debug("Opened region " + regionInfo.getRegionNameAsString());
   }
 }

Modified: hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
URL: http://svn.apache.org/viewvc/hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java?rev=986584&r1=986583&r2=986584&view=diff
==============================================================================
--- hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java (original)
+++ hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java Wed Aug 18 07:16:15 2010
@@ -58,9 +58,7 @@ import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.hbase.Chore;
 import org.apache.hadoop.hbase.HBaseConfiguration;
 import org.apache.hadoop.hbase.HConstants;
-import org.apache.hadoop.hbase.HConstants.OperationStatusCode;
 import org.apache.hadoop.hbase.HMsg;
-import org.apache.hadoop.hbase.HMsg.Type;
 import org.apache.hadoop.hbase.HRegionInfo;
 import org.apache.hadoop.hbase.HRegionLocation;
 import org.apache.hadoop.hbase.HServerAddress;
@@ -75,6 +73,8 @@ import org.apache.hadoop.hbase.Stoppable
 import org.apache.hadoop.hbase.UnknownRowLockException;
 import org.apache.hadoop.hbase.UnknownScannerException;
 import org.apache.hadoop.hbase.YouAreDeadException;
+import org.apache.hadoop.hbase.HConstants.OperationStatusCode;
+import org.apache.hadoop.hbase.HMsg.Type;
 import org.apache.hadoop.hbase.catalog.CatalogTracker;
 import org.apache.hadoop.hbase.client.Delete;
 import org.apache.hadoop.hbase.client.Get;
@@ -109,8 +109,8 @@ import org.apache.hadoop.hbase.util.Info
 import org.apache.hadoop.hbase.util.Pair;
 import org.apache.hadoop.hbase.util.Sleeper;
 import org.apache.hadoop.hbase.util.Threads;
+import org.apache.hadoop.hbase.zookeeper.ClusterStatusTracker;
 import org.apache.hadoop.hbase.zookeeper.ZKUtil;
-import org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker;
 import org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher;
 import org.apache.hadoop.io.MapWritable;
 import org.apache.hadoop.io.Writable;
@@ -228,7 +228,7 @@ public class HRegionServer implements HR
   private CatalogTracker catalogTracker;
 
   // Cluster Status Tracker
-  private ZooKeeperNodeTracker clusterShutdownTracker;
+  private ClusterStatusTracker clusterStatusTracker;
 
   // A sleeper that sleeps for msgInterval.
   private final Sleeper sleeper;
@@ -268,14 +268,11 @@ public class HRegionServer implements HR
     // use will be in #serverInfo data member. For example, we may have been
     // passed a port of 0 which means we should pick some ephemeral port to bind
     // to.
-    address = new HServerAddress(addressStr);
-    LOG.info("My address is " + address);
+    this.address = new HServerAddress(addressStr);
 
-    this.abortRequested = false;
     this.fsOk = true;
     this.conf = conf;
     this.connection = ServerConnectionManager.getConnection(conf);
-
     this.isOnline = false;
 
     // Config'ed params
@@ -290,7 +287,7 @@ public class HRegionServer implements HR
         HConstants.HBASE_CLIENT_SCANNER_MAX_RESULT_SIZE_KEY,
         HConstants.DEFAULT_HBASE_CLIENT_SCANNER_MAX_RESULT_SIZE);
 
-    // Task thread to process requests from Master
+    // Task thread to process requests from Master.  TODO: REMOVE
     this.worker = new Worker();
 
     this.numRegionsToReport = conf.getInt(
@@ -332,7 +329,7 @@ public class HRegionServer implements HR
     }
     initializeZooKeeper();
     initializeThreads();
-    int nbBlocks = 0; // FIXXX conf.getInt("hbase.regionserver.nbreservationblocks", 4);
+    int nbBlocks = 0; // TODO: FIX WAS OOME'ing in TESTS ->  conf.getInt("hbase.regionserver.nbreservationblocks", 4);
     for (int i = 0; i < nbBlocks; i++) {
       reservedSpace.add(new byte[HConstants.DEFAULT_SIZE_RESERVATION_BLOCK]);
     }
@@ -353,27 +350,16 @@ public class HRegionServer implements HR
         conf.getInt("hbase.regionserver.catalog.timeout", -1));
     catalogTracker.start();
 
-    this.clusterShutdownTracker = new ZooKeeperNodeTracker(this.zooKeeper,
-        this.zooKeeper.clusterStateZNode, this) {
-      @Override
-      public synchronized void nodeDeleted(String path) {
-        super.nodeDeleted(path);
-        if (isClusterShutdown()) {
-          // Cluster was just marked for shutdown.
-          LOG.info("Received cluster shutdown message");
-          closeUserRegions(false);
-        }
-      }
-    };
-    this.zooKeeper.registerListener(this.clusterShutdownTracker);
-    this.clusterShutdownTracker.start();
+    this.clusterStatusTracker = new ClusterStatusTracker(this.zooKeeper, this);
+    this.clusterStatusTracker.start();
+    this.clusterStatusTracker.blockUntilAvailable();
   }
 
   /**
    * @return True if cluster shutdown in progress
    */
-  private boolean isClusterShutdown() {
-    return this.clusterShutdownTracker.getData() == null;
+  private boolean isClusterUp() {
+    return this.clusterStatusTracker.isClusterUp();
   }
 
   private void initializeThreads() throws IOException {
@@ -407,7 +393,8 @@ public class HRegionServer implements HR
    * load/unload instructions.
    */
   public void run() {
-    regionServerThread = Thread.currentThread();
+    this.regionServerThread = Thread.currentThread();
+    boolean calledCloseUserRegions = false;
     try {
       while (!this.stopped) {
         if (tryReportForDuty()) break;
@@ -416,9 +403,13 @@ public class HRegionServer implements HR
       List<HMsg> outboundMessages = new ArrayList<HMsg>();
       // The main run loop.
       for (int tries = 0; !this.stopped && isHealthy();) {
-        if (isClusterShutdown() && this.onlineRegions.isEmpty()) {
-          stop("Exiting; cluster shutdown set and not carrying any regions");
-          continue;
+        if (!isClusterUp()) {
+          if (this.onlineRegions.isEmpty()) {
+            stop("Exiting; cluster shutdown set and not carrying any regions");
+          } else if (!calledCloseUserRegions) {
+            closeUserRegions(this.abortRequested);
+            calledCloseUserRegions = true;
+          }
         }
         // Try to get the root region location from zookeeper.
         checkRootRegionLocation();
@@ -592,21 +583,6 @@ public class HRegionServer implements HR
     return hsl;
   }
 
-  /**
-   * @return True if successfully invoked {@link #reportForDuty()}
-   * @throws IOException
-   */
-  private boolean tryReportForDuty() throws IOException {
-    MapWritable w = reportForDuty();
-    if (w != null) {
-      init(w);
-      return true;
-    }
-    sleeper.sleep();
-    LOG.warn("No response on reportForDuty. Sleeping and then retrying.");
-    return false;
-  }
-
   private void checkRootRegionLocation() throws InterruptedException {
     if (this.haveRootRegion.get()) return;
     HServerAddress rootServer = catalogTracker.getRootLocation();
@@ -691,7 +667,7 @@ public class HRegionServer implements HR
    *
    * @param c Extra configuration.
    */
-  protected void init(final MapWritable c) throws IOException {
+  protected void handleReportForDutyResponse(final MapWritable c) throws IOException {
     try {
       for (Map.Entry<Writable, Writable> e : c.entrySet()) {
         String key = e.getKey().toString();
@@ -716,7 +692,7 @@ public class HRegionServer implements HR
         HServerAddress hsa = new HServerAddress(hra, this.serverInfo
             .getServerAddress().getPort());
         LOG.info("Master passed us address to use. Was="
-            + this.serverInfo.getServerAddress() + ", Now=" + hra);
+            + this.serverInfo.getServerAddress() + ", Now=" + hsa.toString());
         this.serverInfo.setServerAddress(hsa);
       }
       // Master sent us hbase.rootdir to use. Should be fully qualified
@@ -1109,7 +1085,7 @@ public class HRegionServer implements HR
   @Override
   public void stop(final String msg) {
     this.stopped = true;
-    LOG.info(msg);
+    LOG.info("STOPPED: " + msg);
     synchronized (this) {
       // Wakes run() if it is sleeping
       notifyAll(); // FindBugs NN_NAKED_NOTIFY
@@ -1206,6 +1182,21 @@ public class HRegionServer implements HR
     return true;
   }
 
+  /**
+   * @return True if successfully invoked {@link #reportForDuty()}
+   * @throws IOException
+   */
+  private boolean tryReportForDuty() throws IOException {
+    MapWritable w = reportForDuty();
+    if (w != null) {
+      handleReportForDutyResponse(w);
+      return true;
+    }
+    sleeper.sleep();
+    LOG.warn("No response on reportForDuty. Sleeping and then retrying.");
+    return false;
+  }
+
   /*
    * Let the master know we're here Run initialization using parameters passed
    * us by the master.
@@ -1221,18 +1212,10 @@ public class HRegionServer implements HR
     while (!stopped) {
       try {
         this.requestCount.set(0);
-        MemoryUsage memory = ManagementFactory.getMemoryMXBean()
-            .getHeapMemoryUsage();
-        HServerLoad hsl = new HServerLoad(0,
-            (int) memory.getUsed() / 1024 / 1024,
-            (int) memory.getMax() / 1024 / 1024);
-        this.serverInfo.setLoad(hsl);
-        if (LOG.isDebugEnabled()) {
-          LOG.debug("sending initial server load: " + hsl);
-        }
         lastMsg = System.currentTimeMillis();
         ZKUtil.setAddressAndWatch(zooKeeper, ZKUtil.joinZNode(
             zooKeeper.rsZNode, ZKUtil.getNodeName(serverInfo)), address);
+        this.serverInfo.setLoad(buildServerLoad());
         result = this.hbaseMaster.regionServerStartup(this.serverInfo);
         break;
       } catch (IOException e) {
@@ -2295,11 +2278,6 @@ public class HRegionServer implements HR
   public String getServerName() {
     return serverInfo.getServerName();
   }
- 
-  @Override
-  public ExecutorService getExecutorService() {
-    return this.service;
-  }
 
   @Override
   public CompactionRequestor getCompactionRequester() {
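
Note on the HRegionServer run-loop change above: the new check replaces the old nodeDeleted() callback with a poll of the tracker at the top of each loop pass. The following is only a minimal sketch of that decision logic, not the committed code. FakeClusterStatusTracker, checkClusterState, and the counter fields are hypothetical names introduced for illustration, and the assumption that the cluster counts as "up" while the tracked znode holds data is inferred from the removed isClusterShutdown() method rather than from the ClusterStatusTracker body, which this diff does not show.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: mirrors the shape of the new shutdown check in HRegionServer#run.
final class ShutdownLoopSketch {

  // Hypothetical stand-in for ClusterStatusTracker. Assumption: the cluster
  // is considered up while the tracked node has data.
  static final class FakeClusterStatusTracker {
    private volatile byte[] data = new byte[] {1};

    boolean isClusterUp() {
      return this.data != null;
    }

    void markClusterDown() {
      this.data = null;
    }
  }

  final FakeClusterStatusTracker tracker = new FakeClusterStatusTracker();
  final List<String> onlineRegions = new ArrayList<String>();
  boolean stopped = false;
  int closeUserRegionsCalls = 0;
  // In the diff this flag is a local variable in run(); it is a field here
  // so that a single loop pass can be exercised in isolation.
  private boolean calledCloseUserRegions = false;

  // One pass of the shutdown check performed at the top of the run loop.
  void checkClusterState() {
    if (this.tracker.isClusterUp()) {
      return;
    }
    if (this.onlineRegions.isEmpty()) {
      // Nothing left to serve, so the server can stop.
      this.stopped = true;
    } else if (!this.calledCloseUserRegions) {
      // Ask for user regions to be closed, but only once.
      this.closeUserRegionsCalls++;
      this.calledCloseUserRegions = true;
    }
  }
}
```

In this sketch, repeated passes while regions are still open do not re-issue the close request; the server stops on the first pass after the region list drains.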

Modified: hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/zookeeper/ClusterStatusTracker.java
URL: http://svn.apache.org/viewvc/hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/zookeeper/ClusterStatusTracker.java?rev=986584&r1=986583&r2=986584&view=diff
==============================================================================
--- hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/zookeeper/ClusterStatusTracker.java (original)
+++ hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/zookeeper/ClusterStatusTracker.java Wed Aug 18 07:16:15 2010
@@ -22,9 +22,17 @@ package org.apache.hadoop.hbase.zookeepe
 import org.apache.commons.logging.Log;
 import org.apache.commons.logging.LogFactory;
 import org.apache.hadoop.hbase.Abortable;
+import org.apache.hadoop.hbase.ClusterStatus;
 import org.apache.hadoop.hbase.util.Bytes;
 import org.apache.zookeeper.KeeperException;
 
+/**
+ * Tracker on cluster settings up in zookeeper.
+ * This is not related to {@link ClusterStatus}.  That class is a data structure
+ * that holds snapshot of current view on cluster.  This class is about tracking
+ * cluster attributes up in zookeeper.
+ *
+ */
 public class ClusterStatusTracker extends ZooKeeperNodeTracker {
   private static final Log LOG = LogFactory.getLog(ClusterStatusTracker.class);
 
@@ -37,11 +45,17 @@ public class ClusterStatusTracker extend
    * @param abortable
    */
   public ClusterStatusTracker(ZooKeeperWatcher watcher, Abortable abortable) {
-    super(watcher, watcher.rootServerZNode, abortable);
+    super(watcher, watcher.clusterStateZNode, abortable);
+  }
+
+  @Override
+  public synchronized void start() {
+    super.start();
+    this.watcher.registerListener(this);
   }
 
   /**
-   * Checks if the root region location is available.
+   * Checks if cluster is up.
    * @return true if root region location is available, false if not
    */
   public boolean isClusterUp() {

Modified: hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java
URL: http://svn.apache.org/viewvc/hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java?rev=986584&r1=986583&r2=986584&view=diff
==============================================================================
--- hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java (original)
+++ hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java Wed Aug 18 07:16:15 2010
@@ -313,7 +313,7 @@ public class ZKAssign {
       EventType expectedState)
   throws KeeperException, KeeperException.NoNodeException {
     zkw.debug("Deleting an existing unassigned node for " + regionName +
-        " that is in a " + expectedState + " state");
+        " that is in expected state " + expectedState);
     String node = getNodeName(zkw, regionName);
     Stat stat = new Stat();
     byte [] bytes = ZKUtil.getDataNoWatch(zkw, node, stat);

Modified: hbase/branches/0.90_master_rewrite/src/test/java/org/apache/hadoop/hbase/MiniHBaseCluster.java
URL: http://svn.apache.org/viewvc/hbase/branches/0.90_master_rewrite/src/test/java/org/apache/hadoop/hbase/MiniHBaseCluster.java?rev=986584&r1=986583&r2=986584&view=diff
==============================================================================
--- hbase/branches/0.90_master_rewrite/src/test/java/org/apache/hadoop/hbase/MiniHBaseCluster.java (original)
+++ hbase/branches/0.90_master_rewrite/src/test/java/org/apache/hadoop/hbase/MiniHBaseCluster.java Wed Aug 18 07:16:15 2010
@@ -182,8 +182,8 @@ public class MiniHBaseCluster {
     }
 
     @Override
-    protected void init(MapWritable c) throws IOException {
-      super.init(c);
+    protected void handleReportForDutyResponse(MapWritable c) throws IOException {
+      super.handleReportForDutyResponse(c);
       // Run this thread to shutdown our filesystem on way out.
       this.shutdownThread = new SingleFileSystemShutdownThread(getFileSystem());
     }