Posted to commits@hbase.apache.org by st...@apache.org on 2010/08/31 02:14:14 UTC

svn commit: r991041 - in /hbase/branches/0.90_master_rewrite: ./ src/main/java/org/apache/hadoop/hbase/catalog/ src/main/java/org/apache/hadoop/hbase/master/handler/ src/test/java/org/apache/hadoop/hbase/master/

Author: stack
Date: Tue Aug 31 00:14:13 2010
New Revision: 991041

URL: http://svn.apache.org/viewvc?rev=991041&view=rev
Log:

M BRANCH_TODO.txt
  Update to current state.
M src/test/java/org/apache/hadoop/hbase/master/TestMasterTransitions.java
  Mark these tests with @Ignore until we redo them.
M src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
  Add fixup for the case where daughters are not added to .META. before a crash.
M src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
  Return the full row when we ask for server regions.  We need the full row
  to do fixup during server crash processing.
M src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
  ServerInfo can be legitimately null.

Modified:
    hbase/branches/0.90_master_rewrite/BRANCH_TODO.txt
    hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
    hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
    hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
    hbase/branches/0.90_master_rewrite/src/test/java/org/apache/hadoop/hbase/master/TestMasterTransitions.java

Modified: hbase/branches/0.90_master_rewrite/BRANCH_TODO.txt
URL: http://svn.apache.org/viewvc/hbase/branches/0.90_master_rewrite/BRANCH_TODO.txt?rev=991041&r1=991040&r2=991041&view=diff
==============================================================================
--- hbase/branches/0.90_master_rewrite/BRANCH_TODO.txt (original)
+++ hbase/branches/0.90_master_rewrite/BRANCH_TODO.txt Tue Aug 31 00:14:13 2010
@@ -5,50 +5,31 @@ implemented.
 remaining tasks before merge
 ---
 
-* finish baseline implementation of new splits
--- Basic split works now.  RS opens daughters on itself.
-I made the mistake of keeping up state
-in zk at first but thats not necessary; at moment if split fails
-we kill the regionserver rather than have a hole in our table.
-TODO: Come back and review after merge to make sure this jibes
-w/ new split transaction code.  St.Ack 20100823.
-
-* integrate load balancer
-- Looksee if we are still deleting location from meta; not needed any
-more and if we don't delete, then we can put region back on the server
-that used to be serving it; can add old location to new RegionPlan
--- St.Ack 08/21
-
-* ensure root/meta are last to close on cluster shutdown
-- Add asking RS what it has when only two servers remaining...
-and when only root or meta, then send explicit close of each.
-Do it this way to ensure correct shutdown order -- St.Ack 08/21
-
-
 ---
 tasks to complete post merge
 ---
 
 * move client to use CatalogTracker and add region admin methods
++ Yes.
 
 * bulletproof splits.  need to be recoverable from every point including
-  partial META edits over on RS.
+  partial META edits over on RS
++ Should be there.  Add more tests.  -- St.Ack 20100901
 
-  
 * review timeout semantics for client calls.  servers should generally wait
   forever on root/meta but client class need to eventually timeout.
   
   we need to document new configuration parameters as well since this will now
   be a 'timeout' rather than 'retries' and 'delay'.
 
+ TODO: Remove configs that no longer apply -- St.Ack 20100901
+
 * finish rewriting or making any existing failing unit tests pass
 
 * new master unit tests (failover, failing RS and Master during various points
   of regions in transition, etc)
 
 
-
-
 harder stuff
 ---
 
@@ -60,27 +41,9 @@ harder stuff
   -- Should never timeout IMO and we changed executors so root and meta are
   done separately so this should be ok? -- St.Ack 20100815
 
-* move splits to RS side, integrate new patch from stack on trunk
-  might need a new CREATED unassigned now, or new rpc, but get rid of sending
-  split notification on heartbeat?
-  how to handle splits concurrent with disable?
-
-  -- We need means of fixup if only one edit goes in.. the offlining of parent.
-  St.Ack 20100817
-  -- This should be in place; rs opens daughters on itself now.
-  St.Ack 20100823.
-
-* figure what to do with client table admin ops (flush, split, compact)
-  (direct to RS rpc calls are in place, need to update client)
-  
-  -- And then remove this stuff from HMsg -- St.Ack 20100815
-
 * on region open (and wherever split children notify master) should check if
   if the table is disabled and should close the regions... maybe.
 
-* there are some races with master wanting to connect for rpc
-  to regionserver and the rs starting its rpc server, need to address
-
 * figure how to handle the very rare but possible race condition where two
   RSs will update META and the later one can squash the valid one if there was
   a long gc pause
@@ -88,8 +51,8 @@ harder stuff
 * review synchronization in AssignmentManager
 
 * migrate TestMasterTransitions or make new?
-
-* fix or remove last couple master tests that used RSOQ
+  
+  Make a new one -- St.Ack 20100901
 
 * write new tests!!!
 
@@ -97,18 +60,12 @@ harder stuff
 somewhat easier stuff
 ---
 
-* regionserver exit and expiration need to be finished in ServerManager
-
-  -- Mostly done.  Need to also implement server shutdown again -- St.Ack 20100815
-  -- Whats missing is servershutdownhandler. St.Ack 20100817
-
-
 * jsp pages borked
 
 * make sync calls for enable/disable (check and verify methods?)
   this still needs some love and testing but should be much easier to control now
 
-* integrate load balancing
+* Add balancing unit tests (was integrate balancer -- done. St.Ack 20100901)
   implemented but need to start a thread or chore, each time, wait for no
   regions in transition, generate and iterate the plan, putting it in-memory
   and then triggering the assignment.  if the master crashes mid-balance,
@@ -123,16 +80,6 @@ somewhat easier stuff
   possibly migrate client to use CatalogTracker?
 
 
-* Executor services need to be using a priority queue
-
-  >>  Done.  I think all stuff to set pool size and add priorities is in.
-  -- Interestingly, if we mess up transitions... shutdown can be hung as
-  executors that are outstanding without matching other-ends will be
-  stuck trying to remove elements from queue... server won't go down.
-  St.Ack 20100817
-
-
-
 St.Ack
  -- Ensure root and meta are last to close on cluster shutdown; it shoudl be the case but verify.
 
@@ -294,3 +241,8 @@ Later:
 TODO:
 + Add test to prove move region works.
 + Add test to prove enable/disable balancer works.
++ Add test for fixup if daughter edits don't make it into .META. (should be fixed up as part of server shutdown processing).
++ ensure root/meta are last to close on cluster shutdown
+- Add asking RS what it has when only two servers remaining...
+and when only root or meta, then send explicit close of each.
+Do it this way to ensure correct shutdown order -- St.Ack 08/21

Modified: hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
URL: http://svn.apache.org/viewvc/hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java?rev=991041&r1=991040&r2=991041&view=diff
==============================================================================
--- hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java (original)
+++ hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java Tue Aug 31 00:14:13 2010
@@ -96,7 +96,7 @@ public class MetaEditor {
     byte [] catalogRegionName = CatalogTracker.META_REGION;
     Put put = new Put(regionInfo.getRegionName());
     addRegionInfo(put, regionInfo);
-    addLocation(put, serverInfo);
+    if (serverInfo != null) addLocation(put, serverInfo);
     server.put(catalogRegionName, put);
     LOG.info("Added daughter " + regionInfo.getRegionNameAsString() +
       " in region " + Bytes.toString(catalogRegionName) + " with " +

Modified: hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
URL: http://svn.apache.org/viewvc/hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java?rev=991041&r1=991040&r2=991041&view=diff
==============================================================================
--- hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java (original)
+++ hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java Tue Aug 31 00:14:13 2010
@@ -23,6 +23,7 @@ import java.io.IOException;
 import java.util.ArrayList;
 import java.util.List;
 import java.util.Map;
+import java.util.NavigableMap;
 import java.util.NavigableSet;
 import java.util.TreeMap;
 import java.util.TreeSet;
@@ -46,7 +47,6 @@ import org.apache.hadoop.hbase.util.Writ
  * catalogs.
  */
 public class MetaReader {
-
   /**
    * Performs a full scan of <code>.META.</code>.
    * <p>
@@ -166,10 +166,9 @@ public class MetaReader {
   public static Pair<HRegionInfo, HServerAddress> metaRowToRegionPair(
       Result data) throws IOException {
     HRegionInfo info = Writables.getHRegionInfo(
-        data.getValue(HConstants.CATALOG_FAMILY,
-            HConstants.REGIONINFO_QUALIFIER));
+      data.getValue(HConstants.CATALOG_FAMILY, HConstants.REGIONINFO_QUALIFIER));
     final byte[] value = data.getValue(HConstants.CATALOG_FAMILY,
-        HConstants.SERVER_QUALIFIER);
+      HConstants.SERVER_QUALIFIER);
     if (value != null && value.length > 0) {
       HServerAddress server = new HServerAddress(Bytes.toString(value));
       return new Pair<HRegionInfo,HServerAddress>(info, server);
@@ -283,23 +282,24 @@ public class MetaReader {
     }
   }
 
-  public static NavigableSet<HRegionInfo>
+  public static NavigableMap<HRegionInfo, Result>
   getServerRegions(CatalogTracker catalogTracker, final HServerInfo hsi)
   throws IOException {
     HRegionInterface metaServer =
       catalogTracker.waitForMetaServerConnectionDefault();
-    NavigableSet<HRegionInfo> hris = new TreeSet<HRegionInfo>();
+    NavigableMap<HRegionInfo, Result> hris = new TreeMap<HRegionInfo, Result>();
     Scan scan = new Scan();
     scan.addFamily(HConstants.CATALOG_FAMILY);
     long scannerid = metaServer.openScanner(
         HRegionInfo.FIRST_META_REGIONINFO.getRegionName(), scan);
     try {
-      Result data;
-      while((data = metaServer.next(scannerid)) != null) {
-        if (data != null && data.size() > 0) {
-          Pair<HRegionInfo, HServerAddress> pair = metaRowToRegionPair(data);
-          if (!pair.getSecond().equals(hsi.getServerAddress())) continue;
-          hris.add(pair.getFirst());
+      Result result;
+      while((result = metaServer.next(scannerid)) != null) {
+        if (result != null && result.size() > 0) {
+          HRegionInfo hri = Writables.getHRegionInfo(
+            result.getValue(HConstants.CATALOG_FAMILY,
+              HConstants.REGIONINFO_QUALIFIER));
+          hris.put(hri, result);
         }
       }
       return hris;
@@ -307,4 +307,4 @@ public class MetaReader {
       metaServer.close(scannerid);
     }
   }
-}
+}
\ No newline at end of file

Modified: hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
URL: http://svn.apache.org/viewvc/hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java?rev=991041&r1=991040&r2=991041&view=diff
==============================================================================
--- hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java (original)
+++ hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java Tue Aug 31 00:14:13 2010
@@ -20,18 +20,23 @@
 package org.apache.hadoop.hbase.master.handler;
 
 import java.io.IOException;
-import java.util.NavigableSet;
+import java.util.Map;
+import java.util.NavigableMap;
 
 import org.apache.commons.logging.Log;
 import org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.hbase.HConstants;
 import org.apache.hadoop.hbase.HRegionInfo;
 import org.apache.hadoop.hbase.HServerInfo;
 import org.apache.hadoop.hbase.Server;
+import org.apache.hadoop.hbase.catalog.MetaEditor;
 import org.apache.hadoop.hbase.catalog.MetaReader;
+import org.apache.hadoop.hbase.client.Result;
 import org.apache.hadoop.hbase.executor.EventHandler;
 import org.apache.hadoop.hbase.master.DeadServer;
 import org.apache.hadoop.hbase.master.MasterServices;
 import org.apache.hadoop.hbase.util.Pair;
+import org.apache.hadoop.hbase.util.Writables;
 import org.apache.zookeeper.KeeperException;
 
 
@@ -98,20 +103,21 @@ public class ServerShutdownHandler exten
       throw new IOException("Interrupted", e);
     }
 
-    NavigableSet<HRegionInfo> hris =
+    NavigableMap<HRegionInfo, Result> hris =
       MetaReader.getServerRegions(this.server.getCatalogTracker(), this.hsi);
     LOG.info("Reassigning the " + hris.size() + " region(s) that " + serverName +
       " was carrying.");
 
     // We should encounter -ROOT- and .META. first in the Set given how its
     // a sorted set.
-    for (HRegionInfo hri: hris) {
+    for (Map.Entry<HRegionInfo, Result> e: hris.entrySet()) {
       // If table is not disabled but the region is offlined,
+      HRegionInfo hri = e.getKey();
       boolean disabled = this.services.getAssignmentManager().
         isTableDisabled(hri.getTableDesc().getNameAsString());
       if (disabled) continue;
-      if (hri.isOffline()) {
-        LOG.warn("TODO: DO FIXUP ON OFFLINED PARENT? REGION OFFLINE -- IS THIS RIGHT?" + hri);
+      if (hri.isOffline() && hri.isSplit()) {
+        fixupDaughters(hris, e.getValue());
         continue;
       }
       this.services.getAssignmentManager().assign(hri);
@@ -119,4 +125,36 @@ public class ServerShutdownHandler exten
     this.deadServers.remove(serverName);
     LOG.info("Finished processing of shutdown of " + serverName);
   }
+
+  /**
+   * Check that daughter regions are up in .META. and if not, add them.
+   * @param hris All regions for this server in meta.
+   * @param result The contents of the parent row in .META.
+   * @throws IOException
+   */
+  void fixupDaughters(final NavigableMap<HRegionInfo, Result> hris,
+      final Result result) throws IOException {
+    fixupDaughter(hris, result, HConstants.SPLITA_QUALIFIER);
+    fixupDaughter(hris, result, HConstants.SPLITB_QUALIFIER);
+  }
+
+  /**
+   * Check individual daughter is up in .META.; fixup if its not.
+   * @param hris All regions for this server in meta.
+   * @param result The contents of the parent row in .META.
+   * @param qualifier Which daughter to check for.
+   * @throws IOException
+   */
+  void fixupDaughter(final NavigableMap<HRegionInfo, Result> hris,
+      final Result result, final byte [] qualifier)
+  throws IOException {
+    byte [] bytes = result.getValue(HConstants.CATALOG_FAMILY, qualifier);
+    if (bytes == null || bytes.length <= 0) return;
+    HRegionInfo hri = Writables.getHRegionInfo(bytes);
+    if (!hris.containsKey(hri)) {
+      LOG.info("Fixup; missing daughter " + hri.getEncodedNameAsBytes());
+      MetaEditor.addDaughter(this.server.getCatalogTracker(), hri, null);
+      this.services.getAssignmentManager().assign(hri);
+    }
+  }
 }
\ No newline at end of file
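
The daughter fixup in the ServerShutdownHandler change above can be sketched without any HBase dependencies. This is a minimal illustration only, not the committed code: the class name DaughterFixupSketch, the method missingDaughters, the string qualifiers "splitA"/"splitB", and the use of plain String region names and Map rows are all assumptions standing in for HRegionInfo, Result, and the HConstants.SPLITA_QUALIFIER / SPLITB_QUALIFIER values. It shows why the full parent row is needed: the daughter references live in the parent's row, and each one is checked against the set of regions found in the catalog scan.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class DaughterFixupSketch {
  // Hypothetical stand-ins for the split-daughter qualifiers in a parent row.
  static final String SPLIT_A = "splitA";
  static final String SPLIT_B = "splitB";

  /**
   * Returns the daughters that the offlined, split parent row refers to but
   * that are absent from the scanned catalog rows. The caller would then add
   * each one to the catalog and assign it. The rows map is not modified, so
   * this is safe to call while iterating over its entries.
   */
  static List<String> missingDaughters(
      NavigableMap<String, Map<String, String>> rows,
      Map<String, String> parentRow) {
    List<String> missing = new ArrayList<String>();
    for (String qualifier : new String[] {SPLIT_A, SPLIT_B}) {
      String daughter = parentRow.get(qualifier);
      // No daughter recorded under this qualifier: nothing to fix up.
      if (daughter == null || daughter.isEmpty()) continue;
      if (!rows.containsKey(daughter)) {
        missing.add(daughter);
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    NavigableMap<String, Map<String, String>> rows =
      new TreeMap<String, Map<String, String>>();
    Map<String, String> parent = new HashMap<String, String>();
    parent.put(SPLIT_A, "t,a,1");
    parent.put(SPLIT_B, "t,m,1");
    rows.put("t,,0", parent);
    // Daughter A made it into the catalog before the crash; daughter B did not.
    rows.put("t,a,1", new HashMap<String, String>());
    System.out.println(missingDaughters(rows, parent)); // prints [t,m,1]
  }
}
```

The sketch reports region names as strings. In Java, concatenating a byte [] into a log message prints the array's identity rather than its contents, so a string accessor such as the getRegionNameAsString() call used in MetaEditor gives a more readable log line.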

Modified: hbase/branches/0.90_master_rewrite/src/test/java/org/apache/hadoop/hbase/master/TestMasterTransitions.java
URL: http://svn.apache.org/viewvc/hbase/branches/0.90_master_rewrite/src/test/java/org/apache/hadoop/hbase/master/TestMasterTransitions.java?rev=991041&r1=991040&r2=991041&view=diff
==============================================================================
--- hbase/branches/0.90_master_rewrite/src/test/java/org/apache/hadoop/hbase/master/TestMasterTransitions.java (original)
+++ hbase/branches/0.90_master_rewrite/src/test/java/org/apache/hadoop/hbase/master/TestMasterTransitions.java Tue Aug 31 00:14:13 2010
@@ -37,6 +37,8 @@ import org.junit.AfterClass;
 import org.junit.Assert;
 import org.junit.Before;
 import org.junit.BeforeClass;
+import org.junit.Ignore;
+import org.junit.Test;
 
 /**
  * Test transitions of state across the master.  Sets up the cluster once and
@@ -187,9 +189,9 @@ public class TestMasterTransitions {
    * in.
    * @see <a href="https://issues.apache.org/jira/browse/HBASE-2428">HBASE-2428</a> 
    */
-/*
-  @Test (timeout=300000) public void testRegionCloseWhenNoMetaHBase2428()
+  @Ignore @Test  (timeout=300000) public void testRegionCloseWhenNoMetaHBase2428()
   throws Exception {
+    /*
     LOG.info("Running testRegionCloseWhenNoMetaHBase2428");
     MiniHBaseCluster cluster = TEST_UTIL.getHBaseCluster();
     final HMaster master = cluster.getMaster();
@@ -233,17 +235,18 @@ public class TestMasterTransitions {
       master.getRegionServerOperationQueue().
         unregisterRegionServerOperationListener(listener);
     }
+    */
   }
-*/
+
   /**
    * Test adding in a new server before old one on same host+port is dead.
    * Make the test more onerous by having the server under test carry the meta.
    * If confusion between old and new, purportedly meta never comes back.  Test
    * that meta gets redeployed.
    */
-  /*
-  @Test (timeout=300000) public void testAddingServerBeforeOldIsDead2413()
+  @Ignore @Test (timeout=300000) public void testAddingServerBeforeOldIsDead2413()
   throws IOException {
+    /*
     LOG.info("Running testAddingServerBeforeOldIsDead2413");
     MiniHBaseCluster cluster = TEST_UTIL.getHBaseCluster();
     int count = count();
@@ -283,8 +286,8 @@ public class TestMasterTransitions {
     } finally {
       c.set(HConstants.REGIONSERVER_PORT, oldPort);
     }
+    */
   }
-*/
 
   /**
    * HBase2482 is about outstanding region openings.  If any are outstanding
@@ -368,8 +371,9 @@ public class TestMasterTransitions {
    * done.
    * @see <a href="https://issues.apache.org/jira/browse/HBASE-2482">HBASE-2482</a> 
    */
-  /*@Test (timeout=300000) *//*public void testKillRSWithOpeningRegion2482()
+  @Ignore @Test (timeout=300000) public void testKillRSWithOpeningRegion2482()
   throws Exception {
+    /*
     LOG.info("Running testKillRSWithOpeningRegion2482");
     MiniHBaseCluster cluster = TEST_UTIL.getHBaseCluster();
     if (cluster.getLiveRegionServerThreads().size() < 2) {
@@ -413,8 +417,9 @@ public class TestMasterTransitions {
       m.getRegionServerOperationQueue().
         unregisterRegionServerOperationListener(listener);
     }
+    */
   }
-*/
+
   /*
    * @return Count of all non-catalog regions on the designated server
    */