Posted to commits@hbase.apache.org by st...@apache.org on 2010/08/31 02:14:14 UTC
svn commit: r991041 - in /hbase/branches/0.90_master_rewrite: ./
src/main/java/org/apache/hadoop/hbase/catalog/
src/main/java/org/apache/hadoop/hbase/master/handler/
src/test/java/org/apache/hadoop/hbase/master/
Author: stack
Date: Tue Aug 31 00:14:13 2010
New Revision: 991041
URL: http://svn.apache.org/viewvc?rev=991041&view=rev
Log:
M BRANCH_TODO.txt
Update to current state.
M src/test/java/org/apache/hadoop/hbase/master/TestMasterTransitions.java
Mark these tests with @Ignore until we redo them.
M src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
Add fixup for the case where split daughters were not added to .META. before the crash.
M src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
Return the full row when we ask for a server's regions. We need the full row
to do fixup during server crash processing.
M src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
ServerInfo can be legitimately null.
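The daughter fixup described in the log above can be illustrated with a minimal, HBase-free sketch. It assumes a simplified model in which each .META. row is reduced to its offline and split flags plus the names of the two daughters recorded on the parent. The class DaughterFixupSketch, the Row type, and the method missingDaughters are hypothetical names used only for illustration; they are not part of this commit. In the actual change, each missing daughter is added to .META. via MetaEditor.addDaughter (with a null ServerInfo) and then handed to the AssignmentManager.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Simplified model of the split-daughter fixup; not the HBase implementation. */
public class DaughterFixupSketch {

  /** Hypothetical stand-in for the parts of a .META. row the fixup looks at. */
  static final class Row {
    final boolean offline;
    final boolean split;
    final String splitA; // daughter name recorded on the parent, or null
    final String splitB; // daughter name recorded on the parent, or null

    Row(boolean offline, boolean split, String splitA, String splitB) {
      this.offline = offline;
      this.split = split;
      this.splitA = splitA;
      this.splitB = splitB;
    }
  }

  /**
   * For every offlined, split parent, report daughters that the parent row
   * names but that have no row of their own in the given map.
   */
  static List<String> missingDaughters(NavigableMap<String, Row> rows) {
    List<String> missing = new ArrayList<String>();
    for (Map.Entry<String, Row> e : rows.entrySet()) {
      Row row = e.getValue();
      // Only an offlined parent of a split needs its daughters checked.
      if (!(row.offline && row.split)) continue;
      for (String daughter : new String[] {row.splitA, row.splitB}) {
        if (daughter == null || daughter.isEmpty()) continue;
        if (!rows.containsKey(daughter)) missing.add(daughter);
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    NavigableMap<String, Row> rows = new TreeMap<String, Row>();
    // Parent was offlined and split, but only daughter A made it into the catalog.
    rows.put("t1,,parent", new Row(true, true, "t1,,daughterA", "t1,m,daughterB"));
    rows.put("t1,,daughterA", new Row(false, false, null, null));
    // The caller would add each reported daughter to the catalog and assign it.
    System.out.println(missingDaughters(rows));
  }
}
```

The sketch only reports what is missing and leaves the map untouched, which mirrors how the handler iterates the map returned by MetaReader.getServerRegions while writing fixes to .META. rather than to that map.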
Modified:
hbase/branches/0.90_master_rewrite/BRANCH_TODO.txt
hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
hbase/branches/0.90_master_rewrite/src/test/java/org/apache/hadoop/hbase/master/TestMasterTransitions.java
Modified: hbase/branches/0.90_master_rewrite/BRANCH_TODO.txt
URL: http://svn.apache.org/viewvc/hbase/branches/0.90_master_rewrite/BRANCH_TODO.txt?rev=991041&r1=991040&r2=991041&view=diff
==============================================================================
--- hbase/branches/0.90_master_rewrite/BRANCH_TODO.txt (original)
+++ hbase/branches/0.90_master_rewrite/BRANCH_TODO.txt Tue Aug 31 00:14:13 2010
@@ -5,50 +5,31 @@ implemented.
remaining tasks before merge
---
-* finish baseline implementation of new splits
--- Basic split works now. RS opens daughters on itself.
-I made the mistake of keeping up state
-in zk at first but thats not necessary; at moment if split fails
-we kill the regionserver rather than have a hole in our table.
-TODO: Come back and review after merge to make sure this jibes
-w/ new split transaction code. St.Ack 20100823.
-
-* integrate load balancer
-- Looksee if we are still deleting location from meta; not needed any
-more and if we don't delete, then we can put region back on the server
-that used to be serving it; can add old location to new RegionPlan
--- St.Ack 08/21
-
-* ensure root/meta are last to close on cluster shutdown
-- Add asking RS what it has when only two servers remaining...
-and when only root or meta, then send explicit close of each.
-Do it this way to ensure correct shutdown order -- St.Ack 08/21
-
-
---
tasks to complete post merge
---
* move client to use CatalogTracker and add region admin methods
++ Yes.
* bulletproof splits. need to be recoverable from every point including
- partial META edits over on RS.
+ partial META edits over on RS
++ Should be there. Add more tests. -- St.Ack 20100901
-
* review timeout semantics for client calls. servers should generally wait
forever on root/meta but client class need to eventually timeout.
we need to document new configuration parameters as well since this will now
be a 'timeout' rather than 'retries' and 'delay'.
+ TODO: Remove configs that no longer apply -- St.Ack 20100901
+
* finish rewriting or making any existing failing unit tests pass
* new master unit tests (failover, failing RS and Master during various points
of regions in transition, etc)
-
-
harder stuff
---
@@ -60,27 +41,9 @@ harder stuff
-- Should never timeout IMO and we changed executors so root and meta are
done separately so this should be ok? -- St.Ack 20100815
-* move splits to RS side, integrate new patch from stack on trunk
- might need a new CREATED unassigned now, or new rpc, but get rid of sending
- split notification on heartbeat?
- how to handle splits concurrent with disable?
-
- -- We need means of fixup if only one edit goes in.. the offlining of parent.
- St.Ack 20100817
- -- This should be in place; rs opens daughters on itself now.
- St.Ack 20100823.
-
-* figure what to do with client table admin ops (flush, split, compact)
- (direct to RS rpc calls are in place, need to update client)
-
- -- And then remove this stuff from HMsg -- St.Ack 20100815
-
* on region open (and wherever split children notify master) should check if
if the table is disabled and should close the regions... maybe.
-* there are some races with master wanting to connect for rpc
- to regionserver and the rs starting its rpc server, need to address
-
* figure how to handle the very rare but possible race condition where two
RSs will update META and the later one can squash the valid one if there was
a long gc pause
@@ -88,8 +51,8 @@ harder stuff
* review synchronization in AssignmentManager
* migrate TestMasterTransitions or make new?
-
-* fix or remove last couple master tests that used RSOQ
+
+ Make a new one -- St.Ack 20100901
* write new tests!!!
@@ -97,18 +60,12 @@ harder stuff
somewhat easier stuff
---
-* regionserver exit and expiration need to be finished in ServerManager
-
- -- Mostly done. Need to also implement server shutdown again -- St.Ack 20100815
- -- Whats missing is servershutdownhandler. St.Ack 20100817
-
-
* jsp pages borked
* make sync calls for enable/disable (check and verify methods?)
this still needs some love and testing but should be much easier to control now
-* integrate load balancing
+* Add balancing unit tests (was integrate balancer -- done. St.Ack 20100901)
implemented but need to start a thread or chore, each time, wait for no
regions in transition, generate and iterate the plan, putting it in-memory
and then triggering the assignment. if the master crashes mid-balance,
@@ -123,16 +80,6 @@ somewhat easier stuff
possibly migrate client to use CatalogTracker?
-* Executor services need to be using a priority queue
-
- >> Done. I think all stuff to set pool size and add priorities is in.
- -- Interestingly, if we mess up transitions... shutdown can be hung as
- executors that are outstanding without matching other-ends will be
- stuck trying to remove elements from queue... server won't go down.
- St.Ack 20100817
-
-
-
St.Ack
-- Ensure root and meta are last to close on cluster shutdown; it shoudl be the case but verify.
@@ -294,3 +241,8 @@ Later:
TODO:
+ Add test to prove move region works.
+ Add test to prove enable/disable balancer works.
++ Add test for fixup if daughter edits don't make it into .META. (should be fixed up as part of server shutdown processing).
++ ensure root/meta are last to close on cluster shutdown
+- Add asking RS what it has when only two servers remaining...
+and when only root or meta, then send explicit close of each.
+Do it this way to ensure correct shutdown order -- St.Ack 08/21
Modified: hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
URL: http://svn.apache.org/viewvc/hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java?rev=991041&r1=991040&r2=991041&view=diff
==============================================================================
--- hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java (original)
+++ hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java Tue Aug 31 00:14:13 2010
@@ -96,7 +96,7 @@ public class MetaEditor {
byte [] catalogRegionName = CatalogTracker.META_REGION;
Put put = new Put(regionInfo.getRegionName());
addRegionInfo(put, regionInfo);
- addLocation(put, serverInfo);
+ if (serverInfo != null) addLocation(put, serverInfo);
server.put(catalogRegionName, put);
LOG.info("Added daughter " + regionInfo.getRegionNameAsString() +
" in region " + Bytes.toString(catalogRegionName) + " with " +
Modified: hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
URL: http://svn.apache.org/viewvc/hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java?rev=991041&r1=991040&r2=991041&view=diff
==============================================================================
--- hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java (original)
+++ hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java Tue Aug 31 00:14:13 2010
@@ -23,6 +23,7 @@ import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
+import java.util.NavigableMap;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;
@@ -46,7 +47,6 @@ import org.apache.hadoop.hbase.util.Writ
* catalogs.
*/
public class MetaReader {
-
/**
* Performs a full scan of <code>.META.</code>.
* <p>
@@ -166,10 +166,9 @@ public class MetaReader {
public static Pair<HRegionInfo, HServerAddress> metaRowToRegionPair(
Result data) throws IOException {
HRegionInfo info = Writables.getHRegionInfo(
- data.getValue(HConstants.CATALOG_FAMILY,
- HConstants.REGIONINFO_QUALIFIER));
+ data.getValue(HConstants.CATALOG_FAMILY, HConstants.REGIONINFO_QUALIFIER));
final byte[] value = data.getValue(HConstants.CATALOG_FAMILY,
- HConstants.SERVER_QUALIFIER);
+ HConstants.SERVER_QUALIFIER);
if (value != null && value.length > 0) {
HServerAddress server = new HServerAddress(Bytes.toString(value));
return new Pair<HRegionInfo,HServerAddress>(info, server);
@@ -283,23 +282,24 @@ public class MetaReader {
}
}
- public static NavigableSet<HRegionInfo>
+ public static NavigableMap<HRegionInfo, Result>
getServerRegions(CatalogTracker catalogTracker, final HServerInfo hsi)
throws IOException {
HRegionInterface metaServer =
catalogTracker.waitForMetaServerConnectionDefault();
- NavigableSet<HRegionInfo> hris = new TreeSet<HRegionInfo>();
+ NavigableMap<HRegionInfo, Result> hris = new TreeMap<HRegionInfo, Result>();
Scan scan = new Scan();
scan.addFamily(HConstants.CATALOG_FAMILY);
long scannerid = metaServer.openScanner(
HRegionInfo.FIRST_META_REGIONINFO.getRegionName(), scan);
try {
- Result data;
- while((data = metaServer.next(scannerid)) != null) {
- if (data != null && data.size() > 0) {
- Pair<HRegionInfo, HServerAddress> pair = metaRowToRegionPair(data);
- if (!pair.getSecond().equals(hsi.getServerAddress())) continue;
- hris.add(pair.getFirst());
+ Result result;
+ while((result = metaServer.next(scannerid)) != null) {
+ if (result != null && result.size() > 0) {
+ HRegionInfo hri = Writables.getHRegionInfo(
+ result.getValue(HConstants.CATALOG_FAMILY,
+ HConstants.REGIONINFO_QUALIFIER));
+ hris.put(hri, result);
}
}
return hris;
@@ -307,4 +307,4 @@ public class MetaReader {
metaServer.close(scannerid);
}
}
-}
+}
\ No newline at end of file
Modified: hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
URL: http://svn.apache.org/viewvc/hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java?rev=991041&r1=991040&r2=991041&view=diff
==============================================================================
--- hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java (original)
+++ hbase/branches/0.90_master_rewrite/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java Tue Aug 31 00:14:13 2010
@@ -20,18 +20,23 @@
package org.apache.hadoop.hbase.master.handler;
import java.io.IOException;
-import java.util.NavigableSet;
+import java.util.Map;
+import java.util.NavigableMap;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.HServerInfo;
import org.apache.hadoop.hbase.Server;
+import org.apache.hadoop.hbase.catalog.MetaEditor;
import org.apache.hadoop.hbase.catalog.MetaReader;
+import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.executor.EventHandler;
import org.apache.hadoop.hbase.master.DeadServer;
import org.apache.hadoop.hbase.master.MasterServices;
import org.apache.hadoop.hbase.util.Pair;
+import org.apache.hadoop.hbase.util.Writables;
import org.apache.zookeeper.KeeperException;
@@ -98,20 +103,21 @@ public class ServerShutdownHandler exten
throw new IOException("Interrupted", e);
}
- NavigableSet<HRegionInfo> hris =
+ NavigableMap<HRegionInfo, Result> hris =
MetaReader.getServerRegions(this.server.getCatalogTracker(), this.hsi);
LOG.info("Reassigning the " + hris.size() + " region(s) that " + serverName +
" was carrying.");
// We should encounter -ROOT- and .META. first in the Set given how its
// a sorted set.
- for (HRegionInfo hri: hris) {
+ for (Map.Entry<HRegionInfo, Result> e: hris.entrySet()) {
// If table is not disabled but the region is offlined,
+ HRegionInfo hri = e.getKey();
boolean disabled = this.services.getAssignmentManager().
isTableDisabled(hri.getTableDesc().getNameAsString());
if (disabled) continue;
- if (hri.isOffline()) {
- LOG.warn("TODO: DO FIXUP ON OFFLINED PARENT? REGION OFFLINE -- IS THIS RIGHT?" + hri);
+ if (hri.isOffline() && hri.isSplit()) {
+ fixupDaughters(hris, e.getValue());
continue;
}
this.services.getAssignmentManager().assign(hri);
@@ -119,4 +125,36 @@ public class ServerShutdownHandler exten
this.deadServers.remove(serverName);
LOG.info("Finished processing of shutdown of " + serverName);
}
+
+ /**
+ * Check that daughter regions are up in .META. and if not, add them.
+ * @param hris All regions for this server in meta.
+ * @param result The contents of the parent row in .META.
+ * @throws IOException
+ */
+ void fixupDaughters(final NavigableMap<HRegionInfo, Result> hris,
+ final Result result) throws IOException {
+ fixupDaughter(hris, result, HConstants.SPLITA_QUALIFIER);
+ fixupDaughter(hris, result, HConstants.SPLITB_QUALIFIER);
+ }
+
+ /**
+ * Check individual daughter is up in .META.; fixup if its not.
+ * @param hris All regions for this server in meta.
+ * @param result The contents of the parent row in .META.
+ * @param qualifier Which daughter to check for.
+ * @throws IOException
+ */
+ void fixupDaughter(final NavigableMap<HRegionInfo, Result> hris,
+ final Result result, final byte [] qualifier)
+ throws IOException {
+ byte [] bytes = result.getValue(HConstants.CATALOG_FAMILY, qualifier);
+ if (bytes == null || bytes.length <= 0) return;
+ HRegionInfo hri = Writables.getHRegionInfo(bytes);
+ if (!hris.containsKey(hri)) {
+ LOG.info("Fixup; missing daughter " + hri.getEncodedNameAsBytes());
+ MetaEditor.addDaughter(this.server.getCatalogTracker(), hri, null);
+ this.services.getAssignmentManager().assign(hri);
+ }
+ }
}
\ No newline at end of file
Modified: hbase/branches/0.90_master_rewrite/src/test/java/org/apache/hadoop/hbase/master/TestMasterTransitions.java
URL: http://svn.apache.org/viewvc/hbase/branches/0.90_master_rewrite/src/test/java/org/apache/hadoop/hbase/master/TestMasterTransitions.java?rev=991041&r1=991040&r2=991041&view=diff
==============================================================================
--- hbase/branches/0.90_master_rewrite/src/test/java/org/apache/hadoop/hbase/master/TestMasterTransitions.java (original)
+++ hbase/branches/0.90_master_rewrite/src/test/java/org/apache/hadoop/hbase/master/TestMasterTransitions.java Tue Aug 31 00:14:13 2010
@@ -37,6 +37,8 @@ import org.junit.AfterClass;
import org.junit.Assert;
import org.junit.Before;
import org.junit.BeforeClass;
+import org.junit.Ignore;
+import org.junit.Test;
/**
* Test transitions of state across the master. Sets up the cluster once and
@@ -187,9 +189,9 @@ public class TestMasterTransitions {
* in.
* @see <a href="https://issues.apache.org/jira/browse/HBASE-2428">HBASE-2428</a>
*/
-/*
- @Test (timeout=300000) public void testRegionCloseWhenNoMetaHBase2428()
+ @Ignore @Test (timeout=300000) public void testRegionCloseWhenNoMetaHBase2428()
throws Exception {
+ /*
LOG.info("Running testRegionCloseWhenNoMetaHBase2428");
MiniHBaseCluster cluster = TEST_UTIL.getHBaseCluster();
final HMaster master = cluster.getMaster();
@@ -233,17 +235,18 @@ public class TestMasterTransitions {
master.getRegionServerOperationQueue().
unregisterRegionServerOperationListener(listener);
}
+ */
}
-*/
+
/**
* Test adding in a new server before old one on same host+port is dead.
* Make the test more onerous by having the server under test carry the meta.
* If confusion between old and new, purportedly meta never comes back. Test
* that meta gets redeployed.
*/
- /*
- @Test (timeout=300000) public void testAddingServerBeforeOldIsDead2413()
+ @Ignore @Test (timeout=300000) public void testAddingServerBeforeOldIsDead2413()
throws IOException {
+ /*
LOG.info("Running testAddingServerBeforeOldIsDead2413");
MiniHBaseCluster cluster = TEST_UTIL.getHBaseCluster();
int count = count();
@@ -283,8 +286,8 @@ public class TestMasterTransitions {
} finally {
c.set(HConstants.REGIONSERVER_PORT, oldPort);
}
+ */
}
-*/
/**
* HBase2482 is about outstanding region openings. If any are outstanding
@@ -368,8 +371,9 @@ public class TestMasterTransitions {
* done.
* @see <a href="https://issues.apache.org/jira/browse/HBASE-2482">HBASE-2482</a>
*/
- /*@Test (timeout=300000) *//*public void testKillRSWithOpeningRegion2482()
+ @Ignore @Test (timeout=300000) public void testKillRSWithOpeningRegion2482()
throws Exception {
+ /*
LOG.info("Running testKillRSWithOpeningRegion2482");
MiniHBaseCluster cluster = TEST_UTIL.getHBaseCluster();
if (cluster.getLiveRegionServerThreads().size() < 2) {
@@ -413,8 +417,9 @@ public class TestMasterTransitions {
m.getRegionServerOperationQueue().
unregisterRegionServerOperationListener(listener);
}
+ */
}
-*/
+
/*
* @return Count of all non-catalog regions on the designated server
*/