Posted to issues@hbase.apache.org by "Andrew Purtell (JIRA)" <ji...@apache.org> on 2015/08/13 01:13:46 UTC
[jira] [Updated] (HBASE-14129) If any regionserver gets shutdown
uncleanly during full cluster restart, locality looks to be lost
[ https://issues.apache.org/jira/browse/HBASE-14129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Purtell updated HBASE-14129:
-----------------------------------
Fix Version/s: (was: 0.98.14)
0.98.15
1.3.0
Status: Open (was: Patch Available)
{code}
diff --git a/hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
index f7f98fe..1c3ceee 100644
--- a/hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
+++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
@@ -539,6 +539,12 @@ public class AssignmentManager {
LOG.info("Clean cluster startup. Assigning user regions");
assignAllUserRegions(allRegions);
}
+
+ if (this.server.getConfiguration().getBoolean("hbase.full.cluster.start", false)) {
+ // Hint to force a full cluster startup.
+ LOG.info("Clean cluster startup forced via parameterized startup. Assigning user regions");
+ assignAllUserRegions(allRegions);
+ }
// unassign replicas of the split parents and the merged regions
// the daughter replicas are opened in assignAllUserRegions if it was
// not already opened.
{code}
Can someone who knows the AM better take a quick look at whether this is sufficient?
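One thing that may be worth checking: if I'm reading the diff context right, on a clean (non-failover) startup with the flag set, assignAllUserRegions() would be called twice, once by the existing block and once by the new one. Below is a minimal, self-contained sketch of folding the two conditions into a single check. It does not use any HBase classes; StartupAssignSketch, shouldAssignAllUserRegions and startupActions are hypothetical names for illustration, and the assumption that the existing block is guarded by "not failover" comes from the issue description, not from the full source.

```java
import java.util.ArrayList;
import java.util.List;

public class StartupAssignSketch {

    // Configuration key as it appears in the patch; everything else here is hypothetical.
    static final String FORCE_KEY = "hbase.full.cluster.start";

    /**
     * Decide whether all user regions should be bulk-assigned at startup.
     * Assumption: the existing code assigns only when failover is false.
     */
    static boolean shouldAssignAllUserRegions(boolean failover, boolean forceFullStart) {
        return !failover || forceFullStart;
    }

    /** Simulate the startup path and record which assignment action would run. */
    static List<String> startupActions(boolean failover, boolean forceFullStart) {
        List<String> actions = new ArrayList<>();
        if (shouldAssignAllUserRegions(failover, forceFullStart)) {
            // A single call site, so a clean startup with the flag set assigns only once.
            actions.add(failover
                ? "assignAllUserRegions (forced via " + FORCE_KEY + ")"
                : "assignAllUserRegions (clean startup)");
        }
        return actions;
    }

    public static void main(String[] args) {
        // Walk through all four combinations of the two booleans.
        for (boolean failover : new boolean[] {false, true}) {
            for (boolean force : new boolean[] {false, true}) {
                System.out.println("failover=" + failover + " force=" + force
                    + " -> " + startupActions(failover, force));
            }
        }
    }
}
```

With a single combined condition, each of the four failover/flag combinations triggers at most one bulk assignment. Whether skipping the failover handling is actually safe when the flag is set is a separate question for someone who knows the AM.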
> If any regionserver gets shutdown uncleanly during full cluster restart, locality looks to be lost
> --------------------------------------------------------------------------------------------------
>
> Key: HBASE-14129
> URL: https://issues.apache.org/jira/browse/HBASE-14129
> Project: HBase
> Issue Type: Bug
> Reporter: churro morales
> Fix For: 2.0.0, 1.3.0, 0.98.15
>
> Attachments: HBASE-14129.patch
>
>
> We were doing a cluster restart the other day and some regionservers did not shut down cleanly. After the restart our locality dropped from 99% to 5%. Looking at the code, AssignmentManager.joinCluster() calls AssignmentManager.processDeadServersAndRegionsInTransition().
> If the failover flag gets set for any reason, it seems we don't call assignAllUserRegions(). The balancer then appears to do the work of assigning those regions; since we don't use a locality-aware balancer, we lost our region locality.
> I don't have a solid grasp of the reasoning behind these checks, but there are some potential workarounds:
> 1. After shutting down your cluster, move your WALs aside (replay later).
> 2. Clean up your zNodes
> That seems to work, but it requires a lot of manual labor. Another solution, which I prefer, would be to add a flag, e.g. ./start-hbase.sh --clean
> If we start the master with that flag, we check it in AssignmentManager.processDeadServersAndRegionsInTransition() and, if it is set, call assignAllUserRegions() regardless of the failover state.
> I have a patch for the latter solution, assuming I am understanding the logic correctly.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)