Posted to issues@hbase.apache.org by "Jimmy Xiang (JIRA)" <ji...@apache.org> on 2013/10/16 04:43:44 UTC

[jira] [Comment Edited] (HBASE-9773) Master aborted when hbck asked the master to assign a region that was already online

    [ https://issues.apache.org/jira/browse/HBASE-9773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796341#comment-13796341 ] 

Jimmy Xiang edited comment on HBASE-9773 at 10/16/13 2:43 AM:
--------------------------------------------------------------

[~saint.ack@gmail.com], if it is null, the state is managed by the caller, so we leave it to the caller. If transitionInZK is not set, the caller is only making a best-effort attempt to close the region, just in case, to avoid double assignment. So we don't check the actual state here.


was (Author: jxiang):
[~saint.ack@gmail.com], if it is null, the state is managed by the caller so we leave it to the caller.

> Master aborted when hbck asked the master to assign a region that was already online
> ------------------------------------------------------------------------------------
>
>                 Key: HBASE-9773
>                 URL: https://issues.apache.org/jira/browse/HBASE-9773
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Devaraj Das
>            Assignee: Jimmy Xiang
>         Attachments: trunk-9773.patch, trunk-9773_v2.patch
>
>
> Came across this situation (with a 0.96 build very close to the RC5 version created on 10/11):
> The sequence of events that happened:
> 1. The hbck tool couldn't communicate with the RegionServer hosting namespace region due to some security exceptions. hbck INCORRECTLY assumed the region was not deployed.
> In output.log (client side):
> {noformat}
> 2013-10-12 10:42:57,067|beaver.machine|INFO|ERROR: Region { meta => hbase:namespace,,1381564449706.a0ac0825ba2d0830614e7f808f31787a., hdfs => hdfs://gs-hdp2-secure-1381559462-hbase-12.cs1cloud.internal:8020/apps/hbase/data/data/hbase/namespace/a0ac0825ba2d0830614e7f808f31787a, deployed =>  } not deployed on any region server.
> 2013-10-12 10:42:57,067|beaver.machine|INFO|Trying to fix unassigned region...
> {noformat}
> 2. This led to the hbck tool trying to tell the master to "assign" the region.
> In master log (hbase-hbase-master-gs-hdp2-secure-1381559462-hbase-12.log):
> {noformat}
> 2013-10-12 10:52:35,960 INFO  [RpcServer.handler=4,port=60000] master.HMaster: Client=hbase//172.18.145.105 assign hbase:namespace,,1381564449706.a0ac0825ba2d0830614e7f808f31787a.
> {noformat}
> 3. The master went through the steps: it sent a CLOSE to the RegionServer hosting the namespace region.
> From master log:
> {noformat}
> 2013-10-12 10:52:35,981 DEBUG [RpcServer.handler=4,port=60000] master.AssignmentManager: Sent CLOSE to gs-hdp2-secure-1381559462-hbase-1.cs1cloud.internal,60020,1381564439794 for region hbase:namespace,,1381564449706.a0ac0825ba2d0830614e7f808f31787a.
> {noformat}
> 4. The master then tried to assign the namespace region to a region server, and in the process ABORTED:
> From master log:
> {noformat}
> 2013-10-12 10:52:36,025 DEBUG [RpcServer.handler=4,port=60000] master.AssignmentManager: No previous transition plan found (or ignoring an existing plan) for hbase:namespace,,1381564449706.a0ac0825ba2d0830614e7f808f31787a.; generated random plan=hri=hbase:namespace,,1381564449706.a0ac0825ba2d0830614e7f808f31787a., src=, dest=gs-hdp2-secure-1381559462-hbase-9.cs1cloud.internal,60020,1381564439807; 4 (online=4, available=4) available servers, forceNewPlan=true
> 2013-10-12 10:52:36,026 FATAL [RpcServer.handler=4,port=60000] master.HMaster: Master server abort: loaded coprocessors are: [org.apache.hadoop.hbase.security.access.AccessController]
> 2013-10-12 10:52:36,027 FATAL [RpcServer.handler=4,port=60000] master.HMaster: Unexpected state : {a0ac0825ba2d0830614e7f808f31787a state=OPEN, ts=1381564451344, server=gs-hdp2-secure-1381559462-hbase-1.cs1cloud.internal,60020,1381564439794} .. Cannot transit it to OFFLINE.
> java.lang.IllegalStateException: Unexpected state : {a0ac0825ba2d0830614e7f808f31787a state=OPEN, ts=1381564451344, server=gs-hdp2-secure-1381559462-hbase-1.cs1cloud.internal,60020,1381564439794} .. Cannot transit it to OFFLINE.
> {noformat}
> {code}AssignmentManager.assign(HRegionInfo region, boolean setOfflineInZK, boolean forceNewPlan){code} is the method that does all of the above. It was called from the HMaster with true for both boolean arguments.
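
The failure in step 4 can be illustrated with a minimal, self-contained sketch. This is not HBase code: the class RegionStateSketch, its State enum, and the methods forceOffline and tryAssign are hypothetical names invented for illustration, and the set of states allowed to move to OFFLINE is an assumption. It only shows how an unguarded OPEN to OFFLINE transition raises IllegalStateException, and one possible way a caller could reject the request instead of letting the exception propagate. The attached patches may take a different approach.

```java
import java.util.EnumSet;
import java.util.Set;

public class RegionStateSketch {
    // Hypothetical, simplified region states; not the real HBase RegionState class.
    enum State { OFFLINE, PENDING_OPEN, OPEN, PENDING_CLOSE, CLOSED }

    // Assumption for this sketch: only these states may be moved to OFFLINE.
    static final Set<State> CAN_GO_OFFLINE = EnumSet.of(State.CLOSED, State.OFFLINE);

    // Mirrors the shape of the reported failure: forcing an OPEN region
    // to OFFLINE throws IllegalStateException.
    static State forceOffline(State current) {
        if (!CAN_GO_OFFLINE.contains(current)) {
            throw new IllegalStateException(
                "Unexpected state : " + current + " .. Cannot transit it to OFFLINE.");
        }
        return State.OFFLINE;
    }

    // Defensive variant: an assign request for a region that cannot go
    // OFFLINE is rejected (returns false) instead of letting the
    // exception reach, and possibly abort, the caller.
    static boolean tryAssign(State current) {
        try {
            forceOffline(current);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("assign CLOSED region accepted: " + tryAssign(State.CLOSED));
        System.out.println("assign OPEN region accepted: " + tryAssign(State.OPEN));
    }
}
```

In this sketch a request to assign an already OPEN region is simply refused. Whether the real master should refuse, or first wait for the CLOSE to complete, is a design decision the sketch does not settle.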


