You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "Duo Zhang (JIRA)" <ji...@apache.org> on 2018/11/30 07:07:00 UTC

[jira] [Commented] (HBASE-21534) TestAssignmentManager is flakey

    [ https://issues.apache.org/jira/browse/HBASE-21534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16704339#comment-16704339 ] 

Duo Zhang commented on HBASE-21534:
-----------------------------------

Look at the log, it seems that we assign meta to the server which is already marked as dead. This should be a test issue, let me provide a fix.

{noformat}
2018-11-29 20:47:49,751 INFO  [PEWorker-63] procedure.ServerCrashProcedure(131): Start pid=14, state=RUNNABLE:SERVER_CRASH_START, hasLock=true; ServerCrashProcedure server=localhost,102,1, splitWal=false, meta=true
2018-11-29 20:47:49,751 DEBUG [PEWorker-63] procedure2.RootProcedureState(153): Add procedure pid=14, state=RUNNABLE:SERVER_CRASH_SPLIT_META_LOGS, hasLock=true; ServerCrashProcedure server=localhost,102,1, splitWal=false, meta=true as the 0th rollback step
2018-11-29 20:47:49,751 DEBUG [PEWorker-63] procedure.MasterProcedureScheduler(351): Add ServerQueue(localhost,102,1, xlock=false sharedLock=0 size=0) to run queue because: pid=14, state=RUNNABLE:SERVER_CRASH_SPLIT_META_LOGS, hasLock=false; ServerCrashProcedure server=localhost,102,1, splitWal=false, meta=true released exclusive lock
2018-11-29 20:47:49,751 DEBUG [PEWorker-63] procedure.MasterProcedureScheduler(351): Add ServerQueue(localhost,102,1, xlock=false sharedLock=0 size=1) to run queue because: the exclusive lock is not held by anyone when adding pid=14, state=RUNNABLE:SERVER_CRASH_SPLIT_META_LOGS, hasLock=false; ServerCrashProcedure server=localhost,102,1, splitWal=false, meta=true
2018-11-29 20:47:49,751 DEBUG [PEWorker-63] procedure.MasterProcedureScheduler(361): Remove ServerQueue(localhost,102,1, xlock=false sharedLock=0 size=0) from run queue because: queue is empty after polling out pid=14, state=RUNNABLE:SERVER_CRASH_SPLIT_META_LOGS, hasLock=false; ServerCrashProcedure server=localhost,102,1, splitWal=false, meta=true
2018-11-29 20:47:49,751 DEBUG [PEWorker-63] procedure.MasterProcedureScheduler(361): Remove ServerQueue(localhost,102,1, xlock=true (14) sharedLock=0 size=0) from run queue because: pid=14, state=RUNNABLE:SERVER_CRASH_SPLIT_META_LOGS, hasLock=false; ServerCrashProcedure server=localhost,102,1, splitWal=false, meta=true held exclusive lock
2018-11-29 20:47:49,751 DEBUG [PEWorker-63] procedure.ServerCrashProcedure(207): Splitting meta WALs pid=14, state=RUNNABLE:SERVER_CRASH_SPLIT_META_LOGS, hasLock=true; ServerCrashProcedure server=localhost,102,1, splitWal=false, meta=true
2018-11-29 20:47:49,755 INFO  [PEWorker-63] master.MasterWalManager(314): Log dir for server localhost,102,1 does not exist
2018-11-29 20:47:49,755 INFO  [PEWorker-63] master.SplitLogManager(465): dead splitlog workers [localhost,102,1]
2018-11-29 20:47:49,755 INFO  [PEWorker-63] master.SplitLogManager(297): Finished splitting (more than or equal to) 0 bytes in 0 log files in [] in 0ms
2018-11-29 20:47:49,756 DEBUG [PEWorker-63] procedure.ServerCrashProcedure(213): Done splitting meta WALs pid=14, state=RUNNABLE:SERVER_CRASH_SPLIT_META_LOGS, hasLock=true; ServerCrashProcedure server=localhost,102,1, splitWal=false, meta=true
2018-11-29 20:47:49,756 DEBUG [PEWorker-63] procedure2.RootProcedureState(153): Add procedure pid=14, state=RUNNABLE:SERVER_CRASH_ASSIGN_META, hasLock=true; ServerCrashProcedure server=localhost,102,1, splitWal=false, meta=true as the 1th rollback step
2018-11-29 20:47:49,756 DEBUG [PEWorker-63] procedure.MasterProcedureScheduler(351): Add ServerQueue(localhost,102,1, xlock=false sharedLock=0 size=0) to run queue because: pid=14, state=RUNNABLE:SERVER_CRASH_ASSIGN_META, hasLock=false; ServerCrashProcedure server=localhost,102,1, splitWal=false, meta=true released exclusive lock
2018-11-29 20:47:49,756 DEBUG [PEWorker-63] procedure.MasterProcedureScheduler(351): Add ServerQueue(localhost,102,1, xlock=false sharedLock=0 size=1) to run queue because: the exclusive lock is not held by anyone when adding pid=14, state=RUNNABLE:SERVER_CRASH_ASSIGN_META, hasLock=false; ServerCrashProcedure server=localhost,102,1, splitWal=false, meta=true
2018-11-29 20:47:49,756 DEBUG [PEWorker-63] procedure.MasterProcedureScheduler(361): Remove ServerQueue(localhost,102,1, xlock=false sharedLock=0 size=0) from run queue because: queue is empty after polling out pid=14, state=RUNNABLE:SERVER_CRASH_ASSIGN_META, hasLock=false; ServerCrashProcedure server=localhost,102,1, splitWal=false, meta=true
2018-11-29 20:47:49,756 DEBUG [PEWorker-63] procedure.MasterProcedureScheduler(361): Remove ServerQueue(localhost,102,1, xlock=true (14) sharedLock=0 size=0) from run queue because: pid=14, state=RUNNABLE:SERVER_CRASH_ASSIGN_META, hasLock=false; ServerCrashProcedure server=localhost,102,1, splitWal=false, meta=true held exclusive lock
2018-11-29 20:47:49,758 INFO  [PEWorker-63] procedure2.ProcedureExecutor(1680): Initialized subprocedures=[{pid=15, ppid=14, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, hasLock=false; TransitRegionStateProcedure table=hbase:meta, region=1588230740, ASSIGN}]
2018-11-29 20:47:49,758 DEBUG [PEWorker-63] procedure2.RootProcedureState(153): Add procedure pid=14, state=WAITING:SERVER_CRASH_GET_REGIONS, hasLock=true; ServerCrashProcedure server=localhost,102,1, splitWal=false, meta=true as the 2th rollback step
2018-11-29 20:47:49,758 DEBUG [PEWorker-63] procedure.MasterProcedureScheduler(351): Add TableQueue(hbase:meta, xlock=false sharedLock=0 size=1) to run queue because: the exclusive lock is not held by anyone when adding pid=15, ppid=14, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, hasLock=false; TransitRegionStateProcedure table=hbase:meta, region=1588230740, ASSIGN
2018-11-29 20:47:49,758 DEBUG [PEWorker-63] procedure.MasterProcedureScheduler(351): Add ServerQueue(localhost,102,1, xlock=false sharedLock=0 size=0) to run queue because: pid=14, state=WAITING:SERVER_CRASH_GET_REGIONS, hasLock=false; ServerCrashProcedure server=localhost,102,1, splitWal=false, meta=true released exclusive lock
2018-11-29 20:47:49,758 DEBUG [PEWorker-63] procedure.MasterProcedureScheduler(361): Remove TableQueue(hbase:meta, xlock=false sharedLock=0 size=0) from run queue because: queue is empty after polling out pid=15, ppid=14, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, hasLock=false; TransitRegionStateProcedure table=hbase:meta, region=1588230740, ASSIGN
2018-11-29 20:47:49,758 INFO  [PEWorker-63] procedure.MasterProcedureScheduler(736): Took xlock for pid=15, ppid=14, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, hasLock=false; TransitRegionStateProcedure table=hbase:meta, region=1588230740, ASSIGN
2018-11-29 20:47:49,758 INFO  [PEWorker-63] assignment.TransitRegionStateProcedure(166): Starting pid=15, ppid=14, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, hasLock=true; TransitRegionStateProcedure table=hbase:meta, region=1588230740, ASSIGN; rit=OPEN, location=localhost,102,1; forceNewPlan=false, retain=false
2018-11-29 20:47:49,759 DEBUG [PEWorker-63] procedure2.RootProcedureState(153): Add procedure pid=15, ppid=14, state=RUNNABLE:REGION_STATE_TRANSITION_OPEN, hasLock=true; TransitRegionStateProcedure table=hbase:meta, region=1588230740, ASSIGN as the 3th rollback step
2018-11-29 20:47:49,909 INFO  [mockmaster:1234] balancer.BaseLoadBalancer(1532): Reassigned 1 regions. 1 retained the pre-restart assignment. 
2018-11-29 20:47:49,909 DEBUG [mockmaster:1234] procedure.MasterProcedureScheduler(351): Add TableQueue(hbase:meta, xlock=false sharedLock=1 size=1) to run queue because: pid=15, ppid=14, state=RUNNABLE:REGION_STATE_TRANSITION_OPEN, hasLock=true; TransitRegionStateProcedure table=hbase:meta, region=1588230740, ASSIGN has lock
2018-11-29 20:47:49,909 DEBUG [PEWorker-36] procedure.MasterProcedureScheduler(361): Remove TableQueue(hbase:meta, xlock=false sharedLock=1 size=0) from run queue because: queue is empty after polling out pid=15, ppid=14, state=RUNNABLE:REGION_STATE_TRANSITION_OPEN, hasLock=true; TransitRegionStateProcedure table=hbase:meta, region=1588230740, ASSIGN
2018-11-29 20:47:49,910 INFO  [PEWorker-36] procedure2.ProcedureExecutor(1680): Initialized subprocedures=[{pid=16, ppid=15, state=RUNNABLE, hasLock=false; org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure}]
2018-11-29 20:47:49,910 DEBUG [PEWorker-36] procedure2.RootProcedureState(153): Add procedure pid=15, ppid=14, state=WAITING:REGION_STATE_TRANSITION_CONFIRM_OPENED, hasLock=true; TransitRegionStateProcedure table=hbase:meta, region=1588230740, ASSIGN as the 4th rollback step
2018-11-29 20:47:49,910 DEBUG [PEWorker-36] procedure.MasterProcedureScheduler(351): Add TableQueue(hbase:meta, xlock=false sharedLock=1 size=1) to run queue because: the exclusive lock is not held by anyone when adding pid=16, ppid=15, state=RUNNABLE, hasLock=false; org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure
2018-11-29 20:47:49,910 DEBUG [PEWorker-36] procedure.MasterProcedureScheduler(361): Remove TableQueue(hbase:meta, xlock=false sharedLock=1 size=0) from run queue because: queue is empty after polling out pid=16, ppid=15, state=RUNNABLE, hasLock=false; org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure
2018-11-29 20:47:49,910 DEBUG [PEWorker-36] procedure2.RootProcedureState(153): Add procedure pid=16, ppid=15, state=RUNNABLE, hasLock=true; org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure as the 5th rollback step
2018-11-29 20:47:50,061 WARN  [MockRSProcedureDispatcher-pool5-t10] assignment.AssignmentManager(894): The region server localhost,102,1 is already dead, skip reportRegionStateTransition call
{noformat}

> TestAssignmentManager is flakey
> -------------------------------
>
>                 Key: HBASE-21534
>                 URL: https://issues.apache.org/jira/browse/HBASE-21534
>             Project: HBase
>          Issue Type: Task
>            Reporter: Duo Zhang
>            Priority: Major
>
> See this in the outout and then the test hang
> {noformat}
> 2018-11-29 20:47:50,061 WARN  [MockRSProcedureDispatcher-pool5-t10] assignment.AssignmentManager(894): The region server localhost,102,1 is already dead, skip reportRegionStateTransition call
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)