Posted to issues@hbase.apache.org by "Lijin Bin (Jira)" <ji...@apache.org> on 2019/12/23 12:17:00 UTC

[jira] [Updated] (HBASE-23613) ProcedureExecutor check StuckWorkers blocked by DeadServerMetricRegionChore

     [ https://issues.apache.org/jira/browse/HBASE-23613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lijin Bin updated HBASE-23613:
------------------------------
    Affects Version/s: 2.2.2
          Description: 
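After debugging, I found that the WorkerMonitor in ProcedureExecutor does not execute for a while because it is blocked by DeadServerMetricRegionChore.
TimeoutExecutorThread executes not only WorkerMonitor but also DeadServerMetricRegionChore, RegionInTransitionChore...
In the thread dump, "ProcExecTimeout" is parked waiting for a RegionStateNode lock inside DeadServerMetricRegionChore, while "PEWorker-1" is waiting in an RPC retry while updating the region location in meta.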
{code}
"ProcExecTimeout" #1052 daemon prio=5 os_prio=0 tid=0x00007f5c98cc4000 nid=0x229 waiting on condition [0x00007f5c2f857000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000005c312ad80> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
        at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
        at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
        at org.apache.hadoop.hbase.master.assignment.RegionStateNode.lock(RegionStateNode.java:313)
        at org.apache.hadoop.hbase.master.assignment.AssignmentManager$DeadServerMetricRegionChore.periodicExecute(AssignmentManager.java:1186)
        at org.apache.hadoop.hbase.master.assignment.AssignmentManager$DeadServerMetricRegionChore.periodicExecute(AssignmentManager.java:1163)
        at org.apache.hadoop.hbase.procedure2.TimeoutExecutorThread.executeInMemoryChore(TimeoutExecutorThread.java:120)
        at org.apache.hadoop.hbase.procedure2.TimeoutExecutorThread.execDelayedProcedure(TimeoutExecutorThread.java:99)
        at org.apache.hadoop.hbase.procedure2.TimeoutExecutorThread.run(TimeoutExecutorThread.java:66)

"PEWorker-1" #1053 daemon prio=5 os_prio=0 tid=0x00007f5c98cc5800 nid=0x22a in Object.wait() [0x00007f5c2f756000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:168)
        - locked <0x00000005839f18b0> (a java.util.concurrent.atomic.AtomicBoolean)
        at org.apache.hadoop.hbase.client.HTable.put(HTable.java:540)
        at org.apache.hadoop.hbase.master.assignment.RegionStateStore.updateRegionLocation(RegionStateStore.java:209)
        at org.apache.hadoop.hbase.master.assignment.RegionStateStore.updateUserRegionLocation(RegionStateStore.java:203)
        at org.apache.hadoop.hbase.master.assignment.RegionStateStore.updateRegionLocation(RegionStateStore.java:141)
        at org.apache.hadoop.hbase.master.assignment.AssignmentManager.persistToMeta(AssignmentManager.java:1742)
        at org.apache.hadoop.hbase.master.assignment.RegionRemoteProcedureBase.execute(RegionRemoteProcedureBase.java:298)
        at org.apache.hadoop.hbase.master.assignment.RegionRemoteProcedureBase.execute(RegionRemoteProcedureBase.java:58)
        at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:962)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1648)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1395)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1100(ProcedureExecutor.java:78)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1965)
{code}
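
The starvation described above can be reproduced outside HBase with a minimal sketch. It uses only java.util.concurrent; the class and method names are hypothetical and none of this is HBase code. A single-threaded executor stands in for TimeoutExecutorThread, a ReentrantLock stands in for the RegionStateNode lock, and a plain thread stands in for a PEWorker holding that lock during a slow meta update. The two-thread case is included only for contrast and is not a claim about how this issue was or should be fixed.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration; not part of HBase.
class ChoreStarvationSketch {

  /**
   * Returns how many times the "monitor" chore ran within 500 ms while
   * another thread was still holding the lock that the "metrics" chore needs.
   */
  public static int monitorRunsWhileLockHeld(int timerThreads) throws Exception {
    ReentrantLock regionLock = new ReentrantLock(); // stand-in for the RegionStateNode lock
    CountDownLatch lockHeld = new CountDownLatch(1);
    CountDownLatch releaseLock = new CountDownLatch(1);
    AtomicInteger monitorRuns = new AtomicInteger();

    // Stand-in for PEWorker-1: takes the lock and keeps it until told to let go.
    Thread worker = new Thread(() -> {
      regionLock.lock();
      try {
        lockHeld.countDown();
        releaseLock.await(); // simulates the slow, retried meta update
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        regionLock.unlock();
      }
    });
    worker.start();
    lockHeld.await(); // make sure the lock is taken before any chore is scheduled

    // Stand-in for the timeout executor that runs all in-memory chores.
    ExecutorService timer = Executors.newFixedThreadPool(timerThreads);

    // Stand-in for DeadServerMetricRegionChore: blocks until the lock is free.
    timer.submit(() -> {
      regionLock.lock();
      regionLock.unlock();
    });

    // Stand-in for WorkerMonitor: needs no lock, only a free executor thread.
    Future<?> monitor = timer.submit(() -> {
      monitorRuns.incrementAndGet();
    });

    try {
      monitor.get(500, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      // The monitor never got a thread: it is queued behind the blocked chore.
    }
    int runs = monitorRuns.get();

    // Clean up: release the lock so every queued task can finish.
    releaseLock.countDown();
    worker.join();
    timer.shutdown();
    timer.awaitTermination(5, TimeUnit.SECONDS);
    return runs;
  }

  public static void main(String[] args) throws Exception {
    System.out.println("one timer thread:  monitor runs = " + monitorRunsWhileLockHeld(1));
    System.out.println("two timer threads: monitor runs = " + monitorRunsWhileLockHeld(2));
  }
}
```

With one timer thread the monitor is queued behind the blocked chore and does not run until the lock is released, which mirrors the "ProcExecTimeout" stack above. With two threads the monitor runs even though the other chore is still parked on the lock.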

> ProcedureExecutor check StuckWorkers blocked by DeadServerMetricRegionChore
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-23613
>                 URL: https://issues.apache.org/jira/browse/HBASE-23613
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 2.2.2
>            Reporter: Lijin Bin
>            Assignee: Lijin Bin
>            Priority: Major


