You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hbase.apache.org by "Duo Zhang (JIRA)" <ji...@apache.org> on 2018/02/21 13:13:00 UTC

[jira] [Created] (HBASE-20038) TestLockProcedure.testTimeout is flakey

Duo Zhang created HBASE-20038:
---------------------------------

             Summary: TestLockProcedure.testTimeout is flakey
                 Key: HBASE-20038
                 URL: https://issues.apache.org/jira/browse/HBASE-20038
             Project: HBase
          Issue Type: Sub-task
            Reporter: Duo Zhang


The test is simple so it is easy to find out the problem.
{noformat}
2018-02-21 04:53:32,230 INFO  [Time-limited test] hbase.ResourceChecker(148): before: master.locking.TestLockProcedure#testTimeout Thread=218, OpenFileDescriptor=853, MaxFileDescriptor=60000, SystemLoadAverage=5075, ProcessCount=312, AvailableMemoryMB=5373
2018-02-21 04:53:32,234 WARN  [Time-limited test] procedure2.ProcedureTestingUtility(146): Set Kill before store update to: false
2018-02-21 04:53:32,278 DEBUG [Time-limited test] procedure2.ProcedureExecutor(866): Stored pid=14, state=RUNNABLE; org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace, type=EXCLUSIVE
2018-02-21 04:53:32,285 DEBUG [Time-limited test] locking.LockProcedure(195): Heartbeat pid=14, state=RUNNABLE; org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace, type=EXCLUSIVE
2018-02-21 04:53:32,286 DEBUG [PEWorker-1] locking.LockProcedure(312): LOCKED pid=14, state=RUNNABLE; org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace, type=EXCLUSIVE
2018-02-21 04:53:32,303 DEBUG [Time-limited test] locking.TestLockProcedure(204): Proc id 14 acquired lock.
2018-02-21 04:53:32,298 INFO  [PEWorker-1] procedure2.TimeoutExecutorThread(82): ADDED pid=14, state=WAITING_TIMEOUT; org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace, type=EXCLUSIVE; timeout=2000, timestamp=1519188814298
2018-02-21 04:53:33,303 DEBUG [Time-limited test] locking.LockProcedure(195): Heartbeat pid=14, state=WAITING_TIMEOUT; org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace, type=EXCLUSIVE
2018-02-21 04:53:33,304 DEBUG [Time-limited test] locking.TestLockProcedure(225): Proc id 14 : LOCKED.
2018-02-21 04:53:34,299 DEBUG [ProcExecTimeout] locking.LockProcedure(207): Timeout failure ProcedureEvent for pid=14, state=WAITING_TIMEOUT; org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace, type=EXCLUSIVE, ready=false, [pid=14, state=WAITING_TIMEOUT; org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace, type=EXCLUSIVE]
2018-02-21 04:53:34,299 DEBUG [ProcExecTimeout] locking.LockProcedure(210): Calling wake on ProcedureEvent for pid=14, state=RUNNABLE; org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace, type=EXCLUSIVE, ready=false, [pid=14, state=RUNNABLE; org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace, type=EXCLUSIVE]
2018-02-21 04:53:34,299 INFO  [PEWorker-1] procedure2.TimeoutExecutorThread(82): ADDED pid=14, state=WAITING_TIMEOUT; org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace, type=EXCLUSIVE; timeout=2000, timestamp=1519188816299
2018-02-21 04:53:34,306 DEBUG [Time-limited test] locking.LockProcedure(195): Heartbeat pid=14, state=WAITING_TIMEOUT; org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace, type=EXCLUSIVE
2018-02-21 04:53:34,306 DEBUG [Time-limited test] locking.TestLockProcedure(225): Proc id 14 : LOCKED.
2018-02-21 04:53:34,766 WARN  [HBase-Metrics2-1] impl.MetricsConfig(125): Cannot locate configuration: tried hadoop-metrics2-hbase.properties,hadoop-metrics2.properties
2018-02-21 04:53:36,299 DEBUG [ProcExecTimeout] locking.LockProcedure(207): Timeout failure ProcedureEvent for pid=14, state=WAITING_TIMEOUT; org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace, type=EXCLUSIVE, ready=false, [pid=14, state=WAITING_TIMEOUT; org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace, type=EXCLUSIVE]
2018-02-21 04:53:36,299 DEBUG [ProcExecTimeout] locking.LockProcedure(210): Calling wake on ProcedureEvent for pid=14, state=RUNNABLE; org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace, type=EXCLUSIVE, ready=false, [pid=14, state=RUNNABLE; org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace, type=EXCLUSIVE]
2018-02-21 04:53:36,299 INFO  [PEWorker-1] procedure2.TimeoutExecutorThread(82): ADDED pid=14, state=WAITING_TIMEOUT; org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace, type=EXCLUSIVE; timeout=2000, timestamp=1519188818299
2018-02-21 04:53:38,299 DEBUG [ProcExecTimeout] locking.LockProcedure(207): Timeout failure ProcedureEvent for pid=14, state=WAITING_TIMEOUT; org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace, type=EXCLUSIVE, ready=false, [pid=14, state=WAITING_TIMEOUT; org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace, type=EXCLUSIVE]
2018-02-21 04:53:38,299 DEBUG [ProcExecTimeout] locking.LockProcedure(210): Calling wake on ProcedureEvent for pid=14, state=RUNNABLE; org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace, type=EXCLUSIVE, ready=false, [pid=14, state=RUNNABLE; org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace, type=EXCLUSIVE]
2018-02-21 04:53:38,310 DEBUG [Time-limited test] locking.LockProcedure(195): Heartbeat pid=14, state=RUNNABLE; org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace, type=EXCLUSIVE
2018-02-21 04:53:38,310 INFO  [PEWorker-1] procedure2.TimeoutExecutorThread(82): ADDED pid=14, state=WAITING_TIMEOUT; org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace, type=EXCLUSIVE; timeout=2000, timestamp=1519188820310
2018-02-21 04:53:38,311 WARN  [Time-limited test] procedure2.ProcedureTestingUtility(146): Set Kill before store update to: false
2018-02-21 04:53:38,342 DEBUG [PEWorker-1] locking.LockProcedure(242): UNLOCKED pid=14, state=RUNNABLE; org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace, type=EXCLUSIVE
2018-02-21 04:53:38,366 INFO  [PEWorker-1] procedure2.ProcedureExecutor(1247): Finished pid=14, state=SUCCESS; org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace, type=EXCLUSIVE in 6.0920sec
2018-02-21 04:53:38,628 INFO  [Time-limited test] hbase.ResourceChecker(172): after: master.locking.TestLockProcedure#testTimeout Thread=209 (was 218), OpenFileDescriptor=851 (was 853), MaxFileDescriptor=60000 (was 60000), SystemLoadAverage=5045 (was 5075), ProcessCount=314 (was 312) - ProcessCount LEAK? -, AvailableMemoryMB=5071 (was 5373)
{noformat}
The problem is that, sleep 2 * HEARTBEAT_TIMEOUT is not safe. It is possible that after the first TIMEOUT the procedure has not been timeout yet and then the check for unlocked and the actual timeout processing will arrive at almost the same time. You can see the above log, we are going to mark the procedure as timeout at 04:53:38,299 and the heartbeat for checking unlock arrives at 04:53:38,310.

I think increase the wait time can solve the problem.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)