You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "Peter Somogyi (JIRA)" <ji...@apache.org> on 2019/02/08 13:20:00 UTC

[jira] [Commented] (HBASE-21854) Race condition in TestProcedureSkipPersistence

    [ https://issues.apache.org/jira/browse/HBASE-21854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16763582#comment-16763582 ] 

Peter Somogyi commented on HBASE-21854:
---------------------------------------

End of the log for a failing case has these lines:
{noformat}
2019-02-06 14:18:11,218 ERROR [PEWorker-1] helpers.MarkerIgnoringBase(159): CODE-BUG: Uncaught runtime exception for pid=1, state=FAILED, locked=true, exception=java.io.IOException via Should have persisted the timeout value:java.io.IOException: Should have persisted the timeout value; org.apache.hadoop.hbase.procedure2.TestProcedureSkipPersistence$TestProcedure
java.lang.UnsupportedOperationException
   at org.apache.hadoop.hbase.procedure2.TestProcedureSkipPersistence$TestProcedure.rollback(TestProcedureSkipPersistence.java:109)
   at org.apache.hadoop.hbase.procedure2.TestProcedureSkipPersistence$TestProcedure.rollback(TestProcedureSkipPersistence.java:62)
   at org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:982)
   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1663)
   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1595)
   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1438)
   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1200(ProcedureExecutor.java:78)
   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:2058)
2019-02-06 14:18:11,219 INFO  [PEWorker-1] procedure2.ProcedureExecutor(1625): Rolled back pid=1, state=ROLLEDBACK, exception=java.io.IOException via Should have persisted the timeout value:java.io.IOException: Should have persisted the timeout value; org.apache.hadoop.hbase.procedure2.TestProcedureSkipPersistence$TestProcedure exec-time=247msec
2019-02-06 14:18:11,306 INFO  [Time-limited test] procedure2.ProcedureExecutor(702): Stopping
2019-02-06 14:18:11,307 INFO  [Time-limited test] wal.WALProcedureStore(331): Stopping the WAL Procedure Store, isAbort=false{noformat}
The IOException is thrown from TestProcedure's execute method.

Steps:
 # Procedure(a) added with timeout=3600000 with no persistence.
 # ProcedureExecutor restarted
 # Procedure(b) restored, no timeout restored; set timeout=2000 with persistence
 # ProcedureExecutor restarted
 # Procedure(c) restored, should have timeout

When the test fails Step #3 and #4 are running in parallel so Procedure(b) it not started/persisted completely. After ProcedureExecutor is started (step 4) it discovers the procedure but it does not have persisted timeout on it. Is it still the original Procedure(a)?

[~Apache9], do you think it is a test or a production issue?

> Race condition in TestProcedureSkipPersistence 
> -----------------------------------------------
>
>                 Key: HBASE-21854
>                 URL: https://issues.apache.org/jira/browse/HBASE-21854
>             Project: HBase
>          Issue Type: Bug
>          Components: proc-v2
>    Affects Versions: 2.1.3
>            Reporter: Peter Somogyi
>            Assignee: Peter Somogyi
>            Priority: Major
>
> There is a race condition in TestProcedureSkipPersistence. After the procedure is added, the test stops ProcedureExecutor. In some cases the procedure is not added to the queue in time.
> Failing execution:
> {noformat}
> 2019-02-06 14:18:11,133 INFO  [Time-limited test] procedure2.TimeoutExecutorThread(82): ADDED pid=-1, state=WAITING_TIMEOUT; org.apache.hadoop.hbase.procedure2.CompletedProcedureCleaner; timeout=30000, timestamp=1549491521133
> 2019-02-06 14:18:11,135 INFO  [PEWorker-1] procedure2.TimeoutExecutorThread(82): ADDED pid=1, state=WAITING_TIMEOUT, locked=true; org.apache.hadoop.hbase.procedure2.TestProcedureSkipPersistence$TestProcedure; timeout=2000, timestamp=1549491493135
> 2019-02-06 14:18:11,137 INFO  [Time-limited test] hbase.Waiter(189): Waiting up to [30,000] milli-secs(wait.for.ratio=[1])
> 2019-02-06 14:18:11,139 INFO  [Time-limited test] procedure2.ProcedureTestingUtility(125): RESTART - Stop
> 2019-02-06 14:18:11,139 INFO  [Time-limited test] procedure2.ProcedureExecutor(702): Stopping
> 2019-02-06 14:18:11,139 INFO  [Time-limited test] wal.WALProcedureStore(331): Stopping the WAL Procedure Store, isAbort=false
> 2019-02-06 14:18:11,140 DEBUG [PEWorker-1] procedure2.RootProcedureState(153): Add procedure pid=1, state=WAITING_TIMEOUT, locked=true; org.apache.hadoop.hbase.procedure2.TestProcedureSkipPersistence$TestProcedure as the 0th rollback step
> 2019-02-06 14:18:11,141 WARN  [PEWorker-1] procedure2.ProcedureExecutor$WorkerThread(2074): Worker terminating UNNATURALLY null
> java.lang.RuntimeException: the store must be running before inserting data
>    at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.pushData(WALProcedureStore.java:710)
>    at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.update(WALProcedureStore.java:603)
>    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.updateStoreOnExec(ProcedureExecutor.java:1943)
>    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1809)
>    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1481)
>    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1200(ProcedureExecutor.java:78)
>    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:2058)
> 2019-02-06 14:18:11,145 INFO  [Time-limited test] procedure2.ProcedureTestingUtility(137): RESTART - Start{noformat}
> In a successful run the ProcExecutor is stopped AFTER the procedure is actually in the queue.
> Successful:
> {noformat}
> 2019-02-07 15:48:08,731 INFO  [Time-limited test] procedure2.TimeoutExecutorThread(82): ADDED pid=-1, state=WAITING_TIMEOUT; org.apache.hadoop.hbase.procedure2.CompletedProcedureCleaner; timeout=30000, timestamp=1549550918731
> 2019-02-07 15:48:08,731 INFO  [PEWorker-1] procedure2.TimeoutExecutorThread(82): ADDED pid=1, state=WAITING_TIMEOUT, locked=true; org.apache.hadoop.hbase.procedure2.TestProcedureSkipPersistence$TestProcedure; timeout=2000, timestamp=1549550890731
> 2019-02-07 15:48:08,732 DEBUG [PEWorker-1] procedure2.RootProcedureState(153): Add procedure pid=1, state=WAITING_TIMEOUT, locked=true; org.apache.hadoop.hbase.procedure2.TestProcedureSkipPersistence$TestProcedure as the 0th rollback step
> 2019-02-07 15:48:08,732 INFO  [WALProcedureStoreSyncThread] wal.WALProcedureStore(1217): Remove all state logs with ID less than 1, since all the active procedures are in the latest log
> 2019-02-07 15:48:08,733 DEBUG [WALProcedureStoreSyncThread] wal.WALProcedureStore(1239): Removed log=file:/Users/peter.somogyi/Cloudera/hbase/hbase-procedure/target/test-data/b9a1969a-85a4-15e8-7da5-6198f5acf2de/proc-logs/pv2-00000000000000000001.log, activeLogs=[/Users/peter.somogyi/Cloudera/hbase/hbase-procedure/target/test-data/b9a1969a-85a4-15e8-7da5-6198f5acf2de/proc-logs/pv2-00000000000000000002.log]
> 2019-02-07 15:48:08,734 INFO  [Time-limited test] hbase.Waiter(189): Waiting up to [30,000] milli-secs(wait.for.ratio=[1])
> 2019-02-07 15:48:08,734 INFO  [Time-limited test] procedure2.ProcedureTestingUtility(125): RESTART - Stop
> 2019-02-07 15:48:08,734 INFO  [Time-limited test] procedure2.ProcedureExecutor(702): Stopping
> 2019-02-07 15:48:08,735 INFO  [Time-limited test] wal.WALProcedureStore(331): Stopping the WAL Procedure Store, isAbort=false
> 2019-02-07 15:48:08,735 INFO  [Time-limited test] procedure2.ProcedureTestingUtility(137): RESTART - Start{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)