You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hbase.apache.org by "Duo Zhang (Jira)" <ji...@apache.org> on 2022/03/19 02:07:00 UTC

[jira] [Created] (HBASE-26866) Shutdown WAL may abort region server

Duo Zhang created HBASE-26866:
---------------------------------

             Summary: Shutdown WAL may abort region server
                 Key: HBASE-26866
                 URL: https://issues.apache.org/jira/browse/HBASE-26866
             Project: HBase
          Issue Type: Bug
            Reporter: Duo Zhang


https://nightlies.apache.org/hbase/HBase-Flaky-Tests/master/3140/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.replication.TestSyncReplicationActive-output.txt

TestSyncReplicationAcive is flaky because of we may abort the region server when shutting down WAL.
{noformat}
2022-03-18T04:50:37,205 WARN  [RpcServer.default.FPBQ.Fifo.handler=2,queue=0,port=36877] master.MasterRpcServices(682): jenkins-hbase13.apache.org,33377,1647579008859 reported a fatal error:
***** ABORTING region server jenkins-hbase13.apache.org,33377,1647579008859: Log rolling failed *****
Cause:
java.util.concurrent.RejectedExecutionException: Task org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL$$Lambda$681/1458648270@37209753 rejected from java.util.concurrent.ThreadPoolExecutor@69662eb7[Shutting down, pool size = 1, active threads = 1, queued tasks = 0, completed tasks = 0]
	at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
	at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
	at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
	at java.util.concurrent.Executors$DelegatedExecutorService.execute(Executors.java:668)
	at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.cleanOldLogs(AbstractFSWAL.java:773)
	at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriterInternal(AbstractFSWAL.java:935)
	at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.lambda$rollWriter$8(AbstractFSWAL.java:953)
	at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:196)
	at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:953)
	at org.apache.hadoop.hbase.wal.AbstractWALRoller$RollController.rollWal(AbstractWALRoller.java:316)
	at org.apache.hadoop.hbase.wal.AbstractWALRoller.run(AbstractWALRoller.java:214)
{noformat}

The problem here is that, the removal of WAL is async, when shuttting down the WAL, we will close the thread pool so it will throw reject execution exception and cause region server abort.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)