Posted to dev@hbase.apache.org by "Duo Zhang (Jira)" <ji...@apache.org> on 2022/03/19 02:07:00 UTC
[jira] [Created] (HBASE-26866) Shutdown WAL may abort region server
Duo Zhang created HBASE-26866:
---------------------------------
Summary: Shutdown WAL may abort region server
Key: HBASE-26866
URL: https://issues.apache.org/jira/browse/HBASE-26866
Project: HBase
Issue Type: Bug
Reporter: Duo Zhang
https://nightlies.apache.org/hbase/HBase-Flaky-Tests/master/3140/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.replication.TestSyncReplicationActive-output.txt
TestSyncReplicationActive is flaky because we may abort the region server when shutting down the WAL.
{noformat}
2022-03-18T04:50:37,205 WARN [RpcServer.default.FPBQ.Fifo.handler=2,queue=0,port=36877] master.MasterRpcServices(682): jenkins-hbase13.apache.org,33377,1647579008859 reported a fatal error:
***** ABORTING region server jenkins-hbase13.apache.org,33377,1647579008859: Log rolling failed *****
Cause:
java.util.concurrent.RejectedExecutionException: Task org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL$$Lambda$681/1458648270@37209753 rejected from java.util.concurrent.ThreadPoolExecutor@69662eb7[Shutting down, pool size = 1, active threads = 1, queued tasks = 0, completed tasks = 0]
at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
at java.util.concurrent.Executors$DelegatedExecutorService.execute(Executors.java:668)
at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.cleanOldLogs(AbstractFSWAL.java:773)
at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriterInternal(AbstractFSWAL.java:935)
at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.lambda$rollWriter$8(AbstractFSWAL.java:953)
at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:196)
at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:953)
at org.apache.hadoop.hbase.wal.AbstractWALRoller$RollController.rollWal(AbstractWALRoller.java:316)
at org.apache.hadoop.hbase.wal.AbstractWALRoller.run(AbstractWALRoller.java:214)
{noformat}
The problem here is that the removal of old WAL files is asynchronous. When shutting down the WAL we close the thread pool, so a log roll that runs concurrently and tries to schedule a removal in cleanOldLogs gets a RejectedExecutionException, which causes the region server to abort.
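The race can be reproduced outside HBase with a plain single-thread executor. The sketch below is only an illustration and is not the actual AbstractFSWAL code or the fix for this issue: the class ShutdownRaceSketch and its methods are hypothetical names, and swallowing the rejection once the pool is shut down is just one possible way to handle it.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ShutdownRaceSketch {
    // Hypothetical stand-in for the pool that removes old WAL files asynchronously.
    private final ExecutorService cleanupExecutor = Executors.newSingleThreadExecutor();
    // Counts how many cleanup tasks actually ran.
    private final AtomicInteger cleaned = new AtomicInteger();

    /** Unguarded submit: throws RejectedExecutionException once the pool is shut down. */
    void cleanOldLogsUnguarded() {
        cleanupExecutor.execute(cleaned::incrementAndGet);
    }

    /**
     * Guarded submit (assumption: skipping cleanup during shutdown is acceptable).
     * Returns true if the task was scheduled, false if it was skipped because
     * the pool has already been shut down.
     */
    boolean cleanOldLogsGuarded() {
        try {
            cleanupExecutor.execute(cleaned::incrementAndGet);
            return true;
        } catch (RejectedExecutionException e) {
            if (cleanupExecutor.isShutdown()) {
                return false; // expected during shutdown, not a fatal error
            }
            throw e; // rejected for some other reason, so propagate
        }
    }

    /** Stops accepting new tasks and waits briefly for queued ones to finish. */
    void shutdown() {
        cleanupExecutor.shutdown();
        try {
            cleanupExecutor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    int cleanedCount() {
        return cleaned.get();
    }

    public static void main(String[] args) {
        ShutdownRaceSketch wal = new ShutdownRaceSketch();
        System.out.println("scheduled before shutdown: " + wal.cleanOldLogsGuarded());
        wal.shutdown();
        try {
            wal.cleanOldLogsUnguarded();
        } catch (RejectedExecutionException e) {
            System.out.println("unguarded submit rejected after shutdown");
        }
        System.out.println("scheduled after shutdown: " + wal.cleanOldLogsGuarded());
    }
}
```

In the log above the rejection propagates out of rollWriter and the roller treats it as a failed log roll, which is what triggers the abort. Whether to guard the submit, as in this sketch, or to order the shutdown so that no roll can race with closing the pool is a design choice left open here.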
--
This message was sent by Atlassian Jira
(v8.20.1#820001)