Posted to dev@hbase.apache.org by "Duo Zhang (Jira)" <ji...@apache.org> on 2022/03/23 07:01:00 UTC

[jira] [Resolved] (HBASE-26866) Shutdown WAL may abort region server

     [ https://issues.apache.org/jira/browse/HBASE-26866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Duo Zhang resolved HBASE-26866.
-------------------------------
    Fix Version/s: 3.0.0-alpha-3
     Hadoop Flags: Reviewed
       Resolution: Fixed

Merged to master.

Thanks [~Xiaolin Ha] for reviewing.

> Shutdown WAL may abort region server
> ------------------------------------
>
>                 Key: HBASE-26866
>                 URL: https://issues.apache.org/jira/browse/HBASE-26866
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>            Priority: Major
>             Fix For: 3.0.0-alpha-3
>
>
> https://nightlies.apache.org/hbase/HBase-Flaky-Tests/master/3140/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.replication.TestSyncReplicationActive-output.txt
> TestSyncReplicationActive is flaky because we may abort the region server when shutting down the WAL.
> {noformat}
> 2022-03-18T04:50:37,205 WARN  [RpcServer.default.FPBQ.Fifo.handler=2,queue=0,port=36877] master.MasterRpcServices(682): jenkins-hbase13.apache.org,33377,1647579008859 reported a fatal error:
> ***** ABORTING region server jenkins-hbase13.apache.org,33377,1647579008859: Log rolling failed *****
> Cause:
> java.util.concurrent.RejectedExecutionException: Task org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL$$Lambda$681/1458648270@37209753 rejected from java.util.concurrent.ThreadPoolExecutor@69662eb7[Shutting down, pool size = 1, active threads = 1, queued tasks = 0, completed tasks = 0]
> 	at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
> 	at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
> 	at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
> 	at java.util.concurrent.Executors$DelegatedExecutorService.execute(Executors.java:668)
> 	at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.cleanOldLogs(AbstractFSWAL.java:773)
> 	at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriterInternal(AbstractFSWAL.java:935)
> 	at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.lambda$rollWriter$8(AbstractFSWAL.java:953)
> 	at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:196)
> 	at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:953)
> 	at org.apache.hadoop.hbase.wal.AbstractWALRoller$RollController.rollWal(AbstractWALRoller.java:316)
> 	at org.apache.hadoop.hbase.wal.AbstractWALRoller.run(AbstractWALRoller.java:214)
> {noformat}
> The problem here is that the removal of old WAL files is asynchronous. When shutting down the WAL we close the thread pool, so a log roll that reaches cleanOldLogs afterwards gets a RejectedExecutionException, which causes the region server to abort.
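
The failure mode described above can be reproduced outside HBase with a plain java.util.concurrent executor. The sketch below is only an illustration, not the patch that was merged: the class ShutdownRejectionSketch, its method names, and the shuttingDown flag are all hypothetical. It shows that execute() on a pool that has already been shut down throws RejectedExecutionException under the default AbortPolicy, and one possible way to tolerate that rejection when it is caused by shutdown.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

// Hypothetical stand-in for a component that removes old log files asynchronously.
// None of these names come from HBase.
class ShutdownRejectionSketch {

  // Single-thread pool using the default AbortPolicy rejection handler.
  private final ExecutorService cleanupPool = Executors.newSingleThreadExecutor();

  // Set before the pool is shut down, so any rejection caused by shutdown observes true.
  private volatile boolean shuttingDown = false;

  /** Unguarded submit: propagates RejectedExecutionException once the pool is shut down. */
  public void cleanOldLogsUnguarded(Runnable task) {
    cleanupPool.execute(task);
  }

  /**
   * Guarded submit: returns true if the task was accepted, false if it was
   * rejected because we are shutting down. Any other rejection is rethrown.
   */
  public boolean cleanOldLogsGuarded(Runnable task) {
    try {
      cleanupPool.execute(task);
      return true;
    } catch (RejectedExecutionException e) {
      if (shuttingDown) {
        return false; // expected during shutdown, nothing to abort over
      }
      throw e;
    }
  }

  public void shutdown() {
    shuttingDown = true;
    cleanupPool.shutdown();
  }

  public static void main(String[] args) {
    ShutdownRejectionSketch wal = new ShutdownRejectionSketch();
    wal.shutdown();

    try {
      wal.cleanOldLogsUnguarded(() -> { });
      System.out.println("unguarded: accepted");
    } catch (RejectedExecutionException e) {
      System.out.println("unguarded: rejected");
    }

    System.out.println("guarded accepted: " + wal.cleanOldLogsGuarded(() -> { }));
  }
}
```

Whether a rejected cleanup should be skipped, retried, or run inline depends on what the caller needs. The sketch simply drops it, on the assumption that cleanup during shutdown is best effort.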



--
This message was sent by Atlassian Jira
(v8.20.1#820001)