Posted to issues@hbase.apache.org by "chenglei (Jira)" <ji...@apache.org> on 2022/07/24 07:10:00 UTC

[jira] [Commented] (HBASE-27230) RegionServer should be aborted when WAL.sync throws TimeoutIOException

    [ https://issues.apache.org/jira/browse/HBASE-27230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17570421#comment-17570421 ] 

chenglei commented on HBASE-27230:
----------------------------------

Pushed to master, thanks [~zhangduo] for help and review.

> RegionServer should be aborted when WAL.sync throws TimeoutIOException
> ----------------------------------------------------------------------
>
>                 Key: HBASE-27230
>                 URL: https://issues.apache.org/jira/browse/HBASE-27230
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>    Affects Versions: 3.0.0-alpha-4
>            Reporter: chenglei
>            Assignee: chenglei
>            Priority: Major
>
> As HBASE-27223 noted, if {{WAL.sync}} throws a timeout exception, we should abort the region server, because WAL sync is designed to either succeed or die; there is no 'failure'. This is usually not a big deal because we set a very large default value (5 minutes) for {{AbstractFSWAL.WAL_SYNC_TIMEOUT_MS}}, and the WAL system will usually abort the region server itself if it cannot finish the sync within 5 minutes.
> In the PR, the region server is always aborted only when {{WAL.sync}} times out in {{HRegion#doWALAppend}}. {{WALUtil.writeMarker}} just records internal state, so it seems unnecessary to always abort the region server when {{WAL.sync}} times out there; instead, the internal state transition determines whether the region server is aborted.
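
For illustration only, the "succeed or die" rule described above might be sketched as follows. This is not the code from the PR: {{Wal}}, {{Server}}, {{SyncTimeoutException}}, {{syncOrAbort}} and {{syncMarker}} are hypothetical stand-ins invented for this sketch, assuming a data-path sync that aborts on timeout and a marker-path sync that leaves the decision to its caller.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

public class WalSyncAbortSketch {
  /** Hypothetical stand-in for a sync timeout exception (not the HBase class). */
  public static class SyncTimeoutException extends IOException {
    public SyncTimeoutException(String msg) {
      super(msg);
    }
  }

  /** Hypothetical stand-in for the WAL; only sync matters for this sketch. */
  public interface Wal {
    void sync(long txid) throws IOException;
  }

  /** Hypothetical stand-in for the region server's abort hook. */
  public static class Server {
    private final AtomicBoolean aborted = new AtomicBoolean(false);

    public void abort(String reason, Throwable cause) {
      aborted.set(true);
    }

    public boolean isAborted() {
      return aborted.get();
    }
  }

  /**
   * Data-path sync (assumed analogue of the doWALAppend case): a timeout is
   * never reported as an ordinary failure. The server is aborted first, then
   * the exception is rethrown to the caller.
   */
  public static void syncOrAbort(Wal wal, long txid, Server server) throws IOException {
    try {
      wal.sync(txid);
    } catch (SyncTimeoutException e) {
      server.abort("WAL sync timed out, txid=" + txid, e);
      throw e;
    }
  }

  /**
   * Marker-path sync (assumed analogue of the writeMarker case): the timeout
   * is only propagated, and the caller's state transition decides whether
   * an abort is needed.
   */
  public static void syncMarker(Wal wal, long txid) throws IOException {
    wal.sync(txid);
  }

  public static void main(String[] args) {
    Wal timingOut = txid -> {
      throw new SyncTimeoutException("simulated sync timeout");
    };
    Server server = new Server();
    try {
      syncOrAbort(timingOut, 1L, server);
    } catch (IOException e) {
      System.out.println("data path: aborted=" + server.isAborted());
    }
  }
}
```

The split into two helpers mirrors the distinction drawn in the description: the abort is unconditional only on the data path, while the marker path stays side-effect free.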



--
This message was sent by Atlassian Jira
(v8.20.10#820010)