Posted to dev@hbase.apache.org by "Sergey Shelukhin (JIRA)" <ji...@apache.org> on 2018/12/17 22:32:00 UTC

[jira] [Created] (HBASE-21611) REGION_STATE_TRANSITION_CONFIRM_CLOSED should interact better with crash procedure.

Sergey Shelukhin created HBASE-21611:
----------------------------------------

             Summary: REGION_STATE_TRANSITION_CONFIRM_CLOSED should interact better with crash procedure.
                 Key: HBASE-21611
                 URL: https://issues.apache.org/jira/browse/HBASE-21611
             Project: HBase
          Issue Type: Bug
            Reporter: Sergey Shelukhin


1) Not a bug per se, since HDFS is not supposed to lose files, but the behavior is fragile.
When a dead server's WAL directory is deleted (due to manual intervention or some issue with HDFS) while some regions are in CLOSING state on that server, those regions get stuck forever in a loop: REGION_STATE_TRANSITION_CONFIRM_CLOSED, then REGION_STATE_TRANSITION_CLOSE, then "give up and mark the procedure as complete, the parent procedure will take care of this", and back again. There is no crash procedure for the server, so nobody ever takes care of it.

2) Under normal circumstances, when a large WAL is being split, the same loop keeps spamming the logs and wasting resources until the crash procedure completes. There is no reason for it to retry; it should just wait for the crash procedure.
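
To make the difference between the two behaviors concrete, here is a rough toy model of the loop. It is only a sketch: the class, enum, and method names are hypothetical and are not HBase APIs, and the "step" counter is an assumed stand-in for procedure executions. It compares retrying the close against parking until a crash procedure finishes.

```java
// Toy model only: the class, enum, and method names below are hypothetical
// and are not HBase APIs. A "step" stands in for one procedure execution.
final class ConfirmClosedSketch {

    enum State { CLOSE, CONFIRM_CLOSED, WAIT_FOR_CRASH_PROCEDURE, DONE }

    static final class Result {
        final int closeAttempts; // how many times the CLOSE state was executed
        final boolean finished;  // whether DONE was reached within maxSteps

        Result(int closeAttempts, boolean finished) {
            this.closeAttempts = closeAttempts;
            this.finished = finished;
        }
    }

    /**
     * Simulates a region whose hosting server is dead.
     *
     * @param waitInsteadOfRetry if true, park after a failed confirm instead of re-closing
     * @param crashDoneAtStep    step at which the crash procedure completes;
     *                           a negative value means no crash procedure exists
     * @param maxSteps           upper bound on simulated steps
     */
    static Result run(boolean waitInsteadOfRetry, int crashDoneAtStep, int maxSteps) {
        State state = State.CLOSE;
        int closeAttempts = 0;
        for (int step = 0; step < maxSteps && state != State.DONE; step++) {
            boolean crashDone = crashDoneAtStep >= 0 && step >= crashDoneAtStep;
            switch (state) {
                case CLOSE:
                    // The close request cannot succeed on a dead server.
                    closeAttempts++;
                    state = State.CONFIRM_CLOSED;
                    break;
                case CONFIRM_CLOSED:
                    if (crashDone) {
                        state = State.DONE;
                    } else if (waitInsteadOfRetry) {
                        state = State.WAIT_FOR_CRASH_PROCEDURE;
                    } else {
                        state = State.CLOSE; // the retry loop described above
                    }
                    break;
                case WAIT_FOR_CRASH_PROCEDURE:
                    if (crashDone) {
                        state = State.DONE;
                    }
                    break;
                default:
                    break;
            }
        }
        return new Result(closeAttempts, state == State.DONE);
    }

    public static void main(String[] args) {
        Result retry = run(false, 6, 100);
        Result wait = run(true, 6, 100);
        System.out.println("retry: closeAttempts=" + retry.closeAttempts
                + " finished=" + retry.finished);
        System.out.println("wait:  closeAttempts=" + wait.closeAttempts
                + " finished=" + wait.finished);
    }
}
```

In this toy model, with the crash procedure finishing at step 6, the retrying variant makes 4 close attempts and the waiting variant makes 1. With no crash procedure at all (a negative crashDoneAtStep), neither variant finishes, which mirrors case 1: waiting avoids the log spam but something still has to schedule the crash handling.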



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)