Posted to issues@hbase.apache.org by "stack (JIRA)" <ji...@apache.org> on 2018/09/07 18:43:00 UTC

[jira] [Commented] (HBASE-19121) HBCK for AMv2 (A.K.A HBCK2)

    [ https://issues.apache.org/jira/browse/HBASE-19121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16607502#comment-16607502 ] 

stack commented on HBASE-19121:
-------------------------------

h2. Horror Story

Big cluster. Lots of regions. A couple of STUCK procedures prevent clean-up of old WALs. A backlog builds. Master crashes (for some unrelated reason). New Master tries to become active Master. It reads the outstanding MasterProcWAL logs to reconstruct assignment. If the backlog is large, this can take hours.

HBASE-21165 describes an instance with 700 servers and 420k regions. The Master is taking hours to put assignment back together from the backed-up logs (~300 logs and, I think, a few million procedures). HBASE-21165 is adding the emitting of state during this replay because otherwise it looks like we are hung.

Need to support removal of all MasterProcWAL logs and have the Master come up anyway, as per the notes above.

> HBCK for AMv2 (A.K.A HBCK2)
> ---------------------------
>
>                 Key: HBASE-19121
>                 URL: https://issues.apache.org/jira/browse/HBASE-19121
>             Project: HBase
>          Issue Type: Bug
>          Components: hbck
>            Reporter: stack
>            Assignee: Umesh Agashe
>            Priority: Major
>         Attachments: hbase-19121.master.001.patch
>
>
> We don't have an hbck for the new AM. Old hbck may actually do damage going against AMv2.
> Fix.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)