Posted to issues@hbase.apache.org by "Michael Stack (Jira)" <ji...@apache.org> on 2020/01/03 18:34:00 UTC

[jira] [Commented] (HBASE-23369) Auto-close 'unknown' Regions reported as OPEN on RegionServers

    [ https://issues.apache.org/jira/browse/HBASE-23369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007687#comment-17007687 ] 

Michael Stack commented on HBASE-23369:
---------------------------------------

I've been running cluster tests with this patch in place for the last few weeks. It is good for tamping down the mayhem when a cluster goes haywire after being overdriven by sustained load, which causes the Master to lose its accounting.

> Auto-close 'unknown' Regions reported as OPEN on RegionServers
> --------------------------------------------------------------
>
>                 Key: HBASE-23369
>                 URL: https://issues.apache.org/jira/browse/HBASE-23369
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Michael Stack
>            Assignee: Michael Stack
>            Priority: Major
>             Fix For: 3.0.0, 2.3.0
>
>
> In the old days, if a RegionServer reported a state that didn't agree with the Master's view of the cluster, we'd kill the RegionServer.
> Lately, in tests that overrun a cluster, after sustained high load, the Master can start failing its updates against Meta (CallQueueTooBigException <= More on this later). It can then lose proper accounting of all Region members. In one variant, a RegionServer reports its list of open Regions to the Master, and either the Master doesn't 'know' of a particular Region, or the Master knows the Region but expects it to be open on another RegionServer.
> Here is an example of how it looks each time the RS reports:
> {code}
>  2019-12-03 07:07:00,757 WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager: No t1,08f5c285,1573094375485.ee78a0c951c1c902d8f3f3912394a0e5. RegionStateNode but reported ONLINE at server.example.org,16020,1575354666245 (inServerRegionList=false).
>  2019-12-03 07:07:03,793 WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager: No t1,08f5c285,1573094375485.ee78a0c951c1c902d8f3f3912394a0e5. RegionStateNode but reported ONLINE at server.example.org,16020,1575354666245 (inServerRegionList=false).
> {code}
> It will also show as an 'inconsistency' in the 'HBCK' tab on the Master UI.
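
To make the two mismatch variants in the description concrete, here is a minimal Java sketch of how a RegionServer's report might be compared against the Master's view. It is an illustration only, not the patch: the class `UnknownRegionSketch`, the `Verdict` enum, and the method names are hypothetical, plain strings stand in for Region and server names, and no HBase API is used. Which of the mismatches the actual patch closes is not stated in this message, so the sketch only classifies them.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names come from HBase.
final class UnknownRegionSketch {

    enum Verdict { CONSISTENT, UNKNOWN_TO_MASTER, EXPECTED_ELSEWHERE }

    // masterView maps a region name to the server the Master expects it to be open on.
    static Verdict classify(Map<String, String> masterView, String reportingServer, String region) {
        String expectedServer = masterView.get(region);
        if (expectedServer == null) {
            // Variant 1: the Master has no record of this Region at all.
            return Verdict.UNKNOWN_TO_MASTER;
        }
        if (!expectedServer.equals(reportingServer)) {
            // Variant 2: the Master knows the Region but expects it on another server.
            return Verdict.EXPECTED_ELSEWHERE;
        }
        return Verdict.CONSISTENT;
    }

    // Returns only the reported regions that disagree with the Master's view,
    // in the order they were reported.
    static Map<String, Verdict> findMismatches(Map<String, String> masterView,
                                               String reportingServer,
                                               List<String> reportedOpenRegions) {
        Map<String, Verdict> mismatches = new LinkedHashMap<>();
        for (String region : reportedOpenRegions) {
            Verdict verdict = classify(masterView, reportingServer, region);
            if (verdict != Verdict.CONSISTENT) {
                mismatches.put(region, verdict);
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // Made-up region and server names for demonstration.
        Map<String, String> masterView = new HashMap<>();
        masterView.put("regionA", "rs1");
        masterView.put("regionB", "rs2");

        Map<String, Verdict> mismatches =
            findMismatches(masterView, "rs1", Arrays.asList("regionA", "regionB", "regionC"));
        mismatches.forEach((region, verdict) -> System.out.println(region + " -> " + verdict));
    }
}
```

In this sketch a Region the Master has never heard of is kept distinct from one it expects elsewhere, since the two cases may call for different handling.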



--
This message was sent by Atlassian Jira
(v8.3.4#803005)