
Posted to dev@hbase.apache.org by "Viraj Jasani (Jira)" <ji...@apache.org> on 2021/01/04 18:50:00 UTC

[jira] [Created] (HBASE-25460) Expose drainingServers as cluster metric

Viraj Jasani created HBASE-25460:
------------------------------------

             Summary: Expose drainingServers as cluster metric
                 Key: HBASE-25460
                 URL: https://issues.apache.org/jira/browse/HBASE-25460
             Project: HBase
          Issue Type: New Feature
            Reporter: Viraj Jasani


For some reason, a significantly high number of servers had been put into decommissioned mode, and they stayed in that state for a long time, serving no regions at all. This put heavy load on the remaining live servers, and by the time anyone recognized the improper balancing of the cluster, it was too late. The cluster was so imbalanced that SLB (StochasticLoadBalancer) would not balance it until *_hbase.master.balancer.stochastic.runMaxSteps_* was turned on, because the calculated number of steps was too high. As expected, such balancing causes a sudden spike of RITs (regions in transition).

Although running into such a situation is rare, we can take precautions by exposing a metric. We should expose the list of draining RegionServers as a JMX metric, just as we expose _*liveRegionServers*_ and _*deadRegionServers*_. Such a metric can help configure alerts with a threshold on the percentage of total RegionServers that are allowed to be in draining mode under any circumstances (e.g. during rolling upgrades).
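
To illustrate the kind of alert such a metric could drive, here is a minimal sketch of the threshold check in plain Java. The class name DrainingAlert, its method names, and the choice to count draining servers as part of the total are assumptions made for illustration only; none of this is existing HBase API, and the counts would have to be read from the exposed metrics by whatever monitoring system is in use.

```java
/**
 * Hypothetical helper (not part of HBase): decides whether the share of
 * draining RegionServers exceeds an allowed percentage of the cluster.
 */
final class DrainingAlert {

    private DrainingAlert() {
        // static helpers only
    }

    /**
     * Percentage of RegionServers that are draining.
     * Assumption: totalServers counts all live servers, draining ones included.
     * Returns 0.0 when the cluster reports no servers.
     */
    static double drainingPercent(int drainingServers, int totalServers) {
        if (drainingServers < 0 || totalServers < 0) {
            throw new IllegalArgumentException("server counts must be non-negative");
        }
        if (totalServers == 0) {
            return 0.0;
        }
        return 100.0 * drainingServers / totalServers;
    }

    /** True when the draining share is strictly above the allowed percentage. */
    static boolean exceedsThreshold(int drainingServers, int totalServers,
                                    double maxAllowedPercent) {
        return drainingPercent(drainingServers, totalServers) > maxAllowedPercent;
    }
}
```

With a 10% threshold on a 50-server cluster, for example, DrainingAlert.exceedsThreshold(5, 50, 10.0) is false and DrainingAlert.exceedsThreshold(6, 50, 10.0) is true, so a rolling upgrade that drains five servers at a time would stay quiet while a sixth would trigger the alert.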



--
This message was sent by Atlassian Jira
(v8.3.4#803005)