Posted to issues@hbase.apache.org by "Bryan Beaudreault (Jira)" <ji...@apache.org> on 2021/12/17 18:10:00 UTC

[jira] [Commented] (HBASE-26298) Downgrading is complicated by refusal to assign system tables to lower version

    [ https://issues.apache.org/jira/browse/HBASE-26298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461600#comment-17461600 ] 

Bryan Beaudreault commented on HBASE-26298:
-------------------------------------------

 Hey [~vjasani], I'm taking a look at this again. What do you think of this rather small and pragmatic change for now:
 * Make AssignmentManager implement ConfigurationObserver, so that we can live update "hbase.min.version.move.system.tables"
 * Improve the docs a bit (I will take a stab; let me know if you agree with the new description)
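Here is a minimal sketch of the first bullet. It assumes the usual observer shape: the setting lives in a `volatile` field that is re-read whenever a configuration reload notifies the observer. So that it runs without HBase on the classpath, `ConfigObserver` and `AssignmentManagerSketch` are hypothetical stand-ins for HBase's `ConfigurationObserver` and `AssignmentManager`, and a `Map<String, String>` replaces Hadoop's `Configuration`. The value `"2.3.7"` is arbitrary and is not a recommended setting.

```java
import java.util.HashMap;
import java.util.Map;

public class LiveConfigSketch {

  // Hypothetical stand-in for HBase's ConfigurationObserver (single callback);
  // a plain Map replaces Hadoop's Configuration so this runs standalone.
  interface ConfigObserver {
    void onConfigurationChange(Map<String, String> conf);
  }

  // Hypothetical, heavily simplified stand-in for AssignmentManager:
  // only the live-reloadable setting is modeled.
  static class AssignmentManagerSketch implements ConfigObserver {
    static final String MIN_VERSION_KEY = "hbase.min.version.move.system.tables";

    // volatile so threads doing assignments see a reloaded value without extra locking
    private volatile String minVersionToMoveSysTables;

    AssignmentManagerSketch(Map<String, String> conf) {
      // Empty string used as the "unset" default here (an assumption of this sketch)
      this.minVersionToMoveSysTables = conf.getOrDefault(MIN_VERSION_KEY, "");
    }

    @Override
    public void onConfigurationChange(Map<String, String> conf) {
      // Re-read the key on every reload instead of only once at construction time
      this.minVersionToMoveSysTables = conf.getOrDefault(MIN_VERSION_KEY, "");
    }

    String getMinVersionToMoveSysTables() {
      return minVersionToMoveSysTables;
    }
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    AssignmentManagerSketch am = new AssignmentManagerSketch(conf);
    System.out.println("before reload: '" + am.getMinVersionToMoveSysTables() + "'");

    // Arbitrary value; the point is only that the new value is picked up
    conf.put(AssignmentManagerSketch.MIN_VERSION_KEY, "2.3.7");
    am.onConfigurationChange(conf); // stands in for a real configuration reload
    System.out.println("after reload: '" + am.getMinVersionToMoveSysTables() + "'");
  }
}
```

In HBase itself, the observer would presumably also need to be registered with the master's `ConfigurationManager`. The reload would then come through the existing online configuration update path (for example, the shell's `update_config`), not from a direct call like the one in `main`.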

The other thing I was wondering about is whether we could set a better default value for this. I am guessing the devs are the most knowledgeable about what incompatibilities exist that would warrant not moving system tables, right? It seems harsh to make an operator figure this out.

I'm going to get to work on the two bullets. The last piece is mostly for discussion; I'm curious to hear your thoughts.

> Downgrading is complicated by refusal to assign system tables to lower version
> ------------------------------------------------------------------------------
>
>                 Key: HBASE-26298
>                 URL: https://issues.apache.org/jira/browse/HBASE-26298
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Bryan Beaudreault
>            Priority: Minor
>
> I was doing some rolling downgrades of test clusters and keep getting into a state where my automation gets stuck trying to drain the final RegionServer in the cluster. At this point that RegionServer hosts 3 regions: meta, quota, namespace. The HMaster is outputting logs like: "Passed destination servername is null/empty so choosing a server at random".
> It's very hard to understand what's happening based on that log, so you really have to look at the code. Tracking down that log line makes it somewhat clearer that you are getting trapped by AssignmentManager.getExcludedServersForSystemTable().
> Looking at the code, you can see comments related to "hbase.min.version.move.system.tables" config, but the comments are very unclear. What should I set this to?
> This setting was added in https://issues.apache.org/jira/browse/HBASE-22923 which focuses mostly on RSGroup, but this issue is affecting clusters that do not use RSGroup. The release note also is not super clear.
> It would be great to clarify the docs so the operator knows what to set this to, or perhaps to make the config itself more intuitive. For example, could we just make it an allowlist of versions that can hold system tables? Then my path is clear: add the version I'm downgrading to into the allowlist.
> This issue is also exacerbated by the fact that by the time you've realized this, you're in a somewhat tricky situation: there's only 1 RegionServer left, and your only ways around it are to force-stop it or to push a new config and do a rolling restart of your HMasters. It would be great if this setting could be updated via Admin or, at the very least, were reloadable with ConfigurationObserver.
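The trap described above can be made concrete with a simplified Java model, set next to the allowlist idea. Several parts of it are assumptions. `SystemTablePlacementSketch` and all of its methods are hypothetical and are not HBase's `AssignmentManager.getExcludedServersForSystemTable()`. The "current rule" models only what the issue title describes, namely that servers on a lower version than the newest one in the cluster are refused system tables, and it ignores `hbase.min.version.move.system.tables` entirely. The comma-separated allowlist format is invented for illustration, and the server names and version numbers are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class SystemTablePlacementSketch {

  // Simplified dotted-version comparison: numeric parts only, missing parts count as 0.
  // Real version strings may carry suffixes (e.g. "-SNAPSHOT") that this does not handle.
  static int compareVersion(String a, String b) {
    int[] x = Arrays.stream(a.split("\\.")).mapToInt(Integer::parseInt).toArray();
    int[] y = Arrays.stream(b.split("\\.")).mapToInt(Integer::parseInt).toArray();
    for (int i = 0; i < Math.max(x.length, y.length); i++) {
      int xi = i < x.length ? x[i] : 0;
      int yi = i < y.length ? y[i] : 0;
      if (xi != yi) {
        return Integer.compare(xi, yi);
      }
    }
    return 0;
  }

  // Model of the behavior in the issue title: servers below the highest version are excluded.
  static List<String> excludedByHighestVersion(Map<String, String> serverVersions) {
    String highest = null;
    for (String v : serverVersions.values()) {
      if (highest == null || compareVersion(v, highest) > 0) {
        highest = v;
      }
    }
    List<String> excluded = new ArrayList<>();
    for (Map.Entry<String, String> e : serverVersions.entrySet()) {
      if (compareVersion(e.getValue(), highest) < 0) {
        excluded.add(e.getKey());
      }
    }
    return excluded;
  }

  // Hypothetical allowlist rule: servers whose exact version is not listed are excluded.
  static List<String> excludedByAllowlist(Map<String, String> serverVersions, String allowlist) {
    Set<String> allowed = Arrays.stream(allowlist.split(","))
        .map(String::trim)
        .filter(s -> !s.isEmpty())
        .collect(Collectors.toSet());
    List<String> excluded = new ArrayList<>();
    for (Map.Entry<String, String> e : serverVersions.entrySet()) {
      if (!allowed.contains(e.getValue())) {
        excluded.add(e.getKey());
      }
    }
    return excluded;
  }

  // Where a system region on `source` could move: any other server that is not excluded.
  static List<String> destinations(Map<String, String> serverVersions, String source,
      List<String> excluded) {
    List<String> candidates = new ArrayList<>();
    for (String server : serverVersions.keySet()) {
      if (!server.equals(source) && !excluded.contains(server)) {
        candidates.add(server);
      }
    }
    return candidates;
  }

  public static void main(String[] args) {
    // Rolling downgrade: rs3 is the last server still on the newer version and
    // hosts the system regions (meta, quota, namespace).
    Map<String, String> servers = new LinkedHashMap<>();
    servers.put("rs1", "2.3.7");
    servers.put("rs2", "2.3.7");
    servers.put("rs3", "2.4.8");

    List<String> current = excludedByHighestVersion(servers);
    System.out.println("current rule, destinations: " + destinations(servers, "rs3", current));

    List<String> allow = excludedByAllowlist(servers, "2.3.7, 2.4.8");
    System.out.println("allowlist rule, destinations: " + destinations(servers, "rs3", allow));
  }
}
```

When only one server remains on the newer version, the modeled current rule leaves no destination for the system regions. This matches the stuck drain described in the issue. When both versions are allowlisted, the older servers become valid targets. One open design question is what an empty allowlist should mean: in this sketch it excludes every server, so a real setting would likely need a sensible fallback.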


