Posted to issues@hbase.apache.org by "Allan Yang (JIRA)" <ji...@apache.org> on 2018/11/29 13:12:00 UTC

[jira] [Commented] (HBASE-21522) meta replicas appear to cause master restart to kill regionservers

    [ https://issues.apache.org/jira/browse/HBASE-21522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16703140#comment-16703140 ] 

Allan Yang commented on HBASE-21522:
------------------------------------

Interesting. IIRC, only the default replica of the meta table is needed to make meta readable; the metaBootstrap.assignMetaReplicas() call in finishActiveMasterInitialization() handles the assignment of the meta table's other replicas. This should not happen. Could you provide a unit test for this case?


> meta replicas appear to cause master restart to kill regionservers
> ------------------------------------------------------------------
>
>                 Key: HBASE-21522
>                 URL: https://issues.apache.org/jira/browse/HBASE-21522
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 3.0.0
>            Reporter: Sergey Shelukhin
>            Priority: Major
>
> On master restart, AM.start adds FIRST_META_REGIONINFO to regionStates; it has a replica ID of 0. Before meta is loaded, AssignmentManager.checkOnlineRegionsReportForMeta is called for RS reports; it also checks only for the 0th replica of meta and loads meta once that replica is discovered.
> Once meta is loaded, RS reports are processed normally; however, nothing appears to add the meta replicas to regionStates.
> So, when an RS hosting one reports in, it gets killed: 
> {noformat}
> ***** ABORTING region server <some server 1>: org.apache.hadoop.hbase.YouAreDeadException: Not online: hbase:meta,,1_0001
> ***** ABORTING region server <some server 2>: org.apache.hadoop.hbase.YouAreDeadException: Not online: hbase:meta,,1_0002
> {noformat}
> This exception is thrown when regionStates has no record for the region.
> The RSs in question shut down in an orderly manner, and they do have the corresponding regions, which the master then assigns to other servers within a few minutes.
> Still, this seems less than ideal.
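
The failure mode described in the quoted report can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not HBase code: the class MetaReplicaReportSketch and its methods are hypothetical, a plain set stands in for regionStates, and the region names are simplified to match the log lines above. It shows why a report listing replicas 1 and 2 is rejected when only replica 0 was registered, and that the same report is accepted once the replicas are known.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical model of the scenario in the report; not the HBase implementation.
class MetaReplicaReportSketch {
    // Stand-in for the master's regionStates: the set of region names it knows about.
    private final Set<String> regionStates = new HashSet<>();

    // Simplified naming that mirrors the log lines (e.g. hbase:meta,,1_0001).
    static String metaRegionName(int replicaId) {
        return replicaId == 0
            ? "hbase:meta,,1"
            : String.format(Locale.ROOT, "hbase:meta,,1_%04d", replicaId);
    }

    // Models startup as described in the report: only replica 0 is registered.
    void start() {
        regionStates.add(metaRegionName(0));
    }

    // Hypothetical step that registers the non-default replicas 1..replicaCount-1.
    void registerMetaReplicas(int replicaCount) {
        for (int id = 1; id < replicaCount; id++) {
            regionStates.add(metaRegionName(id));
        }
    }

    // Returns the reported regions that have no record in regionStates.
    // In the report, any such region leads to "Not online: <region>".
    List<String> unknownRegions(List<String> reported) {
        List<String> unknown = new ArrayList<>();
        for (String region : reported) {
            if (!regionStates.contains(region)) {
                unknown.add(region);
            }
        }
        return unknown;
    }

    public static void main(String[] args) {
        MetaReplicaReportSketch master = new MetaReplicaReportSketch();
        master.start();
        List<String> report = List.of(metaRegionName(1), metaRegionName(2));
        System.out.println("Unknown before registering replicas: " + master.unknownRegions(report));
        master.registerMetaReplicas(3);
        System.out.println("Unknown after registering replicas: " + master.unknownRegions(report));
    }
}
```

A unit test for the real issue would presumably need a mini cluster with meta replicas enabled and a master restart; the model above only captures the bookkeeping gap the report describes.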


