Posted to issues@hbase.apache.org by "Allan Yang (JIRA)" <ji...@apache.org> on 2018/11/12 03:04:00 UTC

[jira] [Commented] (HBASE-20671) Merged region brought back to life causing RS to be killed by Master

    [ https://issues.apache.org/jira/browse/HBASE-20671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16683151#comment-16683151 ] 

Allan Yang commented on HBASE-20671:
------------------------------------

[~elserj], FYI: HBASE-21395 is a workaround for this issue in branch-2.0 and branch-2.1. In branch-2+, RTSP should resolve this kind of problem.

> Merged region brought back to life causing RS to be killed by Master
> --------------------------------------------------------------------
>
>                 Key: HBASE-20671
>                 URL: https://issues.apache.org/jira/browse/HBASE-20671
>             Project: HBase
>          Issue Type: Bug
>          Components: amv2
>    Affects Versions: 2.0.0
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>            Priority: Major
>         Attachments: 0001-Test-for-HBASE-20671.patch, hbase-hbase-master-ctr-e138-1518143905142-336066-01-000003.hwx.site.log.zip, hbase-hbase-regionserver-ctr-e138-1518143905142-336066-01-000002.hwx.site.log.zip, workaround.txt
>
>
> Another bug coming out of a master restart and replay of the pv2 logs.
> The master merged two regions into one successfully and was then restarted, but it ended up assigning the two original (pre-merge) regions back out to the cluster. A log message appears to indicate that the master does not know what these regions are while replaying the pv2 WAL; however, it incorrectly assumes that each region is just OFFLINE and needs to be assigned.
> {noformat}
> 2018-05-30 04:26:00,055 INFO  [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=20000] master.HMaster: Client=hrt_qa//172.27.85.11 Merge regions a7dd6606dcacc9daf085fc9fa2aecc0c and 4017a3c778551d4d258c785d455f9c0b
> 2018-05-30 04:28:27,525 DEBUG [master/ctr-e138-1518143905142-336066-01-000003:20000] procedure2.ProcedureExecutor: Completed pid=4368, state=SUCCESS; MergeTableRegionsProcedure table=tabletwo_merge, regions=[a7dd6606dcacc9daf085fc9fa2aecc0c, 4017a3c778551d4d258c785d455f9c0b], forcibly=false
> {noformat}
> {noformat}
> 2018-05-30 04:29:20,263 INFO  [master/ctr-e138-1518143905142-336066-01-000003:20000] assignment.AssignmentManager: a7dd6606dcacc9daf085fc9fa2aecc0c regionState=null; presuming OFFLINE
> 2018-05-30 04:29:20,263 INFO  [master/ctr-e138-1518143905142-336066-01-000003:20000] assignment.RegionStates: Added to offline, CURRENTLY NEVER CLEARED!!! rit=OFFLINE, location=null, table=tabletwo_merge, region=a7dd6606dcacc9daf085fc9fa2aecc0c
> 2018-05-30 04:29:20,266 INFO  [master/ctr-e138-1518143905142-336066-01-000003:20000] assignment.AssignmentManager: 4017a3c778551d4d258c785d455f9c0b regionState=null; presuming OFFLINE
> 2018-05-30 04:29:20,266 INFO  [master/ctr-e138-1518143905142-336066-01-000003:20000] assignment.RegionStates: Added to offline, CURRENTLY NEVER CLEARED!!! rit=OFFLINE, location=null, table=tabletwo_merge, region=4017a3c778551d4d258c785d455f9c0b
> {noformat}
> Eventually, the RS reports in its online regions, and the master tells it to kill itself:
> {noformat}
> 2018-05-30 04:29:24,272 WARN  [RpcServer.default.FPBQ.Fifo.handler=26,queue=2,port=20000] assignment.AssignmentManager: Killing ctr-e138-1518143905142-336066-01-000002.hwx.site,16020,1527654546619: Not online: tabletwo_merge,,1527652130538.a7dd6606dcacc9daf085fc9fa2aecc0c.
> {noformat}
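
For illustration only, the failure mode in the description can be sketched in a few lines of Python. This is not HBase code: `State`, `presumed_state`, `guarded_state`, and `merged_parents` are hypothetical names, and the guard shown is just one possible approach, assuming the master could consult a record of regions already consumed by a merge.

```python
from enum import Enum
from typing import Dict, Optional, Set


class State(Enum):
    # Hypothetical, simplified set of region states for this sketch.
    OPEN = "OPEN"
    OFFLINE = "OFFLINE"
    MERGED = "MERGED"


def presumed_state(region: str, known: Dict[str, State]) -> State:
    # Mirrors the logged behavior: no recorded state means "presume OFFLINE",
    # which makes the region a candidate for assignment.
    state: Optional[State] = known.get(region)
    return State.OFFLINE if state is None else state


def guarded_state(region: str, known: Dict[str, State],
                  merged_parents: Set[str]) -> State:
    # Hypothetical guard: check the merge record before presuming OFFLINE,
    # so a region consumed by a merge is never handed back out.
    if region not in known and region in merged_parents:
        return State.MERGED
    return presumed_state(region, known)


if __name__ == "__main__":
    parents = {
        "a7dd6606dcacc9daf085fc9fa2aecc0c",
        "4017a3c778551d4d258c785d455f9c0b",
    }
    for name in sorted(parents):
        # Empty `known` stands in for the state lost across the master restart.
        print(name, presumed_state(name, {}).value,
              guarded_state(name, {}, parents).value)
```

With an empty state map standing in for the restarted master, the first function treats both merged regions as OFFLINE, while the guarded variant reports them as MERGED and leaves every other lookup unchanged.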


