Posted to issues@hbase.apache.org by "Sergey Shelukhin (JIRA)" <ji...@apache.org> on 2019/01/26 00:45:00 UTC

[jira] [Comment Edited] (HBASE-21787) proc WAL replaces a RIT that holds a lock with a RIT that doesn't

    [ https://issues.apache.org/jira/browse/HBASE-21787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16752862#comment-16752862 ] 

Sergey Shelukhin edited comment on HBASE-21787 at 1/26/19 12:44 AM:
--------------------------------------------------------------------

It's a fresh cluster.
Offline regions might be from manual intervention, although I'm not 100% sure since I didn't check. Anyway, that would explain HBASE-21786.
However, the issue here is more general: we load 2 RITs and take the lock for the 1st, but then replace it with the 2nd as the RIT attached to the region. As far as I can see, that doesn't depend on meta state.


was (Author: sershe):
It's a fresh cluster.
Offline regions might be from manual intervention... that would explain HBASE-21786.
However the issue here is more general - we load 2 RITs; take lock for the 1st; but then replace it with the 2nd in the region. That doesn't depend on meta state as far as I see.

> proc WAL replaces a RIT that holds a lock with a RIT that doesn't
> -----------------------------------------------------------------
>
>                 Key: HBASE-21787
>                 URL: https://issues.apache.org/jira/browse/HBASE-21787
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 3.0.0
>            Reporter: Sergey Shelukhin
>            Priority: Critical
>
> This is not the same as HBASE-21786, but it is related: after a master restart, 2 RITs are both in the proc WAL. According to the comment at the point where the RIT is restored, this is expected.
> However, what actually happens is that the master takes the lock for the older RIT, and then replaces the older RIT with the newer RIT on the region.
> You can see two "to restore RIT" log lines.
> Both RITs are still active in the procedures view (and stuck due to yet another bug that I will file later). However, it seems wrong that the lock is held by one RIT while the region points to the other RIT as the correct one.
> {noformat}
> 2019-01-25 11:26:54,616 INFO  [master/master:17000:becomeActiveMaster] procedure.MasterProcedureScheduler: Took xlock for pid=1738, ppid=3, state=WAITING:REGION_STATE_TRANSITION_CONFIRM_OPENED, hasLock=false; TransitRegionStateProcedure table=table, region=27f7ab2a05d9d730b2ab2339d1531b8e, ASSIGN
> 2019-01-25 11:26:54,834 INFO  [master/master:17000:becomeActiveMaster] assignment.AssignmentManager: Attach pid=1738, ppid=3, state=WAITING:REGION_STATE_TRANSITION_CONFIRM_OPENED, hasLock=false; TransitRegionStateProcedure table=table, region=27f7ab2a05d9d730b2ab2339d1531b8e, ASSIGN to rit=OFFLINE, location=null, table=table, region=27f7ab2a05d9d730b2ab2339d1531b8e to restore RIT
> 2019-01-25 11:26:54,853 INFO  [master/master:17000:becomeActiveMaster] assignment.AssignmentManager: Attach pid=4351, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, hasLock=false; TransitRegionStateProcedure table=table, region=27f7ab2a05d9d730b2ab2339d1531b8e, ASSIGN to rit=OFFLINE, location=null, table=table, region=27f7ab2a05d9d730b2ab2339d1531b8e to restore RIT
> 2019-01-25 11:27:02,460 INFO  [master/master:17000:becomeActiveMaster] assignment.RegionStateStore: Load hbase:meta entry region=27f7ab2a05d9d730b2ab2339d1531b8e, regionState=OPENING, lastHost=server1,17020,1548290445704, regionLocation=server2,17020,1548442571056, openSeqNum=120108
> 2019-01-25 11:27:10,184 INFO  [PEWorker-11] procedure.MasterProcedureScheduler: Waiting on xlock for pid=4351, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, hasLock=false; TransitRegionStateProcedure table=table, region=27f7ab2a05d9d730b2ab2339d1531b8e, ASSIGN held by pid=1738
> {noformat}
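
The sequence in the log above can be sketched as a small toy model. This is only an illustration of the reported inconsistency, not HBase code: the class RitRestoreSketch, its RegionNode fields, and the methods tryLock, attachNaive, attachGuarded and isInconsistent are all hypothetical names invented for this sketch. The guarded attach is one conceivable check, shown as an assumption, and is not claimed to be the fix for this issue.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model of restoring two RITs for one region after a master restart.
 * Every name here is hypothetical; none of this is HBase's real API.
 */
final class RitRestoreSketch {

    /** Minimal stand-in for a region's in-memory state. */
    static final class RegionNode {
        Long lockOwnerPid; // pid holding the exclusive lock, or null if free
        Long attachedPid;  // pid the region treats as its current RIT, or null
    }

    private final Map<String, RegionNode> regions = new HashMap<>();

    RegionNode node(String region) {
        return regions.computeIfAbsent(region, k -> new RegionNode());
    }

    /** Grants the exclusive lock if it is free or already held by this pid. */
    boolean tryLock(String region, long pid) {
        RegionNode n = node(region);
        if (n.lockOwnerPid == null) {
            n.lockOwnerPid = pid;
            return true;
        }
        return n.lockOwnerPid == pid;
    }

    /** Unconditional attach: the last restored procedure wins (the reported behavior). */
    void attachNaive(String region, long pid) {
        node(region).attachedPid = pid;
    }

    /** Assumed guard: refuse to attach while a different procedure holds the lock. */
    boolean attachGuarded(String region, long pid) {
        RegionNode n = node(region);
        if (n.lockOwnerPid != null && n.lockOwnerPid != pid) {
            return false;
        }
        n.attachedPid = pid;
        return true;
    }

    /** True when the lock holder and the attached RIT are different procedures. */
    boolean isInconsistent(String region) {
        RegionNode n = node(region);
        return n.lockOwnerPid != null
            && n.attachedPid != null
            && !n.lockOwnerPid.equals(n.attachedPid);
    }

    public static void main(String[] args) {
        String region = "27f7ab2a05d9d730b2ab2339d1531b8e";
        RitRestoreSketch s = new RitRestoreSketch();
        s.tryLock(region, 1738L);     // "Took xlock for pid=1738"
        s.attachNaive(region, 1738L); // first "to restore RIT"
        s.attachNaive(region, 4351L); // second "to restore RIT" replaces the first
        System.out.println("lock owner = " + s.node(region).lockOwnerPid);
        System.out.println("attached   = " + s.node(region).attachedPid);
        System.out.println("inconsistent = " + s.isInconsistent(region));
        // pid=4351 then waits on the lock still held by pid=1738
        System.out.println("4351 gets lock = " + s.tryLock(region, 4351L));
    }
}
```

In this model the naive attach leaves pid=1738 holding the lock while the region points at pid=4351, which matches the "Waiting on xlock for pid=4351 ... held by pid=1738" line. With attachGuarded the second attach is rejected instead, so the lock holder and the attached RIT stay the same procedure.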


