Posted to issues@hbase.apache.org by "Sergey Shelukhin (JIRA)" <ji...@apache.org> on 2019/02/11 19:28:00 UTC

[jira] [Comment Edited] (HBASE-21864) add region state version and reinstate YouAreDead exception in region report

    [ https://issues.apache.org/jira/browse/HBASE-21864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765323#comment-16765323 ] 

Sergey Shelukhin edited comment on HBASE-21864 at 2/11/19 7:27 PM:
-------------------------------------------------------------------

[~stack] it's just the regular heartbeat. 
When RS reported incorrect state, master used to kill it (YouAreDeadException), but that was removed because of these races.

I was thinking of storing a version per region (not sure yet if it can be in memory only, or if we'd have to store it in meta too). It would be incremented by master on every change. Master would just store the last version the RS acked for this region, and discard all messages from before that.
One additional possible benefit is for the current crop of races with double assignment. If RS reports something like "I opened this region you never expected me to open", it would be easier to look and see that it's acting on a stale message and doesn't know the current state, and kill it conditionally to avoid data loss.
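
A rough sketch of the idea, to make the discard rule concrete. This is not HBase code: the class, method and enum names are all made up for illustration, it tracks a single RS in memory only, and the PENDING case (master changed the state but the RS has not acked yet) is my own assumption about how an in-flight change would be treated.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, in-memory model of a per-region state version kept by master.
class RegionVersionSketch {
    enum Verdict { ACCEPTED, STALE_DISCARDED, PENDING, INCONSISTENT }

    static final class RegionEntry {
        boolean expectedOpen; // state master expects on this RS
        long version;         // incremented by master on every change
        long lastAcked;       // last version the RS acknowledged
    }

    static final class Master {
        private final Map<String, RegionEntry> regions = new HashMap<>();

        // Master changes the expected state and bumps the version.
        long changeState(String region, boolean open) {
            RegionEntry e = regions.computeIfAbsent(region, k -> new RegionEntry());
            e.expectedOpen = open;
            return ++e.version;
        }

        // RS confirms it has acted on the given version.
        void ack(String region, long version) {
            RegionEntry e = regions.get(region);
            if (e != null && version > e.lastAcked) {
                e.lastAcked = version;
            }
        }

        // Evaluate one region from a heartbeat report; reportVersion is the
        // version the RS knew about when it built the report.
        Verdict onReport(String region, boolean reportedOpen, long reportVersion) {
            RegionEntry e = regions.get(region);
            if (e == null) {
                // "I opened this region you never expected me to open"
                return reportedOpen ? Verdict.INCONSISTENT : Verdict.ACCEPTED;
            }
            if (reportVersion < e.lastAcked) {
                return Verdict.STALE_DISCARDED; // built before the last ack
            }
            if (e.lastAcked < e.version) {
                return Verdict.PENDING; // change still in flight (assumption)
            }
            return reportedOpen == e.expectedOpen
                ? Verdict.ACCEPTED
                : Verdict.INCONSISTENT; // candidate for YouAreDeadException
        }
    }

    public static void main(String[] args) {
        Master m = new Master();
        long v1 = m.changeState("R1", true);   // M: open R1
        m.ack("R1", v1);                       // RS: I opened R1
        // RS builds report {R1} at v1, but it is delayed on the network.
        long v2 = m.changeState("R1", false);  // M: close R1
        m.ack("R1", v2);                       // RS: I closed R1
        System.out.println(m.onReport("R1", true, v1)); // delayed report arrives
        System.out.println(m.onReport("R1", true, v2)); // RS really has R1 open
    }
}
```

With the race from the description, the delayed report carries a version older than the last ack and is dropped instead of killing the RS; a report at the current version that still disagrees with master is the case where killing would make sense.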


> add region state version and reinstate YouAreDead exception in region report
> ----------------------------------------------------------------------------
>
>                 Key: HBASE-21864
>                 URL: https://issues.apache.org/jira/browse/HBASE-21864
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Priority: Major
>
> The state version will ensure we don't have network-related races (e.g. the one I reported in some other bug -
> {code}
> RS: send report {R1} ...
> M: close R1
> RS: I closed R1
> M ... receive report {R1}
> M: you shouldn't have R1, die
> {code}).
> Then we can revert the change that removed the YouAreDead exception. An RS in an incorrect state should either be brought into the correct state or killed, because it means there's a bug somewhere; right now, if double assignment happens (I found 2 different cases just this week ;)), master lets the RS with the incorrect assignment keep it forever.


