Posted to issues@hbase.apache.org by "ranpanfeng (Jira)" <ji...@apache.org> on 2019/08/26 00:59:00 UTC

[jira] [Comment Edited] (HBASE-22918) RegionServer violates failfast fault assumption

    [ https://issues.apache.org/jira/browse/HBASE-22918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16915396#comment-16915396 ] 

ranpanfeng edited comment on HBASE-22918 at 8/26/19 12:58 AM:
--------------------------------------------------------------

Hi [~busbey],

What you describe is exactly what I observed. Yes, the recoverLease invocation is a fencing point: only one DFSClient can hold the lease, so there is only a single writer appending to and flushing the WAL (HLog), and write operations are therefore safe. However, linearizable consistency cannot be guaranteed on a single row. The event history is as follows.

t0: rs#0 owns region#0; an HBase client A holds a long-lived connection to rs#0.

t1: a network partition (NP) fault occurs between rs#0 and zk.

t2: the ephemeral node of rs#0 is removed after zk fails to receive heartbeats from rs#0 and the zk session times out.

t3: the watcher on the active master is notified, and the master recovers region#0 onto rs#1.

t4: someone mutates row#0 of region#0, which now resides on rs#1.

t5: HBase client A reads a stale version of row#0 from rs#0 via the long-lived connection.

t6: rs#0 times out, encounters YouAreDeadException, and then aborts itself.

t7: rs#0 shuts down.

At time point t5, does a stale read happen?
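
To make the fencing point above concrete, here is a minimal sketch of what the write-side fence amounts to, assuming the master recovers the HDFS lease on the old WAL file. The helper name, path handling and retry loop are illustrative assumptions only, not the actual HBase code path (HBase has its own lease-recovery utilities with backoff).

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class WalFencingSketch {
  /**
   * Fence the old WAL writer by recovering its HDFS lease.
   * Once recoverLease() returns true, rs#0's DFSClient can no longer
   * append to this file, so the single-writer property of the WAL holds.
   * Note this fences WRITES only; it does nothing to stop rs#0 from
   * serving stale READS to an already-connected client (the t5 step).
   */
  static void fenceOldWal(Configuration conf, Path walPath) throws Exception {
    DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);
    // Hypothetical retry loop; attempts and sleep interval are arbitrary.
    for (int attempt = 0; attempt < 10; attempt++) {
      if (dfs.recoverLease(walPath)) {
        return; // lease recovered, old writer is fenced
      }
      Thread.sleep(1000L); // wait for the NameNode to finish block recovery
    }
    throw new Exception("could not recover lease on " + walPath);
  }
}
{code}

The sketch mirrors the point above: lease recovery fences the old writer, but nothing in that step prevents rs#0 from continuing to answer reads over an established connection until it aborts at t6/t7.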



> RegionServer violates failfast fault assumption
> -----------------------------------------------
>
>                 Key: HBASE-22918
>                 URL: https://issues.apache.org/jira/browse/HBASE-22918
>             Project: HBase
>          Issue Type: Bug
>            Reporter: ranpanfeng
>            Priority: Major
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> HBase 2.1.5 is tested and verified thoroughly before it is deployed in our production environment. We pay particular attention to NP (network partition) faults, so NP fault-injection tests are conducted in our test environment. Some findings are described below.
> I use YCSB to write data into table SYSTEM:test, which resides on regionserver0; during the writes, I use iptables to drop every packet from regionserver0 to the ZooKeeper quorum. After
> the default zookeeper.session.timeout (90s), regionserver0 throws YouAreDeadException once its retries to connect to ZooKeeper fail with TimeoutException, and then aborts itself. But before regionserver0 invokes completeFile on the WAL, the active master has already considered regionserver0 dead prematurely, so it invokes recoverLease to close the WAL of regionserver0 forcibly.
> In a trusted IDC, distributed storage assumes that errors are always failstop/failfast faults and that there are no Byzantine failures. So in the above scenario, the active master should take over the WAL of regionserver0 only after regionserver0 has aborted successfully. According to the lease protocol, the RS
> should abort within a lease period, the active master should take over the WAL only after a grace period has elapsed, and the invariant "lease period < grace period" should always hold. In hbase-site.xml, only one config property, "zookeeper.session.timeout", is given; I think we should provide two properties:
>   1. regionserver.zookeeper.session.timeout
>   2. master.zookeeper.session.timeout
> HBase admins can then tune regionserver.zookeeper.session.timeout to be less than master.zookeeper.session.timeout. In this way, the failstop assumption is guaranteed.
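
A minimal sketch of how the proposed split could be consumed, assuming the two property names above; they are proposed in this issue and are not existing HBase configuration, and the class and method names here are hypothetical. Each role falls back to the existing zookeeper.session.timeout.

{code:java}
import org.apache.hadoop.conf.Configuration;

public class SessionTimeoutSketch {
  // Existing HBase property and its default (90s).
  static final String GENERIC_KEY = "zookeeper.session.timeout";
  static final int DEFAULT_TIMEOUT_MS = 90_000;

  // Proposed (hypothetical) per-role properties from this issue.
  static final String RS_KEY = "regionserver.zookeeper.session.timeout";
  static final String MASTER_KEY = "master.zookeeper.session.timeout";

  /** Timeout the RegionServer would use for its own ZK session (the "lease period"). */
  static int regionServerSessionTimeout(Configuration conf) {
    return conf.getInt(RS_KEY, conf.getInt(GENERIC_KEY, DEFAULT_TIMEOUT_MS));
  }

  /** Timeout the master would wait before declaring an RS dead (the "grace period"). */
  static int masterSessionTimeout(Configuration conf) {
    return conf.getInt(MASTER_KEY, conf.getInt(GENERIC_KEY, DEFAULT_TIMEOUT_MS));
  }

  /** The invariant the report asks for: lease period < grace period. */
  static void checkFailstopInvariant(Configuration conf) {
    if (regionServerSessionTimeout(conf) >= masterSessionTimeout(conf)) {
      throw new IllegalStateException(
          "regionserver ZK session timeout must be strictly less than the master's");
    }
  }
}
{code}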


