Posted to dev@hbase.apache.org by "Andrew Kyle Purtell (Jira)" <ji...@apache.org> on 2022/06/12 19:14:00 UTC

[jira] [Resolved] (HBASE-23206) ZK quorum redundancy with failover in RZK

     [ https://issues.apache.org/jira/browse/HBASE-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Kyle Purtell resolved HBASE-23206.
-----------------------------------------
    Resolution: Won't Fix

> ZK quorum redundancy with failover in RZK
> -----------------------------------------
>
>                 Key: HBASE-23206
>                 URL: https://issues.apache.org/jira/browse/HBASE-23206
>             Project: HBase
>          Issue Type: Brainstorming
>            Reporter: Andrew Kyle Purtell
>            Priority: Major
>
> We have faced a few production issues in which the ZooKeeper quorum serving the cluster was less reliable than expected. The most recent one was essentially ZOOKEEPER-2164 (and related: ZOOKEEPER-900). These can be mitigated by a ZK server configuration change, but the incidents suggest it may be worth thinking about how to be less reliant on the service provided by a single ZK quorum instance.
> A solution would be holistic with several parts:
> - HBASE-18095 to get ZK dependencies out of the client
> - Related HBase replication improvements to track peer and position state in HBase tables instead of znodes
> - This brainstorming...
> For this issue, RecoverableZooKeeper (RZK) might be taught how to speak to two separate ZK quorums redundantly, so that ZK client operations via RZK succeed even if one of them is temporarily unable to provide service. The loss of one of a pair (or more) of redundant quorums would then no longer impact availability of the HBase service.
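
The failover idea described above can be illustrated with a minimal sketch. This is not RecoverableZooKeeper code and does not use the ZooKeeper client API. The names RedundantQuorumSketch, QuorumClient, QuorumOp, and withFailover are all hypothetical, introduced only to show the shape of "try each redundant quorum in turn and return the first success". It covers reads only, on the assumption that each quorum holds the same data.

```java
import java.util.List;

/** Hypothetical sketch: run one operation against redundant quorums with failover. */
public class RedundantQuorumSketch {

    /** Hypothetical stand-in for a client handle bound to a single quorum. */
    public interface QuorumClient {
        byte[] getData(String path) throws Exception;
    }

    /** An operation to run against whichever quorum is currently able to serve it. */
    public interface QuorumOp<T> {
        T apply(QuorumClient client) throws Exception;
    }

    /**
     * Tries the operation on each quorum in order and returns the first result.
     * If every quorum fails, the last failure is rethrown to the caller.
     */
    public static <T> T withFailover(List<QuorumClient> quorums, QuorumOp<T> op)
            throws Exception {
        if (quorums.isEmpty()) {
            throw new IllegalArgumentException("no quorums configured");
        }
        Exception last = null;
        for (QuorumClient quorum : quorums) {
            try {
                return op.apply(quorum);
            } catch (Exception e) {
                // Assumption: any failure means this quorum cannot serve right now,
                // so remember the error and move on to the next quorum.
                last = e;
            }
        }
        throw last;
    }
}
```

A caller would pass the list of per-quorum clients and a lambda such as `c -> c.getData(path)`. The sketch leaves open the harder questions a real design would have to answer, including how writes, watches, and ephemeral nodes stay consistent across the redundant quorums.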



--
This message was sent by Atlassian Jira
(v8.20.7#820007)