Posted to dev@hbase.apache.org by "Andrew Kyle Purtell (Jira)" <ji...@apache.org> on 2019/10/23 17:36:00 UTC

[jira] [Created] (HBASE-23206) ZK quorum redundancy with failover in RZK

Andrew Kyle Purtell created HBASE-23206:
-------------------------------------------

             Summary: ZK quorum redundancy with failover in RZK
                 Key: HBASE-23206
                 URL: https://issues.apache.org/jira/browse/HBASE-23206
             Project: HBase
          Issue Type: Brainstorming
            Reporter: Andrew Kyle Purtell


We have faced a few production issues in which the ZooKeeper quorum serving the cluster was not as reliable as expected. The most recent one was essentially ZOOKEEPER-2164 (and the related ZOOKEEPER-900). These can be mitigated by a ZK server configuration change, but the incidents suggest it may be worth thinking about how to be less reliant on the service provided by a single ZK quorum instance.

A holistic solution would have several parts:
- HBASE-18095 to get ZK dependencies out of the client
- Related HBase replication improvements to track peer and position state in HBase tables instead of znodes
- This brainstorming...

For this part, we could consider the possibility that RecoverableZooKeeper (RZK) might be taught how to speak to two separate ZK quorums redundantly, and continue to offer service even if one of them is temporarily unable to provide service.
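
To make the idea concrete, here is a rough sketch of the read path only, assuming a simple "try each quorum in order" policy. The names QuorumClient and RedundantQuorumReader are hypothetical and exist only for illustration; they are not part of RecoverableZooKeeper or the ZooKeeper client API. The sketch says nothing about writes, watches, or ephemeral nodes, which would need real design work to keep two quorums consistent.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal view of one ZK quorum (not a real HBase/ZooKeeper interface). */
interface QuorumClient {
    byte[] getData(String path) throws IOException;
}

/**
 * Hypothetical wrapper that reads from several quorums redundantly:
 * it tries each quorum in order and returns the first successful result.
 */
final class RedundantQuorumReader {
    private final List<QuorumClient> quorums;

    RedundantQuorumReader(List<QuorumClient> quorums) {
        if (quorums.isEmpty()) {
            throw new IllegalArgumentException("at least one quorum is required");
        }
        // Defensive copy so later changes to the caller's list do not affect us.
        this.quorums = new ArrayList<>(quorums);
    }

    byte[] getData(String path) throws IOException {
        IOException last = null;
        for (QuorumClient quorum : quorums) {
            try {
                return quorum.getData(path);
            } catch (IOException e) {
                // This quorum is unavailable for now; remember why and try the next one.
                last = e;
            }
        }
        // Every quorum failed; surface the most recent failure as the cause.
        throw new IOException("all quorums failed for " + path, last);
    }
}
```

With two clients, one of which throws IOException on every call, getData still returns the value served by the healthy one. It fails only when both quorums are unavailable at the same time.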



--
This message was sent by Atlassian Jira
(v8.3.4#803005)