Posted to issues@zookeeper.apache.org by "Josh Slocum (Jira)" <ji...@apache.org> on 2020/12/06 17:59:00 UTC

[jira] [Resolved] (ZOOKEEPER-3927) ZooKeeper Client Fault Tolerance Extensions

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-3927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Slocum resolved ZOOKEEPER-3927.
------------------------------------
    Resolution: Duplicate

> ZooKeeper Client Fault Tolerance Extensions
> -------------------------------------------
>
>                 Key: ZOOKEEPER-3927
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3927
>             Project: ZooKeeper
>          Issue Type: Improvement
>            Reporter: Josh Slocum
>            Priority: Minor
>
> Tl;dr: My team at Indeed has developed ZooKeeper functionality that handles stateful retrying of write operations on ConnectionLoss, and we wanted to reach out to discuss whether this is something the ZooKeeper team may be interested in incorporating into the ZooKeeper client or into a separate wrapper.
>  
> Hi ZooKeeper Devs,
> My team uses ZooKeeper extensively as part of a distributed key-value store we've built at Indeed (think HBase replacement). Because our deployment co-locates our database daemons with our large Hadoop cluster, and many of our compute jobs are network-intensive, we were experiencing a large number of transient ConnectionLoss errors. This was especially problematic for important write operations, such as the creation/deletion of distributed locks/leases or updates to distributed state in the cluster.
> We saw that some existing ZooKeeper client wrappers handle retrying in the presence of ConnectionLoss, but all of the ones we looked at ([Curator|https://curator.apache.org/], [Kazoo|https://github.com/python-zk/kazoo], etc.) retried writes the same way as reads: blindly in a loop. This meant that when retrying a create, for example, if the initial create had succeeded on the server but the client saw ConnectionLoss, the retried request would fail with a NodeExists exception even though the znode had been created. This caused many issues. In the distributed lock/lease case, other nodes saw the calling node as having successfully acquired the "lock", while the calling node believed it had failed to acquire it, which results in a deadlock.
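> For illustration, here is a minimal sketch of that blind-retry pattern (using the stock org.apache.zookeeper.ZooKeeper client; the wrapper method name is illustrative, not taken from any of the libraries above):
>
>     import org.apache.zookeeper.CreateMode;
>     import org.apache.zookeeper.KeeperException;
>     import org.apache.zookeeper.ZooDefs;
>     import org.apache.zookeeper.ZooKeeper;
>
>     // Hypothetical blind-retry wrapper, similar in spirit to what the
>     // wrappers above do for create().
>     static String createBlindRetry(ZooKeeper zk, String path, byte[] data)
>             throws KeeperException, InterruptedException {
>         while (true) {
>             try {
>                 return zk.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE,
>                                  CreateMode.PERSISTENT);
>             } catch (KeeperException.ConnectionLossException e) {
>                 // If the server applied the create before the connection
>                 // dropped, the next iteration throws NodeExistsException
>                 // even though the caller's create actually succeeded.
>             }
>         }
>     }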
> To solve this, we implemented a set of "connection-loss-tolerant primitives" for the main types of write operations. They handle ConnectionLoss by retrying the operation in a loop, but when a retry fails, they inspect the current server-side state to determine whether a previous attempt that hit ConnectionLoss actually succeeded:
> * createRetriable(String path, byte[] data)
> * setDataRetriable(String path, byte[] newData, int currentVersion)
> * deleteRetriable(String path, int currentVersion)
> * compareAndDeleteRetriable(String path, byte[] currentData, int currentVersion)
> For example, createRetriable retries the create on ConnectionLoss. If a retried call gets a NodeExists exception, it checks whether getData(path) == data and dataVersion == 0. If so, it assumes the first create succeeded and returns success; otherwise it propagates the NodeExists exception.
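> A simplified sketch of that logic (again using the stock org.apache.zookeeper.ZooKeeper client; this is an illustration, not our exact implementation, and it elides ConnectionLoss handling on the getData call):
>
>     import java.util.Arrays;
>     import org.apache.zookeeper.CreateMode;
>     import org.apache.zookeeper.KeeperException;
>     import org.apache.zookeeper.ZooDefs;
>     import org.apache.zookeeper.ZooKeeper;
>     import org.apache.zookeeper.data.Stat;
>
>     static String createRetriable(ZooKeeper zk, String path, byte[] data)
>             throws KeeperException, InterruptedException {
>         while (true) {
>             try {
>                 return zk.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE,
>                                  CreateMode.PERSISTENT);
>             } catch (KeeperException.ConnectionLossException e) {
>                 // Transient; loop and retry once the session reconnects.
>             } catch (KeeperException.NodeExistsException e) {
>                 // A previous attempt may have succeeded on the server just
>                 // before the connection dropped. If the znode holds exactly
>                 // our data at dataVersion 0, treat that attempt as ours.
>                 Stat stat = new Stat();
>                 byte[] current = zk.getData(path, false, stat);
>                 if (stat.getVersion() == 0 && Arrays.equals(current, data)) {
>                     return path;
>                 }
>                 throw e; // someone else created the znode; propagate
>             }
>         }
>     }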
> These primitives have allowed us to program our ZooKeeper layer as if ConnectionLoss weren't a transient state we have to worry about, since they provide essentially the same guarantees as the non-retriable functions in the ZooKeeper API (with a slight difference in semantics).
> Because, to my knowledge, this problem is not solved anywhere else in the ZooKeeper ecosystem, we think it could be a useful contribution to the ZooKeeper project.
> However, if you are not looking for contributions that extend the ZooKeeper API and prefer client extensions to live separately (for example, in Curator), then we would consider contributing there or open-sourcing our implementation as a standalone library.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)