Posted to dev@curator.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/02/18 14:35:00 UTC

[jira] [Work logged] (CURATOR-559) Inconsistent ZK timeouts

     [ https://issues.apache.org/jira/browse/CURATOR-559?focusedWorklogId=388838&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-388838 ]

ASF GitHub Bot logged work on CURATOR-559:
------------------------------------------

                Author: ASF GitHub Bot
            Created on: 18/Feb/20 14:34
            Start Date: 18/Feb/20 14:34
    Worklog Time Spent: 10m 
      Work Description: Randgalt commented on pull request #346: [CURATOR-559] Fix nested retry loops
URL: https://github.com/apache/curator/pull/346
 
 
   The retry loop mechanism ended up being nested multiple times, causing exponential calls to the retry policy and violating a given policy's limits. Use a thread local to mitigate this, so that a single retry loop is reused by nested API calls.
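
   A minimal sketch of that thread-local approach (hypothetical names, not Curator's actual implementation):

{code:java}
import java.util.concurrent.Callable;

public class NestableRetryLoop {
    private static final class RetryState {
        int attemptsLeft;
        RetryState(int maxAttempts) { this.attemptsLeft = maxAttempts; }
    }

    // One retry budget per thread; nested calls on the same thread reuse it
    // instead of starting a fresh retry loop of their own.
    private static final ThreadLocal<RetryState> CURRENT = new ThreadLocal<>();

    public static <T> T callWithRetry(int maxAttempts, Callable<T> proc) throws Exception {
        RetryState state = CURRENT.get();
        boolean outermost = (state == null);
        if (outermost) {
            state = new RetryState(maxAttempts);
            CURRENT.set(state);
        }
        try {
            while (true) {
                try {
                    return proc.call();
                }
                catch (Exception e) {
                    if (--state.attemptsLeft <= 0) {
                        throw e; // shared budget exhausted, even inside nested calls
                    }
                    // a real policy would sleep/back off here before retrying
                }
            }
        }
        finally {
            if (outermost) {
                CURRENT.remove(); // only the outermost call clears the shared state
            }
        }
    }
}
{code}

   Because nested calls draw on the outermost loop's budget, the total number of attempts stays bounded by the configured limit regardless of how deeply API calls nest.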
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 388838)
    Remaining Estimate: 0h
            Time Spent: 10m

> Inconsistent ZK timeouts
> ------------------------
>
>                 Key: CURATOR-559
>                 URL: https://issues.apache.org/jira/browse/CURATOR-559
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Framework
>    Affects Versions: 4.2.0
>            Reporter: Grant Digby
>            Assignee: Jordan Zimmerman
>            Priority: Blocker
>             Fix For: 4.3.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> I've configured a reasonable timeout using BoundedExponentialBackoffRetry, and generally it works as I'd expect if ZK is down when I make a call like "create().forPath". But if ZK is unavailable when I call acquire on an InterProcessReadWriteLock, it takes far longer before it finally times out.
> I call acquire, which is wrapped in "RetryLoop.callWithRetry", and it goes on to call findProtectedNodeInForeground, which is also wrapped in "RetryLoop.callWithRetry". If I've configured the BoundedExponentialBackoffRetry to retry 20 times, the inner retry loop tries 20 times for every one of the 20 outer attempts, so it retries 400 times.
>  
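> For illustration only (hypothetical code, not Curator's internals), a standalone sketch of how nesting one bounded retry loop inside another multiplies the attempts:
> {code:java}
> import java.util.concurrent.atomic.AtomicInteger;
>
> public class NestedRetryDemo {
>     static final int MAX_RETRIES = 20;
>     static final AtomicInteger calls = new AtomicInteger();
>
>     // A bounded retry loop: give up after MAX_RETRIES failed attempts.
>     static void withRetry(Runnable op) {
>         for (int i = 0; i < MAX_RETRIES; i++) {
>             try { op.run(); return; } catch (RuntimeException e) { /* retry */ }
>         }
>         throw new RuntimeException("retries exhausted");
>     }
>
>     public static void main(String[] args) {
>         try {
>             withRetry(() ->                  // outer loop: 20 attempts
>                 withRetry(() -> {            // inner loop: 20 attempts per outer attempt
>                     calls.incrementAndGet();
>                     throw new RuntimeException("ZK unavailable");
>                 }));
>         } catch (RuntimeException expected) { }
>         System.out.println(calls.get());     // prints 400, not 20
>     }
> } {code}
>  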
> The GoCurator class below recreates it: if you put breakpoints at the commented sections and bring ZK down, you can see the different times until each call gives up, and the stack traces, which I've included below.
>  
> {code:java}
> import org.apache.curator.framework.CuratorFramework;
> import org.apache.curator.framework.CuratorFrameworkFactory;
> import org.apache.curator.framework.recipes.locks.InterProcessReadWriteLock;
> import org.apache.curator.retry.BoundedExponentialBackoffRetry;
>
> public class GoCurator {
>     public static void main(String[] args) throws Exception {
>         CuratorFramework cf = CuratorFrameworkFactory.newClient(
>                 "localhost:2181",
>                 new BoundedExponentialBackoffRetry(200, 10000, 20)
>         );
>         cf.start();
>         String root = "/myRoot";
>         if (cf.checkExists().forPath(root) == null) {
>             // Stacktrace A shows what happens if ZK is down for this call
>             cf.create().forPath(root);
>         }
>         InterProcessReadWriteLock lock = new InterProcessReadWriteLock(cf, "/grant/myLock");
>         // See stacktrace B showing the nested retry if ZK is down for this call
>         lock.readLock().acquire();
>         lock.readLock().release();
>         System.out.println("done");
>     }
> } {code}
>  
> Stacktrace A (if ZK is down when I'm calling create().forPath). This shows the single retry loop, so it exits after the correct number of attempts:
>  
> {code:java}
>  java.lang.Thread.State: WAITING
>   at java.lang.Object.wait(Object.java:-1)
>   at java.lang.Object.wait(Object.java:502)
>   at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1499)
>   at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1487)
>   at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:2617)
>   at org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:242)
>   at org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:231)
>   at org.apache.curator.connection.StandardConnectionHandlingPolicy.callWithRetry(StandardConnectionHandlingPolicy.java:64)
>   at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:100)
>   at org.apache.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:228)
>   at org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:219)
>   at org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:41)
>   at com.gebatech.curator.GoCurator.main(GoCurator.java:25) {code}
> Stacktrace B (if ZK is down when I call InterProcessReadWriteLock#readLock#acquire). This shows the nested retry loop, so it doesn't exit until 20*20 = 400 attempts.
>  
> {code:java}
>  java.lang.Thread.State: WAITING
>   at sun.misc.Unsafe.park(Unsafe.java:-1)
>   at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>   at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>   at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>   at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>   at org.apache.curator.CuratorZookeeperClient.internalBlockUntilConnectedOrTimedOut(CuratorZookeeperClient.java:434)
>   at org.apache.curator.connection.StandardConnectionHandlingPolicy.callWithRetry(StandardConnectionHandlingPolicy.java:56)
>   at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:100)
>   at org.apache.curator.framework.imps.CreateBuilderImpl.findProtectedNodeInForeground(CreateBuilderImpl.java:1239)
>   at org.apache.curator.framework.imps.CreateBuilderImpl.access$1700(CreateBuilderImpl.java:51)
>   at org.apache.curator.framework.imps.CreateBuilderImpl$17.call(CreateBuilderImpl.java:1167)
>   at org.apache.curator.framework.imps.CreateBuilderImpl$17.call(CreateBuilderImpl.java:1156)
>   at org.apache.curator.connection.StandardConnectionHandlingPolicy.callWithRetry(StandardConnectionHandlingPolicy.java:64)
>   at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:100)
>   at org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:1153)
>   at org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:607)
>   at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:597)
>   at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:575)
>   at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:51)
>   at org.apache.curator.framework.recipes.locks.StandardLockInternalsDriver.createsTheLock(StandardLockInternalsDriver.java:54)
>   at org.apache.curator.framework.recipes.locks.LockInternals.attemptLock(LockInternals.java:225)
>   at org.apache.curator.framework.recipes.locks.InterProcessMutex.internalLock(InterProcessMutex.java:237)
>   at org.apache.curator.framework.recipes.locks.InterProcessMutex.acquire(InterProcessMutex.java:89)
>   at com.gebatech.curator.GoCurator.main(GoCurator.java:29) {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)