Posted to notifications@accumulo.apache.org by "keith-turner (via GitHub)" <gi...@apache.org> on 2023/08/24 20:42:28 UTC

[GitHub] [accumulo] keith-turner commented on a diff in pull request #3723: increases max wait time for ZK update retry

keith-turner commented on code in PR #3723:
URL: https://github.com/apache/accumulo/pull/3723#discussion_r1304845096


##########
core/src/main/java/org/apache/accumulo/core/fate/zookeeper/ZooReader.java:
##########
@@ -44,7 +43,7 @@ public class ZooReader {
 
   protected static final RetryFactory RETRY_FACTORY =
       Retry.builder().maxRetries(10).retryAfter(250, MILLISECONDS).incrementBy(250, MILLISECONDS)
-          .maxWait(5, SECONDS).backOffFactor(1.5).logInterval(3, MINUTES).createFactory();
+          .maxWait(2, MINUTES).backOffFactor(1.5).logInterval(3, MINUTES).createFactory();

Review Comment:
   Yeah, this is the max wait time that the retry will grow to. It starts off sleeping 250 ms and then grows from there, up to 2 minutes if it keeps failing.
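   To make the growth concrete, here is a minimal Java sketch of a capped exponential backoff using the values from the diff (250 ms initial sleep, 1.5 back-off factor, 2 minute cap). It is a simplified model, not Accumulo's `Retry` implementation: it ignores `incrementBy`, `maxRetries`, and any jitter. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of a capped exponential backoff schedule.
 * NOT Accumulo's Retry: incrementBy, maxRetries and jitter are ignored.
 */
final class BackoffSchedule {

  /** Returns the first `attempts` sleeps in ms, each `factor` times the last, capped at `maxWaitMs`. */
  static List<Long> schedule(long initialMs, double factor, long maxWaitMs, int attempts) {
    List<Long> waits = new ArrayList<>();
    double current = initialMs;
    for (int i = 0; i < attempts; i++) {
      // Sleep for the current wait, truncated to whole milliseconds.
      waits.add((long) Math.min(current, maxWaitMs));
      // Grow the next wait geometrically, but never past the cap.
      current = Math.min(current * factor, maxWaitMs);
    }
    return waits;
  }

  public static void main(String[] args) {
    // Values from the diff: start at 250 ms, grow by 1.5x, cap at 2 minutes.
    List<Long> waits = schedule(250, 1.5, 120_000, 20);
    for (int i = 0; i < waits.size(); i++) {
      System.out.printf("attempt %2d: sleep %6d ms%n", i + 1, waits.get(i));
    }
  }
}
```

   Under this simplified model the sleep first reaches the 2 minute cap on the 17th attempt. How many retries the real `RetryFactory` performs before giving up depends on `maxRetries` and on Accumulo's own growth formula.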
   
   One risk is that a single process (or a few processes) failing while no other process is failing will not be as responsive as before, since it may now sleep up to 2 minutes between attempts instead of 5 seconds.
   
   The reason I am thinking of increasing the max is the thundering herd case. For example, if there are 4K tservers all trying to mutate the same ZK node, backing off to 2 minutes would mean that, on average, about 33 tservers per second are trying to mutate it once they reach the max wait time (4000 / 120 s ≈ 33 per second).
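   The arithmetic above can be checked with a small Java sketch. It assumes every tserver has already backed off to the max wait and that their retries are spread roughly uniformly across that window, so the average rate is simply servers / max wait. The class and method names are hypothetical. For comparison it also prints the rate under the previous 5 second cap.

```java
/**
 * Back-of-the-envelope thundering herd estimate (hypothetical helper).
 * Assumes all servers sit at the max wait and their retries are spread
 * roughly uniformly over that window.
 */
final class HerdRate {

  /** Average attempts per second when each server retries once every maxWaitSeconds. */
  static double attemptsPerSecond(int servers, double maxWaitSeconds) {
    return servers / maxWaitSeconds;
  }

  public static void main(String[] args) {
    int tservers = 4000; // "4K tservers" from the comment
    System.out.printf("old max wait (5 s):   %.1f attempts/s%n", attemptsPerSecond(tservers, 5));
    System.out.printf("new max wait (120 s): %.1f attempts/s%n", attemptsPerSecond(tservers, 120));
  }
}
```

   Under these assumptions, raising the cap from 5 seconds to 2 minutes cuts the steady-state load on the contended node from about 800 to about 33 attempts per second.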



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.