You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@helix.apache.org by GitBox <gi...@apache.org> on 2020/05/19 20:11:13 UTC

[GitHub] [helix] kaisun2000 opened a new pull request #1019: ConnectionTimeout change to ensure HelixManager initial connection

kaisun2000 opened a new pull request #1019:
URL: https://github.com/apache/helix/pull/1019


   ### Issues
   
   - [x] My PR addresses the following Helix issues and references them in the PR description:
   
   (fix #1018 )
   
   ### Description
   
   - [x] Here are some details about my PR:
   
   ConnectionTimeout change to ensure HelixManager initial connection.
   
   Ensure HelixManager initial session establishment to
   ZooKeeper server would be highly like to succeed given one quorum
   node unresponsive. This is done by make default connection timeout
   value 4 times of default session timeout value.
   
   ### Tests
   
   - [ ] The following tests are written for this issue:
   
   - [ ] The following is the result of the "mvn test" command on the appropriate module:
   
   running, will add later.
   
   ### Commits
   
   - [x] My commits all reference appropriate Apache Helix GitHub issues in their subject lines. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
     1. Subject is separated from body by a blank line
     1. Subject is limited to 50 characters (not including Jira issue reference)
     1. Subject does not end with a period
     1. Subject uses the imperative mood ("add", not "adding")
     1. Body wraps at 72 characters
     1. Body explains "what" and "why", not "how"
   
   ### Documentation (Optional)
   
   - [ ] In case of new functionality, my PR adds documentation in the following wiki page:
   
   (Link the GitHub wiki you added)
   
   ### Code Quality
   
   - [x] My diff has been formatted using helix-style.xml 
   (helix-style-intellij.xml if IntelliJ IDE is used)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] narendly commented on a change in pull request #1019: ConnectionTimeout change to ensure HelixManager initial connection

Posted by GitBox <gi...@apache.org>.
narendly commented on a change in pull request #1019:
URL: https://github.com/apache/helix/pull/1019#discussion_r430047949



##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/impl/client/ZkClient.java
##########
@@ -62,7 +62,10 @@
   private static Logger LOG = LoggerFactory.getLogger(ZkClient.class);
 
   public static final int DEFAULT_OPERATION_TIMEOUT = Integer.MAX_VALUE;
-  public static final int DEFAULT_CONNECTION_TIMEOUT = 60 * 1000;
+  // Four times of DEFAULT_SESSION_TIMEOUT would give less 1/125 chance of initial connection
+  // to Zookeeper failure assuming one observer in the 5 observer Zookeeper Quorum is down

Review comment:
       @kaisun2000 
   
   I also think that we should leave the constant as is and make it configurable in the system config.
   
   Note that the HelixManager usage is _**the one you found**_. This is a public constant, and you don't know if it's being used elsewhere. So there is an issue of backward-compatibility if you change the value of the constant.

##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/impl/client/ZkClient.java
##########
@@ -62,7 +62,10 @@
   private static Logger LOG = LoggerFactory.getLogger(ZkClient.class);
 
   public static final int DEFAULT_OPERATION_TIMEOUT = Integer.MAX_VALUE;
-  public static final int DEFAULT_CONNECTION_TIMEOUT = 60 * 1000;
+  // Four times of DEFAULT_SESSION_TIMEOUT would give less 1/125 chance of initial connection
+  // to Zookeeper failure assuming one observer in the 5 observer Zookeeper Quorum is down
+  // with one DNS name configured as connection string with 3.4.13+ Zookeeper client version
+  public static final int DEFAULT_CONNECTION_TIMEOUT = 120 * 1000;

Review comment:
       +1. This is a public constant, so we need to avoid changing its value.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] dasahcc commented on a change in pull request #1019: ConnectionTimeout change to ensure HelixManager initial connection

Posted by GitBox <gi...@apache.org>.
dasahcc commented on a change in pull request #1019:
URL: https://github.com/apache/helix/pull/1019#discussion_r428316817



##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/impl/client/ZkClient.java
##########
@@ -62,7 +62,10 @@
   private static Logger LOG = LoggerFactory.getLogger(ZkClient.class);
 
   public static final int DEFAULT_OPERATION_TIMEOUT = Integer.MAX_VALUE;
-  public static final int DEFAULT_CONNECTION_TIMEOUT = 60 * 1000;
+  // Four times of DEFAULT_SESSION_TIMEOUT would give less 1/125 chance of initial connection
+  // to Zookeeper failure assuming one observer in the 5 observer Zookeeper Quorum is down
+  // with one DNS name configured as connection string with 3.4.13+ Zookeeper client version
+  public static final int DEFAULT_CONNECTION_TIMEOUT = 120 * 1000;

Review comment:
       We already have some customers only onboard the zookeeper-api module and not for Helix. My suggestion is not change the default behavior but let it to be config change.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on pull request #1019: ConnectionTimeout change to ensure HelixManager initial connection

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on pull request #1019:
URL: https://github.com/apache/helix/pull/1019#issuecomment-638597969


   Close the PR on behave of @kaisun2000 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang closed pull request #1019: ConnectionTimeout change to ensure HelixManager initial connection

Posted by GitBox <gi...@apache.org>.
jiajunwang closed pull request #1019:
URL: https://github.com/apache/helix/pull/1019


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1019: ConnectionTimeout change to ensure HelixManager initial connection

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1019:
URL: https://github.com/apache/helix/pull/1019#discussion_r428236917



##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/impl/client/ZkClient.java
##########
@@ -62,7 +62,10 @@
   private static Logger LOG = LoggerFactory.getLogger(ZkClient.class);
 
   public static final int DEFAULT_OPERATION_TIMEOUT = Integer.MAX_VALUE;
-  public static final int DEFAULT_CONNECTION_TIMEOUT = 60 * 1000;
+  // Four times of DEFAULT_SESSION_TIMEOUT would give less 1/125 chance of initial connection
+  // to Zookeeper failure assuming one observer in the 5 observer Zookeeper Quorum is down

Review comment:
       For 3, indeed changing the property is a way. 
   
   The fundamental problem is that if you want to fail fast or not. 
   
   In helix case, only ZkHelixManager uses CONNECTION_TIMEOUT in connect(). If connect() does not succeed, it would throw exceptions and the application won't proceed. 
   
   So it seems there is no downside to do the default change too.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #1019: ConnectionTimeout change to ensure HelixManager initial connection

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #1019:
URL: https://github.com/apache/helix/pull/1019#discussion_r427763659



##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/impl/client/ZkClient.java
##########
@@ -62,7 +62,10 @@
   private static Logger LOG = LoggerFactory.getLogger(ZkClient.class);
 
   public static final int DEFAULT_OPERATION_TIMEOUT = Integer.MAX_VALUE;
-  public static final int DEFAULT_CONNECTION_TIMEOUT = 60 * 1000;
+  // Four times of DEFAULT_SESSION_TIMEOUT would give less 1/125 chance of initial connection
+  // to Zookeeper failure assuming one observer in the 5 observer Zookeeper Quorum is down

Review comment:
       1. Please provide the formula instead of one set of input and result in the comment. Not all Helix deployment connects to a ZK which has 5 zk servers.
   2. If we need 4 times of the session timeout, why hardcode here?
   3. Since this configuration depends on the zk server count, I would prefer to specify this in the ZK_CONNECTION_TIMEOUT system property instead of modifying the default connection values. Note that this change might impact all the Helix users that do not specify the ZK_CONNECTION_TIMEOUT, and too long timeout may also cause problems.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on pull request #1019: ConnectionTimeout change to ensure HelixManager initial connection

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on pull request #1019:
URL: https://github.com/apache/helix/pull/1019#issuecomment-632353105


   @jiajunwang, @dasahcc, considering that user can configure java property ZK_CONNECTION_TIMEOUT ( "zk.connection.timeout"), then, let us recommend to change this value instead of changing the DEFAULT_CONNECTION_TIMEOUT. 
   
   Put it another way, let us abandon this pull for now.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org