You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@helix.apache.org by GitBox <gi...@apache.org> on 2022/07/13 18:59:14 UTC

[GitHub] [helix] qqu0127 opened a new pull request, #2176: Wire in the logic for TaskExecutor reset timeout parameter

qqu0127 opened a new pull request, #2176:
URL: https://github.com/apache/helix/pull/2176

   ### Issues
   
   - [X] My PR addresses the following Helix issues and references them in the PR description:
   Fix https://github.com/apache/helix/issues/2175
   
   ### Description
   
   - [X] Here are some details about my PR, including screenshots of any UI changes:
   The timeout logic introduced in https://github.com/apache/helix/pull/1920, failed to provide a client side entry point. As a followup, this PR address the issue by passing down the parameter from HelixManagerFactorye down to HelixTaskExecutor so that helix users can specify.
   The parameter is encapsulated in HelixManagerProperty to enable client to specify value and pass down from HelixManagerFactory
   
   ### Tests
   
   - [X] The following tests are written for this issue:
   Modified existing test case in [TestParticipantManager.java](https://github.com/apache/helix/compare/master...qqu0127:reset-timeout?expand=1#diff-41a02eec6acda934146e47c98a53813c4af8fd82091db2aadd9f80165af04d17)
   
   - The following is the result of the "mvn test" command on the appropriate module:
   
   (If CI test fails due to known issue, please specify the issue and test PR locally. Then copy & paste the result of "mvn test" to here.)
   
   ### Changes that Break Backward Compatibility (Optional)
   
   - My PR contains changes that break backward compatibility or previous assumptions for certain methods or API. They include:
   
   (Consider including all behavior changes for public methods or API. Also include these changes in merge description so that other developers are aware of these changes. This allows them to make relevant code changes in feature branches accounting for the new method/API behavior.)
   
   ### Documentation (Optional)
   
   - In case of new functionality, my PR adds documentation in the following wiki page:
   
   (Link the GitHub wiki you added)
   
   ### Commits
   
   - My commits all reference appropriate Apache Helix GitHub issues in their subject lines. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
     1. Subject is separated from body by a blank line
     1. Subject is limited to 50 characters (not including Jira issue reference)
     1. Subject does not end with a period
     1. Subject uses the imperative mood ("add", not "adding")
     1. Body wraps at 72 characters
     1. Body explains "what" and "why", not "how"
   
   ### Code Quality
   
   - My diff has been formatted using helix-style.xml 
   (helix-style-intellij.xml if IntelliJ IDE is used)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] qqu0127 closed pull request #2176: Wire in the logic for TaskExecutor reset timeout parameter

Posted by GitBox <gi...@apache.org>.
qqu0127 closed pull request #2176: Wire in the logic for TaskExecutor reset timeout parameter
URL: https://github.com/apache/helix/pull/2176


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] desaikomal commented on a diff in pull request #2176: Wire in the logic for TaskExecutor reset timeout parameter

Posted by GitBox <gi...@apache.org>.
desaikomal commented on code in PR #2176:
URL: https://github.com/apache/helix/pull/2176#discussion_r921406830


##########
helix-core/src/main/java/org/apache/helix/HelixManagerProperty.java:
##########
@@ -34,6 +35,7 @@ public class HelixManagerProperty {
   private static final Logger LOG = LoggerFactory.getLogger(HelixManagerProperty.class.getName());
   private String _version;
   private long _healthReportLatency;
+  private int _msgHandlerResetTimeout = TaskExecutor.DEFAULT_MSG_HANDLER_RESET_TIMEOUT_MS;

Review Comment:
   I feel this is how Helix and Task framework code is intermingled.  msgHandlerResetTimeout - is it specific to Task? or it is generic msgHandler?
   
   If it is generic, then having default in previous file makes sense. if it is specific to task, then variable name should be changed to indicate it is taskSpecific.
   As far as I know it is generic.



##########
helix-core/src/main/java/org/apache/helix/HelixManagerFactory.java:
##########
@@ -61,6 +61,22 @@ public static HelixManager getZKHelixManager(String clusterName, String instance
     return new ZKHelixManager(clusterName, instanceName, type, zkAddr, stateListener);
   }
 
+  /**

Review Comment:
   nit:  this comment is same as all other constructors. 
   What is the unique difference of this constructor needs to be documented?
   



##########
helix-core/src/test/java/org/apache/helix/integration/manager/TestParticipantManager.java:
##########
@@ -101,8 +103,13 @@ public void simpleIntegrationTest() throws Exception {
         "MasterSlave", true); // do rebalance
 
     String instanceName = "localhost_12918";
-    HelixManager participant =
-        new ZKHelixManager(_clusterName, instanceName, InstanceType.PARTICIPANT, ZK_ADDR);
+    int resetTimeout = 300;

Review Comment:
   is there any checks wrt what valid resetTimeout value can be? Is there any MaX??



##########
helix-core/src/main/java/org/apache/helix/messaging/DefaultMessagingService.java:
##########
@@ -72,8 +78,8 @@ public DefaultMessagingService(HelixManager manager) {
         new ParticipantStatusMonitor(isParticipant, manager.getInstanceName()),
         new MessageQueueMonitor(manager.getClusterName(), manager.getInstanceName()));
     _asyncCallbackService = new AsyncCallbackService();
-    _taskExecutor.registerMessageHandlerFactory(MessageType.TASK_REPLY.name(),
-        _asyncCallbackService);
+    _taskExecutor.registerMessageHandlerFactory(_asyncCallbackService, TaskExecutor.DEFAULT_PARALLEL_TASKS, resetTimeoutMs);

Review Comment:
   can you please explain this change?  as change description doesn't help in this case.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] NealSun96 commented on a diff in pull request #2176: Wire in the logic for TaskExecutor reset timeout parameter

Posted by GitBox <gi...@apache.org>.
NealSun96 commented on code in PR #2176:
URL: https://github.com/apache/helix/pull/2176#discussion_r921570639


##########
helix-core/src/main/java/org/apache/helix/HelixManagerFactory.java:
##########
@@ -61,6 +61,22 @@ public static HelixManager getZKHelixManager(String clusterName, String instance
     return new ZKHelixManager(clusterName, instanceName, type, zkAddr, stateListener);
   }
 
+  /**
+   * Construct a zk-based cluster manager that enforces all types (PARTICIPANT, CONTROLLER, and
+   * SPECTATOR) to have a name
+   * @param clusterName
+   * @param instanceName
+   * @param type
+   * @param zkAddr
+   * @param stateListener
+   * @param helixManagerProperty
+   * @return a HelixManager backed by Zookeeper
+   */
+  public static HelixManager getZKHelixManager(String clusterName, String instanceName,
+      InstanceType type, String zkAddr, HelixManagerStateListener stateListener, HelixManagerProperty helixManagerProperty) {
+    return new ZKHelixManager(clusterName, instanceName, type, zkAddr, stateListener, helixManagerProperty);

Review Comment:
   While that constructor has been there for a while, the more common pattern for customers is to create a ZkHelixManager through HelixManagerFactory, and in the factory, the existence of zkAddr and HelixManagerProperty is so far mutually exclusive. 
   With this new constructor provided, I predict there will be confusion - it's the first time zkAddr and HelixManagerProperty are both accepted, and we now must make it explicit to the customers that only 1 should be provided. I don't think adding a warning comment will be enough - do you have some ideas about "better logic handling" you mentioned above?
   I believe the best way for this problem is not to enforce the RealmAwareConnectionConfig in HelixManagerProperty, but that's out of the scope of this PR. 
   With what we have now, I will much prefer the reset value to be set in another function, similar to `setLiveInstanceInfoProvider`, so something like `setMessageHandlerResetTimeout`. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] mgao0 commented on pull request #2176: Wire in the logic for TaskExecutor reset timeout parameter

Posted by GitBox <gi...@apache.org>.
mgao0 commented on PR #2176:
URL: https://github.com/apache/helix/pull/2176#issuecomment-1184944232

   You'll need to modify more logic to enable the constructor with both zk address and helix manager property. Please take a look at this [PR](https://github.com/apache/helix/pull/2137), it's purpose was to add a constructor like that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] qqu0127 commented on a diff in pull request #2176: Wire in the logic for TaskExecutor reset timeout parameter

Posted by GitBox <gi...@apache.org>.
qqu0127 commented on code in PR #2176:
URL: https://github.com/apache/helix/pull/2176#discussion_r921453737


##########
helix-core/src/test/java/org/apache/helix/integration/manager/TestParticipantManager.java:
##########
@@ -101,8 +103,13 @@ public void simpleIntegrationTest() throws Exception {
         "MasterSlave", true); // do rebalance
 
     String instanceName = "localhost_12918";
-    HelixManager participant =
-        new ZKHelixManager(_clusterName, instanceName, InstanceType.PARTICIPANT, ZK_ADDR);
+    int resetTimeout = 300;

Review Comment:
   This is very good question, it's not new logic, and I don't see a maximum value from https://github.com/apache/helix/pull/1920
   I assume the idea is to delegate this to user for a reasonable value, it's hard to set a global max and override. Let me know if you think otherwise. Thanks.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] mgao0 commented on pull request #2176: Wire in the logic for TaskExecutor reset timeout parameter

Posted by GitBox <gi...@apache.org>.
mgao0 commented on PR #2176:
URL: https://github.com/apache/helix/pull/2176#issuecomment-1185720567

   > > You'll need to modify more logic to enable the constructor with both zk address and helix manager property. Please take a look at this [PR](https://github.com/apache/helix/pull/2137), its purpose was to add a constructor like that.
   > 
   > Thanks @mgao0 for the pointer. Is this the same thing Neal mentioned? I feel if we make sure `_zkConnectionConfig` and `_zkClientConfig` are null in `HelixManagerProperty`, while a valid zkAddr is provided, we should be good to go. Could you please confirm?
   
   I saw your latest commit that added zkAddr to HelixManagerProperty, it makes sense.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] NealSun96 commented on a diff in pull request #2176: Wire in the logic for TaskExecutor reset timeout parameter

Posted by GitBox <gi...@apache.org>.
NealSun96 commented on code in PR #2176:
URL: https://github.com/apache/helix/pull/2176#discussion_r921443165


##########
helix-core/src/main/java/org/apache/helix/HelixManagerFactory.java:
##########
@@ -61,6 +61,22 @@ public static HelixManager getZKHelixManager(String clusterName, String instance
     return new ZKHelixManager(clusterName, instanceName, type, zkAddr, stateListener);
   }
 
+  /**
+   * Construct a zk-based cluster manager that enforces all types (PARTICIPANT, CONTROLLER, and
+   * SPECTATOR) to have a name
+   * @param clusterName
+   * @param instanceName
+   * @param type
+   * @param zkAddr
+   * @param stateListener
+   * @param helixManagerProperty
+   * @return a HelixManager backed by Zookeeper
+   */
+  public static HelixManager getZKHelixManager(String clusterName, String instanceName,
+      InstanceType type, String zkAddr, HelixManagerStateListener stateListener, HelixManagerProperty helixManagerProperty) {
+    return new ZKHelixManager(clusterName, instanceName, type, zkAddr, stateListener, helixManagerProperty);

Review Comment:
   I have concerns over this constructor, and also storing the reset timeout in HelixManagerProperty in general:
   
   If you check the ZkHelixManager logic, you can see that `zkAddr` and `helixManagerProperty.getZkConnectionConfig()` cannot coexist. This is why HelixManagerProperty is only provided in the constructor below this new one: the `zkAddr` in that constructor is null. Therefore, it concerns me that this constructor can lead to a lot of ZkHelixManager initialization failures because of the above mentioned conflict. I believe most users will provide both a zkAddr and a connection config. 



##########
helix-core/src/main/java/org/apache/helix/HelixManagerProperty.java:
##########
@@ -54,12 +56,13 @@ public HelixManagerProperty(Properties helixManagerProperties, CloudConfig cloud
         helixManagerProperties.getProperty(SystemPropertyKeys.PARTICIPANT_HEALTH_REPORT_LATENCY));
   }
 
-  private HelixManagerProperty(String version, long healthReportLatency,
+  private HelixManagerProperty(String version, long healthReportLatency, int msgHandlerResetTimeout,

Review Comment:
   Same as above: I have concerns over adding this timeout in HelixManagerProperty, due to the connectionConfig it carries. 



##########
helix-core/src/main/java/org/apache/helix/messaging/DefaultMessagingService.java:
##########
@@ -72,8 +78,8 @@ public DefaultMessagingService(HelixManager manager) {
         new ParticipantStatusMonitor(isParticipant, manager.getInstanceName()),
         new MessageQueueMonitor(manager.getClusterName(), manager.getInstanceName()));
     _asyncCallbackService = new AsyncCallbackService();
-    _taskExecutor.registerMessageHandlerFactory(MessageType.TASK_REPLY.name(),
-        _asyncCallbackService);
+    _taskExecutor.registerMessageHandlerFactory(_asyncCallbackService, TaskExecutor.DEFAULT_PARALLEL_TASKS, resetTimeoutMs);

Review Comment:
   I can answer for @qqu0127 : the new signature he used here checks for `_asyncCallbackService.getMessageTypes()` and calls the line 75 `registerMessageHandlerFactory` for each of the types. Since `_asyncCallbackService` only has one message type, which is `TASK_REPLY`, this call is identical with the old call, except that it also adds a resetTimeoutMS. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] qqu0127 commented on pull request #2176: Wire in the logic for TaskExecutor reset timeout parameter

Posted by GitBox <gi...@apache.org>.
qqu0127 commented on PR #2176:
URL: https://github.com/apache/helix/pull/2176#issuecomment-1184956702

   > You'll need to modify more logic to enable the constructor with both zk address and helix manager property. Please take a look at this [PR](https://github.com/apache/helix/pull/2137), its purpose was to add a constructor like that.
   
   Thanks @mgao0 for the pointer. Is this the same thing Neal mentioned? I feel if we make sure `_zkConnectionConfig` and `_zkClientConfig` are null in `HelixManagerProperty`, while a valid zkAddr is provided, we should be good to go. Could you please confirm?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] qqu0127 commented on a diff in pull request #2176: Wire in the logic for TaskExecutor reset timeout parameter

Posted by GitBox <gi...@apache.org>.
qqu0127 commented on code in PR #2176:
URL: https://github.com/apache/helix/pull/2176#discussion_r921440205


##########
helix-core/src/main/java/org/apache/helix/HelixManagerFactory.java:
##########
@@ -61,6 +61,22 @@ public static HelixManager getZKHelixManager(String clusterName, String instance
     return new ZKHelixManager(clusterName, instanceName, type, zkAddr, stateListener);
   }
 
+  /**

Review Comment:
   This method takes in all possible params, including both zkAddr and  helixManagerProperty. Let me document more



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] qqu0127 commented on a diff in pull request #2176: Wire in the logic for TaskExecutor reset timeout parameter

Posted by GitBox <gi...@apache.org>.
qqu0127 commented on code in PR #2176:
URL: https://github.com/apache/helix/pull/2176#discussion_r921495450


##########
helix-core/src/main/java/org/apache/helix/HelixManagerFactory.java:
##########
@@ -61,6 +61,22 @@ public static HelixManager getZKHelixManager(String clusterName, String instance
     return new ZKHelixManager(clusterName, instanceName, type, zkAddr, stateListener);
   }
 
+  /**
+   * Construct a zk-based cluster manager that enforces all types (PARTICIPANT, CONTROLLER, and
+   * SPECTATOR) to have a name
+   * @param clusterName
+   * @param instanceName
+   * @param type
+   * @param zkAddr
+   * @param stateListener
+   * @param helixManagerProperty
+   * @return a HelixManager backed by Zookeeper
+   */
+  public static HelixManager getZKHelixManager(String clusterName, String instanceName,
+      InstanceType type, String zkAddr, HelixManagerStateListener stateListener, HelixManagerProperty helixManagerProperty) {
+    return new ZKHelixManager(clusterName, instanceName, type, zkAddr, stateListener, helixManagerProperty);

Review Comment:
   Thanks for the review.
   I see the current situation. I'd like to ask whether setting this on ClusterConfig is the right approach. I'm not sure if this should be a cluster level config. The parameter is needed during HelixManager initialization (through DefaultMessagingService), are you suggesting we query clusterConfig at that time?
   Besides, I don't see a strong enough reason prevent us using `HelixManagerProperty`. In fact `helixManagerProperty.getZkConnectionConfig()` is nullable, and we can resolve the conflict with better logic handling. Generally speaking, a POJO property class shouldn't be bound to certain variable. 
   Let me know what you think. Thanks again.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] qqu0127 commented on a diff in pull request #2176: Wire in the logic for TaskExecutor reset timeout parameter

Posted by GitBox <gi...@apache.org>.
qqu0127 commented on code in PR #2176:
URL: https://github.com/apache/helix/pull/2176#discussion_r921499214


##########
helix-core/src/main/java/org/apache/helix/HelixManagerFactory.java:
##########
@@ -61,6 +61,22 @@ public static HelixManager getZKHelixManager(String clusterName, String instance
     return new ZKHelixManager(clusterName, instanceName, type, zkAddr, stateListener);
   }
 
+  /**
+   * Construct a zk-based cluster manager that enforces all types (PARTICIPANT, CONTROLLER, and
+   * SPECTATOR) to have a name
+   * @param clusterName
+   * @param instanceName
+   * @param type
+   * @param zkAddr
+   * @param stateListener
+   * @param helixManagerProperty
+   * @return a HelixManager backed by Zookeeper
+   */
+  public static HelixManager getZKHelixManager(String clusterName, String instanceName,
+      InstanceType type, String zkAddr, HelixManagerStateListener stateListener, HelixManagerProperty helixManagerProperty) {
+    return new ZKHelixManager(clusterName, instanceName, type, zkAddr, stateListener, helixManagerProperty);

Review Comment:
   To add on, `public ZKHelixManager(String clusterName, String instanceName, InstanceType instanceType,
         String zkAddress, HelixManagerStateListener stateListener,
         HelixManagerProperty helixManagerProperty)` is already a public constructor in `ZkHelixManager` for a while.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] qqu0127 commented on a diff in pull request #2176: Wire in the logic for TaskExecutor reset timeout parameter

Posted by GitBox <gi...@apache.org>.
qqu0127 commented on code in PR #2176:
URL: https://github.com/apache/helix/pull/2176#discussion_r921429383


##########
helix-core/src/main/java/org/apache/helix/messaging/DefaultMessagingService.java:
##########
@@ -72,8 +78,8 @@ public DefaultMessagingService(HelixManager manager) {
         new ParticipantStatusMonitor(isParticipant, manager.getInstanceName()),
         new MessageQueueMonitor(manager.getClusterName(), manager.getInstanceName()));
     _asyncCallbackService = new AsyncCallbackService();
-    _taskExecutor.registerMessageHandlerFactory(MessageType.TASK_REPLY.name(),
-        _asyncCallbackService);
+    _taskExecutor.registerMessageHandlerFactory(_asyncCallbackService, TaskExecutor.DEFAULT_PARALLEL_TASKS, resetTimeoutMs);

Review Comment:
   This is essentially the same with method that takes in `resetTimeoutMs`. 
   `_asyncCallbackService` is an instance of `MultiTypeMessageHandlerFactory` so we can use method `  public void registerMessageHandlerFactory(MultiTypeMessageHandlerFactory factory,
         int threadPoolSize, int resetTimeoutMs) `



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] qqu0127 commented on a diff in pull request #2176: Wire in the logic for TaskExecutor reset timeout parameter

Posted by GitBox <gi...@apache.org>.
qqu0127 commented on code in PR #2176:
URL: https://github.com/apache/helix/pull/2176#discussion_r921451521


##########
helix-core/src/main/java/org/apache/helix/HelixManagerProperty.java:
##########
@@ -34,6 +35,7 @@ public class HelixManagerProperty {
   private static final Logger LOG = LoggerFactory.getLogger(HelixManagerProperty.class.getName());
   private String _version;
   private long _healthReportLatency;
+  private int _msgHandlerResetTimeout = TaskExecutor.DEFAULT_MSG_HANDLER_RESET_TIMEOUT_MS;

Review Comment:
   Thanks for the review.
   Now it's only specific to task though, that's also why I move it from `HelixTaskExecutor` to `TaskExecutor`. Let me change the variable name here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] qqu0127 commented on a diff in pull request #2176: Wire in the logic for TaskExecutor reset timeout parameter

Posted by GitBox <gi...@apache.org>.
qqu0127 commented on code in PR #2176:
URL: https://github.com/apache/helix/pull/2176#discussion_r921594851


##########
helix-core/src/main/java/org/apache/helix/HelixManagerFactory.java:
##########
@@ -61,6 +61,22 @@ public static HelixManager getZKHelixManager(String clusterName, String instance
     return new ZKHelixManager(clusterName, instanceName, type, zkAddr, stateListener);
   }
 
+  /**
+   * Construct a zk-based cluster manager that enforces all types (PARTICIPANT, CONTROLLER, and
+   * SPECTATOR) to have a name
+   * @param clusterName
+   * @param instanceName
+   * @param type
+   * @param zkAddr
+   * @param stateListener
+   * @param helixManagerProperty
+   * @return a HelixManager backed by Zookeeper
+   */
+  public static HelixManager getZKHelixManager(String clusterName, String instanceName,
+      InstanceType type, String zkAddr, HelixManagerStateListener stateListener, HelixManagerProperty helixManagerProperty) {
+    return new ZKHelixManager(clusterName, instanceName, type, zkAddr, stateListener, helixManagerProperty);

Review Comment:
   Sure, to make sure we are on the same page, some quick recap on the current impl:
   1. RealmAwareConnectionConfig in HelixManagerProperty is nullable
   2. In ZKHelixManager, zkAddress and helixManagerProperty. getZkConnectionConfig() cannot coexist, otherwise it's early stopped
   
   In regard to the factory, it's a new method after all, so existing users won't be impacted and new users should be aware. If user happens to provide both zkAddress and that one in property, it won't start at all. I think it's an acceptable behavior.
   Alternatively, we can validate the params in the factory. e.g. Nullify `helixManagerProperty.getZkConnectionConfig` if zkAddress is non-null (or even throw exception). And redirect to the other method if zkAddress is null.
   
   I appreciate the proposal with setter, and I'd like to explore more. With a quick look, something like `setMessageHandlerResetTimeout` might not work since `_taskExecutor.registerMessageHandlerFactory` happens in the constructor, which means we need the value immediately. Making it afterward could be even bigger change. Am I missing anything?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] qqu0127 commented on a diff in pull request #2176: Wire in the logic for TaskExecutor reset timeout parameter

Posted by GitBox <gi...@apache.org>.
qqu0127 commented on code in PR #2176:
URL: https://github.com/apache/helix/pull/2176#discussion_r921594851


##########
helix-core/src/main/java/org/apache/helix/HelixManagerFactory.java:
##########
@@ -61,6 +61,22 @@ public static HelixManager getZKHelixManager(String clusterName, String instance
     return new ZKHelixManager(clusterName, instanceName, type, zkAddr, stateListener);
   }
 
+  /**
+   * Construct a zk-based cluster manager that enforces all types (PARTICIPANT, CONTROLLER, and
+   * SPECTATOR) to have a name
+   * @param clusterName
+   * @param instanceName
+   * @param type
+   * @param zkAddr
+   * @param stateListener
+   * @param helixManagerProperty
+   * @return a HelixManager backed by Zookeeper
+   */
+  public static HelixManager getZKHelixManager(String clusterName, String instanceName,
+      InstanceType type, String zkAddr, HelixManagerStateListener stateListener, HelixManagerProperty helixManagerProperty) {
+    return new ZKHelixManager(clusterName, instanceName, type, zkAddr, stateListener, helixManagerProperty);

Review Comment:
   Sure, to make sure we are on the same page, some quick recap on the current impl:
   1. RealmAwareConnectionConfig in HelixManagerProperty is nullable
   2. In ZKHelixManager, zkAddress and helixManagerProperty. getZkConnectionConfig() cannot coexist, otherwise it's early stopped
   
   In regard to the factory, it's a new method after all, so existing users won't be impacted and new users should be aware. If user happens to provide both zkAddress and that one in property, it won't start at all. I think it's an acceptable behavior.
   Alternatively, we can validate the params in the factory. e.g. Nullify `helixManagerProperty.getZkConnectionConfig` if zkAddress is non-null (or even throw exception). And redirect to the other method if zkAddress is null.
   
   I appreciate the proposal with setter, and I'd like to explore more. With a quick look, something like `setMessageHandlerResetTimeout` might not work since `_taskExecutor.registerMessageHandlerFactory` happens in the constructor, which means we need the value immediately. Making it afterward could be even bigger change. Am I missing anything?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] qqu0127 commented on a diff in pull request #2176: Wire in the logic for TaskExecutor reset timeout parameter

Posted by GitBox <gi...@apache.org>.
qqu0127 commented on code in PR #2176:
URL: https://github.com/apache/helix/pull/2176#discussion_r922324973


##########
helix-core/src/main/java/org/apache/helix/HelixManagerFactory.java:
##########
@@ -61,6 +61,22 @@ public static HelixManager getZKHelixManager(String clusterName, String instance
     return new ZKHelixManager(clusterName, instanceName, type, zkAddr, stateListener);
   }
 
+  /**
+   * Construct a zk-based cluster manager that enforces all types (PARTICIPANT, CONTROLLER, and
+   * SPECTATOR) to have a name
+   * @param clusterName
+   * @param instanceName
+   * @param type
+   * @param zkAddr
+   * @param stateListener
+   * @param helixManagerProperty
+   * @return a HelixManager backed by Zookeeper
+   */
+  public static HelixManager getZKHelixManager(String clusterName, String instanceName,
+      InstanceType type, String zkAddr, HelixManagerStateListener stateListener, HelixManagerProperty helixManagerProperty) {
+    return new ZKHelixManager(clusterName, instanceName, type, zkAddr, stateListener, helixManagerProperty);

Review Comment:
   Agree, implemented this one and updated



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] NealSun96 commented on a diff in pull request #2176: Wire in the logic for TaskExecutor reset timeout parameter

Posted by GitBox <gi...@apache.org>.
NealSun96 commented on code in PR #2176:
URL: https://github.com/apache/helix/pull/2176#discussion_r921703126


##########
helix-core/src/main/java/org/apache/helix/HelixManagerFactory.java:
##########
@@ -61,6 +61,22 @@ public static HelixManager getZKHelixManager(String clusterName, String instance
     return new ZKHelixManager(clusterName, instanceName, type, zkAddr, stateListener);
   }
 
+  /**
+   * Construct a zk-based cluster manager that enforces all types (PARTICIPANT, CONTROLLER, and
+   * SPECTATOR) to have a name
+   * @param clusterName
+   * @param instanceName
+   * @param type
+   * @param zkAddr
+   * @param stateListener
+   * @param helixManagerProperty
+   * @return a HelixManager backed by Zookeeper
+   */
+  public static HelixManager getZKHelixManager(String clusterName, String instanceName,
+      InstanceType type, String zkAddr, HelixManagerStateListener stateListener, HelixManagerProperty helixManagerProperty) {
+    return new ZKHelixManager(clusterName, instanceName, type, zkAddr, stateListener, helixManagerProperty);

Review Comment:
   The `insert zkAddress into helixManagerProperty` approach is what makes sense to me, but this is again outside the scope. I agree that `setMessageHandlerResetTimeout` will require more work to refactor. 
   It's not perfect but I will leave it to your call. The bare minimum I will ask for is a note in the javadoc to highlight that zkaddress should not coexist with the connection config, in case untested code gets deployed to participants and fail. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org