You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@helix.apache.org by GitBox <gi...@apache.org> on 2020/11/03 14:10:44 UTC

[GitHub] [helix] narendly commented on a change in pull request #1494: Add admin and Rest API to purge instances that have been offline more than specified period of time

narendly commented on a change in pull request #1494:
URL: https://github.com/apache/helix/pull/1494#discussion_r515746448



##########
File path: helix-core/src/main/java/org/apache/helix/manager/zk/ZKHelixAdmin.java
##########
@@ -250,6 +251,13 @@ public void dropInstance(String clusterName, InstanceConfig instanceConfig) {
     }
   }
 
+  @Override
+  public void purgeOfflineInstances(String clusterName, Long timeout) {
+    Map<String, InstanceConfig> timeoutOfflineInstances = findTimeoutOfflineInstances(clusterName
+        , timeout);
+    timeoutOfflineInstances.values().forEach(instance -> dropInstance(clusterName, instance));

Review comment:
       @zhangmeng916 let's read that one a little more carefully. 
   
   > as discussed, i think what we ultimately want to achieve is to provide a service or a sidecar controller to do this automatically
   
   this doesn't mean that we will include the purge logic into the main controller. it clearly states that it will be a service or a sidecar controller. and i believe we all agreed that this would be the direction.
   
   the reason we are shifting the responsibility to the user is for the ease of implementation. ultimately, we want to automate this in our holistic cluster management environment (HaaS).
   
   > adding one more read/write per instance in a short time period (e.g. 10 or 20 seconds) for lock makes it worse without adding extra benefit for them.
   
   I'm not sure if i completely agree with the statement. 10-20 sec is a very generous duration where zk server will be handle many read/writes. also I don't agree that there's no extra benefit. the benefit we're talking about here is concurrency control in a distributed setting? i'm not suggesting we add read/write operations for nothing. 
   
   if you prefer to leave out locking for this API (which is totally fine with the right assumptions), i suggest we give some context in the JavaDoc and clearly state that this operation is not thread-safe and that it _should_ lock in the future when called automatically.

##########
File path: helix-core/src/main/java/org/apache/helix/manager/zk/ZKHelixAdmin.java
##########
@@ -250,6 +251,13 @@ public void dropInstance(String clusterName, InstanceConfig instanceConfig) {
     }
   }
 
+  @Override
+  public void purgeOfflineInstances(String clusterName, Long timeout) {
+    Map<String, InstanceConfig> timeoutOfflineInstances = findTimeoutOfflineInstances(clusterName
+        , timeout);
+    timeoutOfflineInstances.values().forEach(instance -> dropInstance(clusterName, instance));

Review comment:
       @zhangmeng916 let's read that one a little more carefully. 
   
   > as discussed, i think what we ultimately want to achieve is to provide a service or a sidecar controller to do this automatically
   
   this doesn't mean that we will include the purge logic into the main controller. it clearly states that it will be a service or a sidecar controller. and i believe we all agreed that this would be the direction.
   
   the reason we are shifting the responsibility to the user is for the ease of implementation. ultimately, we want to automate this in our holistic cluster management environment (HaaS).
   
   > adding one more read/write per instance in a short time period (e.g. 10 or 20 seconds) for lock makes it worse without adding extra benefit for them.
   
   I'm not sure if i completely agree with the statement. 10-20 sec is a very generous duration where zk server will be handle many read/writes. also I don't agree that there's no extra benefit. the benefit we're talking about here is concurrency control in a distributed setting? it's not like we'd be locking for nothing. 
   
   if you prefer to leave out locking for this API (which is totally fine with the right assumptions), i suggest we give some context in the JavaDoc and clearly state that this operation is not thread-safe and that it _should_ lock in the future when called automatically.

##########
File path: helix-core/src/main/java/org/apache/helix/manager/zk/ZKHelixAdmin.java
##########
@@ -250,6 +251,13 @@ public void dropInstance(String clusterName, InstanceConfig instanceConfig) {
     }
   }
 
+  @Override
+  public void purgeOfflineInstances(String clusterName, Long timeout) {
+    Map<String, InstanceConfig> timeoutOfflineInstances = findTimeoutOfflineInstances(clusterName
+        , timeout);
+    timeoutOfflineInstances.values().forEach(instance -> dropInstance(clusterName, instance));

Review comment:
       @zhangmeng916 let's read that one a little more carefully. 
   
   > as discussed, i think what we ultimately want to achieve is to provide a service or a sidecar controller to do this automatically
   
   this doesn't mean that we will include the purge logic into the main controller. it clearly states that it will be a _service_ or a _sidecar_ controller. and i believe we all agreed that this would be the direction.
   
   the reason we are shifting the responsibility to the user is for the ease of implementation. ultimately, we want to automate this in our holistic cluster management environment (HaaS).
   
   > adding one more read/write per instance in a short time period (e.g. 10 or 20 seconds) for lock makes it worse without adding extra benefit for them.
   
   I'm not sure if i completely agree with the statement. 10-20 sec is a very generous duration where zk server will be handle many read/writes. also I don't agree that there's no extra benefit. the benefit we're talking about here is concurrency control in a distributed setting? it's not like we'd be locking for nothing. 
   
   if you prefer to leave out locking for this API (which is totally fine with the right assumptions), i suggest we give some context in the JavaDoc and clearly state that this operation is not thread-safe and that it _should_ lock in the future when called automatically.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org