You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@helix.apache.org by GitBox <gi...@apache.org> on 2020/08/20 01:18:23 UTC

[GitHub] [helix] kaisun2000 opened a new pull request #1295: fix TestRawZkClient unstableness #1294

kaisun2000 opened a new pull request #1295:
URL: https://github.com/apache/helix/pull/1295


   
   
   ### Issues
   
   - [ ] My PR addresses the following Helix issues and references them in the PR description:
   
   Fix #1294 
   
   ZkClient connect to ZooKeeper client first before monitor bean
   initialization. If the monitor is not fully constructed, the
   test would pass. Otherwise, as in github, it would not. Fix this
   issue by properly construct the object.
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI changes:
   
   (Write a concise description including what, why, how)
   
   ### Tests
   
   - [ ] The following tests are written for this issue:
   
   (List the names of added unit/integration tests)
   
   - [x] The following is the result of the "mvn test" command on the appropriate module:
   
   In Old connection New session
   After session expiry sessionId= 72277739317428225
   Shut down zookeeper at port 2127 in thread main
   Starting ZK server at localhost:2358
   [INFO] Tests run: 44, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 701.476 s - in TestSuite
   [INFO] 
   [INFO] Results:
   [INFO] 
   [INFO] Tests run: 44, Failures: 0, Errors: 0, Skipped: 0
   [INFO] 
   [INFO] ------------------------------------------------------------------------
   [INFO] BUILD SUCCESS
   [INFO] ------------------------------------------------------------------------
   [INFO] Total time:  11:47 min
   [INFO] Finished at: 2020-08-19T18:11:02-07:00
   [INFO] ------------------------------------------------------------------------
   ksun-mn1:zookeeper-api ksun$ git status 
   
   ### Documentation (Optional)
   
   - In case of new functionality, my PR adds documentation in the following wiki page:
   
   (Link the GitHub wiki you added)
   
   ### Commits
   
   - My commits all reference appropriate Apache Helix GitHub issues in their subject lines. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
     1. Subject is separated from body by a blank line
     1. Subject is limited to 50 characters (not including Jira issue reference)
     1. Subject does not end with a period
     1. Subject uses the imperative mood ("add", not "adding")
     1. Body wraps at 72 characters
     1. Body explains "what" and "why", not "how"
   
   ### Code Quality
   
   - My diff has been formatted using helix-style.xml 
   (helix-style-intellij.xml if IntelliJ IDE is used)
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r490644018



##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkClient.java
##########
@@ -31,9 +31,11 @@
 import java.util.concurrent.ConcurrentHashMap;
 import java.util.concurrent.CopyOnWriteArraySet;
 import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicBoolean;
 import java.util.concurrent.atomic.AtomicLong;
 import javax.management.JMException;
 
+import com.google.common.annotations.VisibleForTesting;

Review comment:
       fixed.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r492432202



##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/metric/ZkClientMonitor.java
##########
@@ -38,9 +38,13 @@
 import org.apache.helix.monitoring.mbeans.dynamicMBeans.SimpleDynamicMetric;
 import org.apache.helix.monitoring.mbeans.exception.MetricException;
 import org.apache.helix.zookeeper.zkclient.ZkEventThread;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 
 
 public class ZkClientMonitor extends DynamicMBeanProvider {
+  private static final Logger LOG = LoggerFactory.getLogger(ZkClientMonitor.class);

Review comment:
       fixed.

##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/metric/ZkClientPathMonitor.java
##########
@@ -32,6 +32,9 @@
 import org.apache.helix.monitoring.mbeans.dynamicMBeans.DynamicMetric;
 import org.apache.helix.monitoring.mbeans.dynamicMBeans.HistogramDynamicMetric;
 import org.apache.helix.monitoring.mbeans.dynamicMBeans.SimpleDynamicMetric;
+import org.apache.helix.zookeeper.zkclient.ZkClient;

Review comment:
       removed.

##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/metric/ZkClientMonitor.java
##########
@@ -109,12 +135,8 @@ public DynamicMBeanProvider register() throws JMException {
     doRegister(attributeList, MBEAN_DESCRIPTION,
         getObjectName(_monitorType, _monitorKey, _monitorInstanceName));
     for (ZkClientPathMonitor.PredefinedPath path : ZkClientPathMonitor.PredefinedPath.values()) {
-      // If monitor root path only, check if the current path is Root.
-      // Otherwise, add monitors for every path.
-      if (!_monitorRootOnly || path.equals(ZkClientPathMonitor.PredefinedPath.Root)) {
-        _zkClientPathMonitorMap.put(path,
-            new ZkClientPathMonitor(path, _monitorType, _monitorKey, _monitorInstanceName)
-                .register());
+      if (_zkClientPathMonitorMap.get(path) != null)  {
+        _zkClientPathMonitorMap.get(path).register();

Review comment:
       There are many ways to write a loop and now we have stream. 
   
   Still, I will just change it anyway.

##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/metric/ZkClientPathMonitor.java
##########
@@ -34,6 +34,7 @@
 import org.apache.helix.monitoring.mbeans.dynamicMBeans.SimpleDynamicMetric;
 
 
+

Review comment:
       removed.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r492425479



##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/metric/ZkClientMonitor.java
##########
@@ -38,9 +38,13 @@
 import org.apache.helix.monitoring.mbeans.dynamicMBeans.SimpleDynamicMetric;
 import org.apache.helix.monitoring.mbeans.exception.MetricException;
 import org.apache.helix.zookeeper.zkclient.ZkEventThread;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 
 
 public class ZkClientMonitor extends DynamicMBeanProvider {
+  private static final Logger LOG = LoggerFactory.getLogger(ZkClientMonitor.class);

Review comment:
       _logger exists in the super class. There is no need to add a new one here.

##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/metric/ZkClientPathMonitor.java
##########
@@ -32,6 +32,9 @@
 import org.apache.helix.monitoring.mbeans.dynamicMBeans.DynamicMetric;
 import org.apache.helix.monitoring.mbeans.dynamicMBeans.HistogramDynamicMetric;
 import org.apache.helix.monitoring.mbeans.dynamicMBeans.SimpleDynamicMetric;
+import org.apache.helix.zookeeper.zkclient.ZkClient;

Review comment:
       Unnecessary?

##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/metric/ZkClientMonitor.java
##########
@@ -109,12 +135,8 @@ public DynamicMBeanProvider register() throws JMException {
     doRegister(attributeList, MBEAN_DESCRIPTION,
         getObjectName(_monitorType, _monitorKey, _monitorInstanceName));
     for (ZkClientPathMonitor.PredefinedPath path : ZkClientPathMonitor.PredefinedPath.values()) {
-      // If monitor root path only, check if the current path is Root.
-      // Otherwise, add monitors for every path.
-      if (!_monitorRootOnly || path.equals(ZkClientPathMonitor.PredefinedPath.Root)) {
-        _zkClientPathMonitorMap.put(path,
-            new ZkClientPathMonitor(path, _monitorType, _monitorKey, _monitorInstanceName)
-                .register());
+      if (_zkClientPathMonitorMap.get(path) != null)  {
+        _zkClientPathMonitorMap.get(path).register();

Review comment:
       nit, it could be simpler to be:
   
   _zkClientPathMonitorMap.values().stream().foreach(...);




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r492432202



##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/metric/ZkClientMonitor.java
##########
@@ -38,9 +38,13 @@
 import org.apache.helix.monitoring.mbeans.dynamicMBeans.SimpleDynamicMetric;
 import org.apache.helix.monitoring.mbeans.exception.MetricException;
 import org.apache.helix.zookeeper.zkclient.ZkEventThread;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 
 
 public class ZkClientMonitor extends DynamicMBeanProvider {
+  private static final Logger LOG = LoggerFactory.getLogger(ZkClientMonitor.class);

Review comment:
       fixed.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] pkuwm commented on pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
pkuwm commented on pull request #1295:
URL: https://github.com/apache/helix/pull/1295#issuecomment-699452166


   I've triggered rerun the workflow. 
   FYI, @kaisun2000 This is a button for us to **Re-run jobs**.
   <img width="388" alt="image" src="https://user-images.githubusercontent.com/5187721/94335637-c04f8880-ff91-11ea-8f56-2a7d70bfb6be.png">
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r476910637



##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkClient.java
##########
@@ -222,22 +226,16 @@ protected ZkClient(IZkConnection zkConnection, int connectionTimeout, long opera
     _asyncCallRetryThread.start();
     LOG.debug("ZkClient created with _uid {}, _asyncCallRetryThread id {}", _uid, _asyncCallRetryThread.getId());
 
-    connect(connectionTimeout, this);
-
-    // initiate monitor
-    try {
-      if (monitorKey != null && !monitorKey.isEmpty() && monitorType != null && !monitorType
-          .isEmpty()) {
-        _monitor =
-            new ZkClientMonitor(monitorType, monitorKey, monitorInstanceName, monitorRootPathOnly,
-                _eventThread);
-        _monitor.register();
-      } else {
-        LOG.info("ZkClient monitor key or type is not provided. Skip monitoring.");
-      }
-    } catch (JMException e) {
-      LOG.error("Error in creating ZkClientMonitor", e);
+    if (monitorKey != null && !monitorKey.isEmpty() && monitorType != null && !monitorType
+        .isEmpty()) {
+      _monitor =
+          new ZkClientMonitor(monitorType, monitorKey, monitorInstanceName, monitorRootPathOnly,
+              _eventThread);

Review comment:
       This does not seem to work. 
   
   First  it is not necessary to pass _monitor into `connect(connectTimeout, this, _monitor);` as _monitor is instance variable and connect is method. Connect can see it anyway.
   
   Second, the gist to avoid all these racing issue is that we must make sure _monitor is fully constructed and also register-ed before ZkClient either get callback from ZK.  




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r490667526



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -285,18 +285,20 @@ public void testZkClientMonitor()
     Assert.assertTrue(beanServer.isRegistered(idealStatename));
 
     Assert.assertEquals((long) beanServer.getAttribute(name, "DataChangeEventCounter"), 0);
-    Assert.assertEquals((long) beanServer.getAttribute(name, "StateChangeEventCounter"), 0);
+    Assert.assertEquals((long) beanServer.getAttribute(name, "StateChangeEventCounter"), 1);
     Assert.assertEquals((long) beanServer.getAttribute(name, "ExpiredSessionCounter"), 0);
     Assert.assertEquals((long) beanServer.getAttribute(name, "OutstandingRequestGauge"), 0);
-    // account for doAsyncSync()
-    Assert.assertEquals((long) beanServer.getAttribute(name, "TotalCallbackCounter"), 1);

Review comment:
       no, not realiably I guess.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r490646159



##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/metric/ZkClientMonitor.java
##########
@@ -96,6 +96,12 @@ public static ObjectName getObjectName(String monitorType, String monitorKey,
             (monitorKey + (monitorInstanceName == null ? "" : "." + monitorInstanceName)));
   }
 
+  public void setAndInitZkEventThreadMonitor(ZkEventThread zkEventThread) {
+    if (_zkEventThreadMetric == null) {
+      _zkEventThreadMetric = new ZkThreadMetric(zkEventThread);

Review comment:
       Let me put a return value as false and let the calling layer log error




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] pkuwm commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
pkuwm commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r477028039



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/ZkTestBase.java
##########
@@ -143,7 +143,11 @@ protected ZkServer startZkServer(final String zkAddress) {
     int port = Integer.parseInt(zkAddress.substring(zkAddress.lastIndexOf(':') + 1));
     ZkServer zkServer = new ZkServer(dataDir, logDir, defaultNameSpace, port);
     System.out.println("Starting ZK server at " + zkAddress);
-    zkServer.start();
+    try {
+      zkServer.start();
+    } catch (Exception e) {

Review comment:
       What's the reason to catch exception and print? I guess you want to have the stacktrace for debugging. But please be noted that if exception is thrown, this change will return a zkServer that's not started. I don't think this is what you expected, is it?
   I guest keeping it to throw exception so the test would not proceed. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r492425479



##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/metric/ZkClientMonitor.java
##########
@@ -38,9 +38,13 @@
 import org.apache.helix.monitoring.mbeans.dynamicMBeans.SimpleDynamicMetric;
 import org.apache.helix.monitoring.mbeans.exception.MetricException;
 import org.apache.helix.zookeeper.zkclient.ZkEventThread;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 
 
 public class ZkClientMonitor extends DynamicMBeanProvider {
+  private static final Logger LOG = LoggerFactory.getLogger(ZkClientMonitor.class);

Review comment:
       _logger exists in the super class. There is no need to add a new one here.

##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/metric/ZkClientPathMonitor.java
##########
@@ -32,6 +32,9 @@
 import org.apache.helix.monitoring.mbeans.dynamicMBeans.DynamicMetric;
 import org.apache.helix.monitoring.mbeans.dynamicMBeans.HistogramDynamicMetric;
 import org.apache.helix.monitoring.mbeans.dynamicMBeans.SimpleDynamicMetric;
+import org.apache.helix.zookeeper.zkclient.ZkClient;

Review comment:
       Unnecessary?

##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/metric/ZkClientMonitor.java
##########
@@ -109,12 +135,8 @@ public DynamicMBeanProvider register() throws JMException {
     doRegister(attributeList, MBEAN_DESCRIPTION,
         getObjectName(_monitorType, _monitorKey, _monitorInstanceName));
     for (ZkClientPathMonitor.PredefinedPath path : ZkClientPathMonitor.PredefinedPath.values()) {
-      // If monitor root path only, check if the current path is Root.
-      // Otherwise, add monitors for every path.
-      if (!_monitorRootOnly || path.equals(ZkClientPathMonitor.PredefinedPath.Root)) {
-        _zkClientPathMonitorMap.put(path,
-            new ZkClientPathMonitor(path, _monitorType, _monitorKey, _monitorInstanceName)
-                .register());
+      if (_zkClientPathMonitorMap.get(path) != null)  {
+        _zkClientPathMonitorMap.get(path).register();

Review comment:
       nit, it could be simpler to be:
   
   _zkClientPathMonitorMap.values().stream().foreach(...);




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r498553162



##########
File path: helix-core/src/test/java/org/apache/helix/integration/TestPartitionMovementThrottle.java
##########
@@ -215,7 +215,7 @@ public void testANYtypeThrottle() throws InterruptedException {
     DelayedTransition.setDelay(20);
     DelayedTransition.enableThrottleRecord();
 
-    // start another 3 nodes
+    // start another 3 nodes 

Review comment:
       removed.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r490636370



##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/metric/ZkClientMonitor.java
##########
@@ -96,6 +96,12 @@ public static ObjectName getObjectName(String monitorType, String monitorKey,
             (monitorKey + (monitorInstanceName == null ? "" : "." + monitorInstanceName)));
   }
 
+  public void setAndInitZkEventThreadMonitor(ZkEventThread zkEventThread) {
+    if (_zkEventThreadMetric == null) {
+      _zkEventThreadMetric = new ZkThreadMetric(zkEventThread);

Review comment:
       BTW, you can check for registration status by checking _objectName with a synchronized lock on the monitor object.

##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/metric/ZkClientMonitor.java
##########
@@ -96,6 +96,12 @@ public static ObjectName getObjectName(String monitorType, String monitorKey,
             (monitorKey + (monitorInstanceName == null ? "" : "." + monitorInstanceName)));
   }
 
+  public void setAndInitZkEventThreadMonitor(ZkEventThread zkEventThread) {
+    if (_zkEventThreadMetric == null) {
+      _zkEventThreadMetric = new ZkThreadMetric(zkEventThread);

Review comment:
       nit, throw Exception if the _zkEventThreadMetric has already been set or the monitor object has already been registered. Otherwise, we may see an unexpected monitor value in the final result.

##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -285,18 +285,20 @@ public void testZkClientMonitor()
     Assert.assertTrue(beanServer.isRegistered(idealStatename));
 
     Assert.assertEquals((long) beanServer.getAttribute(name, "DataChangeEventCounter"), 0);
-    Assert.assertEquals((long) beanServer.getAttribute(name, "StateChangeEventCounter"), 0);
+    Assert.assertEquals((long) beanServer.getAttribute(name, "StateChangeEventCounter"), 1);
     Assert.assertEquals((long) beanServer.getAttribute(name, "ExpiredSessionCounter"), 0);
     Assert.assertEquals((long) beanServer.getAttribute(name, "OutstandingRequestGauge"), 0);
-    // account for doAsyncSync()
-    Assert.assertEquals((long) beanServer.getAttribute(name, "TotalCallbackCounter"), 1);

Review comment:
       Is this value still validatable?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on pull request #1295:
URL: https://github.com/apache/helix/pull/1295#issuecomment-702409261


   @jiajunwang, @pkuwm, as we looked, my github page does not have this UI button to re-run. Other than that, any other concerns? Or let us merge this in.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r474872812



##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkClient.java
##########
@@ -2156,6 +2160,21 @@ public void connect(final long maxMsToWaitUntilConnected, Watcher watcher)
 
       LOG.debug("ZkClient created with _uid {}, _eventThread {}", _uid, _eventThread.getId());
 
+      // initiate monitor
+      try {
+        if (_monitorKey != null && !_monitorKey.isEmpty() && _monitorType != null && !_monitorType
+            .isEmpty()) {
+          _monitor =
+              new ZkClientMonitor(_monitorType, _monitorKey, _monitorInstanceName, _monitorRootPathOnly,

Review comment:
       Basically, currently code has a race condition. The event of _monitor construction and _monitor first event recording sequence are not deterministic. _monitor construction is in main thread which construct szkclient while statechangecount in _monitor is increased in zookeeper client event thread by calling process() of zkclient. 
   
   See the description of the issue #1294 for more info.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r494039008



##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/metric/ZkClientMonitor.java
##########
@@ -96,7 +96,7 @@ public ZkClientMonitor(String monitorType, String monitorKey, String monitorInst
 
     for (ZkClientPathMonitor.PredefinedPath path : ZkClientPathMonitor.PredefinedPath.values()) {
       // If monitor root path only, check if the current path is Root.
-      // Otherwise, add monitors for every path.
+      // Otherwise, add monitors for every  path.

Review comment:
       nit, but could you please revert this "touch" change?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r493116075



##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/metric/ZkClientPathMonitor.java
##########
@@ -34,6 +34,7 @@
 import org.apache.helix.monitoring.mbeans.dynamicMBeans.SimpleDynamicMetric;
 
 
+

Review comment:
       removed.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r477480668



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/ZkTestBase.java
##########
@@ -143,7 +143,11 @@ protected ZkServer startZkServer(final String zkAddress) {
     int port = Integer.parseInt(zkAddress.substring(zkAddress.lastIndexOf(':') + 1));
     ZkServer zkServer = new ZkServer(dataDir, logDir, defaultNameSpace, port);
     System.out.println("Starting ZK server at " + zkAddress);
-    zkServer.start();
+    try {
+      zkServer.start();
+    } catch (Exception e) {

Review comment:
       You are right. Let me re-throw the exception to prevent the test from proceeding.
   
   Without printing it out here, the problem is that Intellij will not print out the exception here. Thus, it seems to be a mystery as why all the test is skipped.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r492432202



##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/metric/ZkClientMonitor.java
##########
@@ -38,9 +38,13 @@
 import org.apache.helix.monitoring.mbeans.dynamicMBeans.SimpleDynamicMetric;
 import org.apache.helix.monitoring.mbeans.exception.MetricException;
 import org.apache.helix.zookeeper.zkclient.ZkEventThread;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 
 
 public class ZkClientMonitor extends DynamicMBeanProvider {
+  private static final Logger LOG = LoggerFactory.getLogger(ZkClientMonitor.class);

Review comment:
       fixed.

##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/metric/ZkClientPathMonitor.java
##########
@@ -32,6 +32,9 @@
 import org.apache.helix.monitoring.mbeans.dynamicMBeans.DynamicMetric;
 import org.apache.helix.monitoring.mbeans.dynamicMBeans.HistogramDynamicMetric;
 import org.apache.helix.monitoring.mbeans.dynamicMBeans.SimpleDynamicMetric;
+import org.apache.helix.zookeeper.zkclient.ZkClient;

Review comment:
       removed.

##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/metric/ZkClientMonitor.java
##########
@@ -109,12 +135,8 @@ public DynamicMBeanProvider register() throws JMException {
     doRegister(attributeList, MBEAN_DESCRIPTION,
         getObjectName(_monitorType, _monitorKey, _monitorInstanceName));
     for (ZkClientPathMonitor.PredefinedPath path : ZkClientPathMonitor.PredefinedPath.values()) {
-      // If monitor root path only, check if the current path is Root.
-      // Otherwise, add monitors for every path.
-      if (!_monitorRootOnly || path.equals(ZkClientPathMonitor.PredefinedPath.Root)) {
-        _zkClientPathMonitorMap.put(path,
-            new ZkClientPathMonitor(path, _monitorType, _monitorKey, _monitorInstanceName)
-                .register());
+      if (_zkClientPathMonitorMap.get(path) != null)  {
+        _zkClientPathMonitorMap.get(path).register();

Review comment:
       There are many ways to write a loop and now we have stream. 
   
   Still, I will just change it anyway.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on pull request #1295:
URL: https://github.com/apache/helix/pull/1295#issuecomment-698112627


   @kaisun2000 Please run helix-core tests since the change impacts almost all logics.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r474351453



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -285,18 +285,26 @@ public void testZkClientMonitor()
     Assert.assertTrue(beanServer.isRegistered(idealStatename));
 
     Assert.assertEquals((long) beanServer.getAttribute(name, "DataChangeEventCounter"), 0);
-    Assert.assertEquals((long) beanServer.getAttribute(name, "StateChangeEventCounter"), 0);
+    Assert.assertEquals((long) beanServer.getAttribute(name, "StateChangeEventCounter"), 1);
     Assert.assertEquals((long) beanServer.getAttribute(name, "ExpiredSessionCounter"), 0);
     Assert.assertEquals((long) beanServer.getAttribute(name, "OutstandingRequestGauge"), 0);
     // account for doAsyncSync()
     Assert.assertEquals((long) beanServer.getAttribute(name, "TotalCallbackCounter"), 1);
 
-    // Test exists
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadCounter"), 0);
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadTotalLatencyCounter"), 0);
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadLatencyGauge.Max"), 0);
-    zkClient.exists(TEST_ROOT);
+    // Note, we need to wait here for the reason that doAsyncSync() blocks only the zkClient event thread. The main
+    // thread of zkClient would issue exits(TEST_ROOT) without blocking. The return of doAsyncSync() would be asyc
+    // to main thread. doAsyncSync() is a source of 1 read and main thread exists(TEST_ROOT) would be another.
+    TestHelper.verify(()->{
+      return ((org.apache.helix.zookeeper.zkclient.ZkClient)zkClient).getSyncStatus();
+    }, TestHelper.WAIT_DURATION);
+
     Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadCounter"), 1);
+    //Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadTotalLatencyCounter"), 1);

Review comment:
       Seems you have concern about this one by increase it one by one https://github.com/apache/helix/pull/1295/files#r474338328
   
   Let me share my thinking here. 
   
   The doAsyncSync() callback finishing is async to main thread here. It can increase the read counter anytime.  Thus, we need TestHelper.verify to wait for this read from sync(). Here, delete line 301  assert is also ok, just wait for 307 having read counter as 2. Same reason, assert 301 is 1 is fine. 
   
   In sum, either way it is fine. But without TestHelper.verify waiting is not fine. 

##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkClient.java
##########
@@ -1281,6 +1279,10 @@ protected void doRetry() throws Exception {
         });
   }
 
+  public boolean getSyncStatus() {

Review comment:
       1/ Will make this package level by removing public.
   
   2/ See my comment https://github.com/apache/helix/pull/1295/files#r474351453. See if this address the concern.
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r476840330



##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkClient.java
##########
@@ -222,22 +226,16 @@ protected ZkClient(IZkConnection zkConnection, int connectionTimeout, long opera
     _asyncCallRetryThread.start();
     LOG.debug("ZkClient created with _uid {}, _asyncCallRetryThread id {}", _uid, _asyncCallRetryThread.getId());
 
-    connect(connectionTimeout, this);
-
-    // initiate monitor
-    try {
-      if (monitorKey != null && !monitorKey.isEmpty() && monitorType != null && !monitorType
-          .isEmpty()) {
-        _monitor =
-            new ZkClientMonitor(monitorType, monitorKey, monitorInstanceName, monitorRootPathOnly,
-                _eventThread);
-        _monitor.register();
-      } else {
-        LOG.info("ZkClient monitor key or type is not provided. Skip monitoring.");
-      }
-    } catch (JMException e) {
-      LOG.error("Error in creating ZkClientMonitor", e);
+    if (monitorKey != null && !monitorKey.isEmpty() && monitorType != null && !monitorType
+        .isEmpty()) {
+      _monitor =
+          new ZkClientMonitor(monitorType, monitorKey, monitorInstanceName, monitorRootPathOnly,
+              _eventThread);

Review comment:
       Let's give it a more complete change.
   1. The constructor will not require _eventThread anymore, it will be null anyway.
   2. setAndInitZkEventThreadMonitor shall create the event thread monitor and call updateAttributesInfo() to update the attribute. Re-register the whole monitor is wrong, you will end up either having 2 mbeans or the 2nd register will fail.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r483230680



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -285,18 +285,26 @@ public void testZkClientMonitor()
     Assert.assertTrue(beanServer.isRegistered(idealStatename));
 
     Assert.assertEquals((long) beanServer.getAttribute(name, "DataChangeEventCounter"), 0);
-    Assert.assertEquals((long) beanServer.getAttribute(name, "StateChangeEventCounter"), 0);
+    Assert.assertEquals((long) beanServer.getAttribute(name, "StateChangeEventCounter"), 1);
     Assert.assertEquals((long) beanServer.getAttribute(name, "ExpiredSessionCounter"), 0);
     Assert.assertEquals((long) beanServer.getAttribute(name, "OutstandingRequestGauge"), 0);
     // account for doAsyncSync()
     Assert.assertEquals((long) beanServer.getAttribute(name, "TotalCallbackCounter"), 1);
 
-    // Test exists
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadCounter"), 0);
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadTotalLatencyCounter"), 0);
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadLatencyGauge.Max"), 0);
-    zkClient.exists(TEST_ROOT);
+    // Note, we need to wait here for the reason that doAsyncSync() blocks only the zkClient event thread. The main
+    // thread of zkClient would issue exits(TEST_ROOT) without blocking. The return of doAsyncSync() would be asyc
+    // to main thread. doAsyncSync() is a source of 1 read and main thread exists(TEST_ROOT) would be another.
+    TestHelper.verify(()->{
+      return ((org.apache.helix.zookeeper.zkclient.ZkClient)zkClient).getSyncStatus();
+    }, TestHelper.WAIT_DURATION);
+
     Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadCounter"), 1);
+    //Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadTotalLatencyCounter"), 1);

Review comment:
       "But without TestHelper.verify waiting is not fine." why is that? You can TestHelper.verify to wait until the count is 2.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r483319148



##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkClient.java
##########
@@ -222,22 +226,16 @@ protected ZkClient(IZkConnection zkConnection, int connectionTimeout, long opera
     _asyncCallRetryThread.start();
     LOG.debug("ZkClient created with _uid {}, _asyncCallRetryThread id {}", _uid, _asyncCallRetryThread.getId());
 
-    connect(connectionTimeout, this);
-
-    // initiate monitor
-    try {
-      if (monitorKey != null && !monitorKey.isEmpty() && monitorType != null && !monitorType
-          .isEmpty()) {
-        _monitor =
-            new ZkClientMonitor(monitorType, monitorKey, monitorInstanceName, monitorRootPathOnly,
-                _eventThread);
-        _monitor.register();
-      } else {
-        LOG.info("ZkClient monitor key or type is not provided. Skip monitoring.");
-      }
-    } catch (JMException e) {
-      LOG.error("Error in creating ZkClientMonitor", e);
+    if (monitorKey != null && !monitorKey.isEmpty() && monitorType != null && !monitorType
+        .isEmpty()) {
+      _monitor =
+          new ZkClientMonitor(monitorType, monitorKey, monitorInstanceName, monitorRootPathOnly,
+              _eventThread);

Review comment:
       Maybe I mis-understand what do you mean. Can you draw the code a little bit more specific? 
   
   from the above pseduo code
   
   ```
   _monitor = new ZkClientMonitor();
   // Comment that we need to update the _monitor in the event lock to avoid missing any records about event thread.
   connect(connectTimeout, this, _monitor);
   _monitor.register()
   ```
   
   Assertion: construction of ZkMonitor and ZkThreadMonitor and their registration must be done before the first doAsyncSync() is sent out, or you have race condition of missing some read/write.
   
   In the above pseudo code, `_monitor.register()` is too late, the first sycn() is already out and complete earlier than this register().  This violate the above assertion. 
   
   I may not fully understand your idea. Let us be more specific. As long as it works, I can change it to that way.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang merged pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
jiajunwang merged pull request #1295:
URL: https://github.com/apache/helix/pull/1295


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r490667192



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -285,18 +285,20 @@ public void testZkClientMonitor()
     Assert.assertTrue(beanServer.isRegistered(idealStatename));
 
     Assert.assertEquals((long) beanServer.getAttribute(name, "DataChangeEventCounter"), 0);
-    Assert.assertEquals((long) beanServer.getAttribute(name, "StateChangeEventCounter"), 0);
+    Assert.assertEquals((long) beanServer.getAttribute(name, "StateChangeEventCounter"), 1);
     Assert.assertEquals((long) beanServer.getAttribute(name, "ExpiredSessionCounter"), 0);
     Assert.assertEquals((long) beanServer.getAttribute(name, "OutstandingRequestGauge"), 0);
-    // account for doAsyncSync()
-    Assert.assertEquals((long) beanServer.getAttribute(name, "TotalCallbackCounter"), 1);
 
-    // Test exists
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadCounter"), 0);
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadTotalLatencyCounter"), 0);
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadLatencyGauge.Max"), 0);
     zkClient.exists(TEST_ROOT);
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadCounter"), 1);
+
+    // Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadCounter"), 2);
+    // wait for both doAysncSync() finish and zkClient.exists(Test_ROOT) finish from 2 different threads.
+    // The condition would be ReadCounter is 2.
+    boolean verifyResult = TestHelper.verify(()->{
+       return (long) beanServer.getAttribute(rootname, "ReadCounter") == 2;
+    }, TestHelper.WAIT_DURATION);
+    Assert.assertTrue(verifyResult, " did not see 2 read yet");
+
     Assert.assertTrue((long) beanServer.getAttribute(rootname, "ReadTotalLatencyCounter") >= 0);
     Assert.assertTrue((long) beanServer.getAttribute(rootname, "ReadLatencyGauge.Max") >= 0);

Review comment:
       fixed.

##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -285,18 +285,20 @@ public void testZkClientMonitor()
     Assert.assertTrue(beanServer.isRegistered(idealStatename));
 
     Assert.assertEquals((long) beanServer.getAttribute(name, "DataChangeEventCounter"), 0);
-    Assert.assertEquals((long) beanServer.getAttribute(name, "StateChangeEventCounter"), 0);
+    Assert.assertEquals((long) beanServer.getAttribute(name, "StateChangeEventCounter"), 1);
     Assert.assertEquals((long) beanServer.getAttribute(name, "ExpiredSessionCounter"), 0);
     Assert.assertEquals((long) beanServer.getAttribute(name, "OutstandingRequestGauge"), 0);
-    // account for doAsyncSync()
-    Assert.assertEquals((long) beanServer.getAttribute(name, "TotalCallbackCounter"), 1);
 
-    // Test exists
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadCounter"), 0);
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadTotalLatencyCounter"), 0);
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadLatencyGauge.Max"), 0);
     zkClient.exists(TEST_ROOT);
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadCounter"), 1);
+
+    // Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadCounter"), 2);
+    // wait for both doAysncSync() finish and zkClient.exists(Test_ROOT) finish from 2 different threads.
+    // The condition would be ReadCounter is 2.
+    boolean verifyResult = TestHelper.verify(()->{
+       return (long) beanServer.getAttribute(rootname, "ReadCounter") == 2;
+    }, TestHelper.WAIT_DURATION);
+    Assert.assertTrue(verifyResult, " did not see 2 read yet");
+
     Assert.assertTrue((long) beanServer.getAttribute(rootname, "ReadTotalLatencyCounter") >= 0);

Review comment:
       fixed.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r489835498



##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkClient.java
##########
@@ -1281,6 +1279,10 @@ protected void doRetry() throws Exception {
         });
   }
 
+  public boolean getSyncStatus() {

Review comment:
       removed.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r489832924



##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkClient.java
##########
@@ -222,22 +226,16 @@ protected ZkClient(IZkConnection zkConnection, int connectionTimeout, long opera
     _asyncCallRetryThread.start();
     LOG.debug("ZkClient created with _uid {}, _asyncCallRetryThread id {}", _uid, _asyncCallRetryThread.getId());
 
-    connect(connectionTimeout, this);
-
-    // initiate monitor
-    try {
-      if (monitorKey != null && !monitorKey.isEmpty() && monitorType != null && !monitorType
-          .isEmpty()) {
-        _monitor =
-            new ZkClientMonitor(monitorType, monitorKey, monitorInstanceName, monitorRootPathOnly,
-                _eventThread);
-        _monitor.register();
-      } else {
-        LOG.info("ZkClient monitor key or type is not provided. Skip monitoring.");
-      }
-    } catch (JMException e) {
-      LOG.error("Error in creating ZkClientMonitor", e);
+    if (monitorKey != null && !monitorKey.isEmpty() && monitorType != null && !monitorType
+        .isEmpty()) {
+      _monitor =
+          new ZkClientMonitor(monitorType, monitorKey, monitorInstanceName, monitorRootPathOnly,
+              _eventThread);

Review comment:
       I test it. It seems not true. We need to register the bean in connect() under the lock. Otherwise, depends on timing, you may lose one read.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] pkuwm commented on pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
pkuwm commented on pull request #1295:
URL: https://github.com/apache/helix/pull/1295#issuecomment-702411523


   > @jiajunwang, @pkuwm, as we looked, my github page does not have this UI button to re-run. Other than that, any other concerns? Or let us merge this in.
   
   @kaisun2000 I've helped re-run all the jobs.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r483314687



##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkClient.java
##########
@@ -1281,6 +1279,10 @@ protected void doRetry() throws Exception {
         });
   }
 
+  public boolean getSyncStatus() {

Review comment:
       See the discussion above about why think we can't get rid of this getSyncStatus(). 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r474336659



##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkClient.java
##########
@@ -124,6 +125,13 @@
   private PathBasedZkSerializer _pathBasedZkSerializer;
   private ZkClientMonitor _monitor;
 
+  final private String _monitorKey;

Review comment:
       These fields are not helpful to the ZkClient.
   Could you try to change the ZkClientMonitor instead so as to avoid these fields? What I am thinking is that, allowing the _eventThread sub-monitor to be added later after the ZkClientMonitor has been constructed. So it can be done in the connect() by referring to _monitor.

##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -285,18 +285,26 @@ public void testZkClientMonitor()
     Assert.assertTrue(beanServer.isRegistered(idealStatename));
 
     Assert.assertEquals((long) beanServer.getAttribute(name, "DataChangeEventCounter"), 0);
-    Assert.assertEquals((long) beanServer.getAttribute(name, "StateChangeEventCounter"), 0);
+    Assert.assertEquals((long) beanServer.getAttribute(name, "StateChangeEventCounter"), 1);
     Assert.assertEquals((long) beanServer.getAttribute(name, "ExpiredSessionCounter"), 0);
     Assert.assertEquals((long) beanServer.getAttribute(name, "OutstandingRequestGauge"), 0);
     // account for doAsyncSync()
     Assert.assertEquals((long) beanServer.getAttribute(name, "TotalCallbackCounter"), 1);
 
-    // Test exists
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadCounter"), 0);
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadTotalLatencyCounter"), 0);
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadLatencyGauge.Max"), 0);
-    zkClient.exists(TEST_ROOT);
+    // Note, we need to wait here for the reason that doAsyncSync() blocks only the zkClient event thread. The main
+    // thread of zkClient would issue exits(TEST_ROOT) without blocking. The return of doAsyncSync() would be asyc
+    // to main thread. doAsyncSync() is a source of 1 read and main thread exists(TEST_ROOT) would be another.
+    TestHelper.verify(()->{
+      return ((org.apache.helix.zookeeper.zkclient.ZkClient)zkClient).getSyncStatus();
+    }, TestHelper.WAIT_DURATION);
+
     Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadCounter"), 1);
+    //Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadTotalLatencyCounter"), 1);

Review comment:
       Do we really need this check? If we ensure after exists(), the read is 2, then we shall be good, right?
   Note that to check this read counter one by one, we need to add one method just for this purpose. I think it is an overkill and it is not necessary.

##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/TestHelper.java
##########
@@ -32,7 +32,7 @@
 
 public class TestHelper {
   private static final Logger LOG = LoggerFactory.getLogger(TestHelper.class);
-  public static final long WAIT_DURATION = 20 * 1000L; // 20 seconds
+  public static final long WAIT_DURATION = 60 * 1000L; // 20 seconds

Review comment:
       1. the comment is not updated accordingly.
   2. Could you please justify the reason we need to triple the wait duration?

##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -285,18 +285,26 @@ public void testZkClientMonitor()
     Assert.assertTrue(beanServer.isRegistered(idealStatename));
 
     Assert.assertEquals((long) beanServer.getAttribute(name, "DataChangeEventCounter"), 0);
-    Assert.assertEquals((long) beanServer.getAttribute(name, "StateChangeEventCounter"), 0);
+    Assert.assertEquals((long) beanServer.getAttribute(name, "StateChangeEventCounter"), 1);
     Assert.assertEquals((long) beanServer.getAttribute(name, "ExpiredSessionCounter"), 0);
     Assert.assertEquals((long) beanServer.getAttribute(name, "OutstandingRequestGauge"), 0);
     // account for doAsyncSync()
     Assert.assertEquals((long) beanServer.getAttribute(name, "TotalCallbackCounter"), 1);
 
-    // Test exists
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadCounter"), 0);
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadTotalLatencyCounter"), 0);
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadLatencyGauge.Max"), 0);
-    zkClient.exists(TEST_ROOT);
+    // Note, we need to wait here for the reason that doAsyncSync() blocks only the zkClient event thread. The main
+    // thread of zkClient would issue exits(TEST_ROOT) without blocking. The return of doAsyncSync() would be asyc
+    // to main thread. doAsyncSync() is a source of 1 read and main thread exists(TEST_ROOT) would be another.
+    TestHelper.verify(()->{
+      return ((org.apache.helix.zookeeper.zkclient.ZkClient)zkClient).getSyncStatus();
+    }, TestHelper.WAIT_DURATION);
+
     Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadCounter"), 1);
+    //Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadTotalLatencyCounter"), 1);

Review comment:
       What happen to these 2 checks?

##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkClient.java
##########
@@ -1281,6 +1279,10 @@ protected void doRetry() throws Exception {
         });
   }
 
+  public boolean getSyncStatus() {

Review comment:
       I have multiple questions.
   1. If this is just for test, please make it package-private.
   2. Do we really need to check the read count one by one increase?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r490635112



##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkClient.java
##########
@@ -2152,6 +2150,19 @@ public void connect(final long maxMsToWaitUntilConnected, Watcher watcher)
 
       IZkConnection zkConnection = getConnection();
       _eventThread = new ZkEventThread(zkConnection.getServers());
+
+      if (_monitor != null) {
+        _monitor.setAndInitZkEventThreadMonitor(_eventThread);
+      }
+
+      try {

Review comment:
       I debugged with your code, so the ZkClientPathMonitor objects are currently only created in the register() method. So that's the reason we lose the count if register later.
   
   There is a simple fix, we can instantiate all the ZkClientPathMonitors in the ZkClientMonitor constructor. Like this,
       for (ZkClientPathMonitor.PredefinedPath path : 
   ```
   ZkClientPathMonitor.PredefinedPath.values()) {
         // If monitor root path only, check if the current path is Root.
         // Otherwise, add monitors for every path.
         if (!_monitorRootOnly || path.equals(ZkClientPathMonitor.PredefinedPath.Root)) {
           _zkClientPathMonitorMap.put(path,
               new ZkClientPathMonitor(path, _monitorType, _monitorKey, _monitorInstanceName));
         }
       }
   ```
   Then in the register(), just register all the ZkClientPathMonitor objects that are in the _zkClientPathMonitorMap.
   
   The good thing is that it seems to be the right change that we should do no matter we want to fix the test or not.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r483228122



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/TestHelper.java
##########
@@ -32,7 +32,7 @@
 
 public class TestHelper {
   private static final Logger LOG = LoggerFactory.getLogger(TestHelper.class);
-  public static final long WAIT_DURATION = 20 * 1000L; // 20 seconds
+  public static final long WAIT_DURATION = 60 * 1000L; // 20 seconds

Review comment:
       "the comment is not updated accordingly."




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on pull request #1295:
URL: https://github.com/apache/helix/pull/1295#issuecomment-702441308


   @jiajunwang, removed the extra space to trigger run from my side. Please help to merge.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on pull request #1295:
URL: https://github.com/apache/helix/pull/1295#issuecomment-698112627


   @kaisun2000 Please run helix-core tests since the change impacts almost all logics.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r492425479



##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/metric/ZkClientMonitor.java
##########
@@ -38,9 +38,13 @@
 import org.apache.helix.monitoring.mbeans.dynamicMBeans.SimpleDynamicMetric;
 import org.apache.helix.monitoring.mbeans.exception.MetricException;
 import org.apache.helix.zookeeper.zkclient.ZkEventThread;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 
 
 public class ZkClientMonitor extends DynamicMBeanProvider {
+  private static final Logger LOG = LoggerFactory.getLogger(ZkClientMonitor.class);

Review comment:
       _logger exists in the super class. There is no need to add a new one here.

##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/metric/ZkClientPathMonitor.java
##########
@@ -32,6 +32,9 @@
 import org.apache.helix.monitoring.mbeans.dynamicMBeans.DynamicMetric;
 import org.apache.helix.monitoring.mbeans.dynamicMBeans.HistogramDynamicMetric;
 import org.apache.helix.monitoring.mbeans.dynamicMBeans.SimpleDynamicMetric;
+import org.apache.helix.zookeeper.zkclient.ZkClient;

Review comment:
       Unnecessary?

##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/metric/ZkClientMonitor.java
##########
@@ -109,12 +135,8 @@ public DynamicMBeanProvider register() throws JMException {
     doRegister(attributeList, MBEAN_DESCRIPTION,
         getObjectName(_monitorType, _monitorKey, _monitorInstanceName));
     for (ZkClientPathMonitor.PredefinedPath path : ZkClientPathMonitor.PredefinedPath.values()) {
-      // If monitor root path only, check if the current path is Root.
-      // Otherwise, add monitors for every path.
-      if (!_monitorRootOnly || path.equals(ZkClientPathMonitor.PredefinedPath.Root)) {
-        _zkClientPathMonitorMap.put(path,
-            new ZkClientPathMonitor(path, _monitorType, _monitorKey, _monitorInstanceName)
-                .register());
+      if (_zkClientPathMonitorMap.get(path) != null)  {
+        _zkClientPathMonitorMap.get(path).register();

Review comment:
       nit, it could be simpler to be:
   
   _zkClientPathMonitorMap.values().stream().foreach(...);

##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/metric/ZkClientPathMonitor.java
##########
@@ -34,6 +34,7 @@
 import org.apache.helix.monitoring.mbeans.dynamicMBeans.SimpleDynamicMetric;
 
 
+

Review comment:
       Unnecessary.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r490641273



##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/metric/ZkClientMonitor.java
##########
@@ -96,6 +96,12 @@ public static ObjectName getObjectName(String monitorType, String monitorKey,
             (monitorKey + (monitorInstanceName == null ? "" : "." + monitorInstanceName)));
   }
 
+  public void setAndInitZkEventThreadMonitor(ZkEventThread zkEventThread) {
+    if (_zkEventThreadMetric == null) {
+      _zkEventThreadMetric = new ZkThreadMetric(zkEventThread);

Review comment:
       What type of exception would be good here?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r489834018



##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkClient.java
##########
@@ -2152,6 +2150,19 @@ public void connect(final long maxMsToWaitUntilConnected, Watcher watcher)
 
       IZkConnection zkConnection = getConnection();
       _eventThread = new ZkEventThread(zkConnection.getServers());
+
+      if (_monitor != null) {
+        _monitor.setAndInitZkEventThreadMonitor(_eventThread);
+      }
+
+      try {

Review comment:
       Note, It seems crucial to register here. Or depending on timing, we may lose read of the first sync().




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r494039008



##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/metric/ZkClientMonitor.java
##########
@@ -96,7 +96,7 @@ public ZkClientMonitor(String monitorType, String monitorKey, String monitorInst
 
     for (ZkClientPathMonitor.PredefinedPath path : ZkClientPathMonitor.PredefinedPath.values()) {
       // If monitor root path only, check if the current path is Root.
-      // Otherwise, add monitors for every path.
+      // Otherwise, add monitors for every  path.

Review comment:
       nit, but could you please revert this "touch" change?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r498544124



##########
File path: helix-core/src/test/java/org/apache/helix/integration/TestPartitionMovementThrottle.java
##########
@@ -215,7 +215,7 @@ public void testANYtypeThrottle() throws InterruptedException {
     DelayedTransition.setDelay(20);
     DelayedTransition.enableThrottleRecord();
 
-    // start another 3 nodes
+    // start another 3 nodes 

Review comment:
       Please remove the extra space before merge.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on pull request #1295:
URL: https://github.com/apache/helix/pull/1295#issuecomment-699106532


   @kaisun2000 you can rerun the test job in Github UI. There is no need to touch code for a re-run.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r474349621



##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkClient.java
##########
@@ -2156,6 +2160,21 @@ public void connect(final long maxMsToWaitUntilConnected, Watcher watcher)
 
       LOG.debug("ZkClient created with _uid {}, _eventThread {}", _uid, _eventThread.getId());
 
+      // initiate monitor
+      try {
+        if (_monitorKey != null && !_monitorKey.isEmpty() && _monitorType != null && !_monitorType
+            .isEmpty()) {
+          _monitor =
+              new ZkClientMonitor(_monitorType, _monitorKey, _monitorInstanceName, _monitorRootPathOnly,

Review comment:
       My thinking is that ZkClientMonitor does not hold any session object, nor thread. Let GC deal with it would be fine. Or do you see otherwise?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r489835747



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -285,18 +285,26 @@ public void testZkClientMonitor()
     Assert.assertTrue(beanServer.isRegistered(idealStatename));
 
     Assert.assertEquals((long) beanServer.getAttribute(name, "DataChangeEventCounter"), 0);
-    Assert.assertEquals((long) beanServer.getAttribute(name, "StateChangeEventCounter"), 0);
+    Assert.assertEquals((long) beanServer.getAttribute(name, "StateChangeEventCounter"), 1);
     Assert.assertEquals((long) beanServer.getAttribute(name, "ExpiredSessionCounter"), 0);
     Assert.assertEquals((long) beanServer.getAttribute(name, "OutstandingRequestGauge"), 0);
     // account for doAsyncSync()
     Assert.assertEquals((long) beanServer.getAttribute(name, "TotalCallbackCounter"), 1);
 
-    // Test exists
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadCounter"), 0);
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadTotalLatencyCounter"), 0);
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadLatencyGauge.Max"), 0);
-    zkClient.exists(TEST_ROOT);
+    // Note, we need to wait here for the reason that doAsyncSync() blocks only the zkClient event thread. The main
+    // thread of zkClient would issue exits(TEST_ROOT) without blocking. The return of doAsyncSync() would be asyc
+    // to main thread. doAsyncSync() is a source of 1 read and main thread exists(TEST_ROOT) would be another.
+    TestHelper.verify(()->{
+      return ((org.apache.helix.zookeeper.zkclient.ZkClient)zkClient).getSyncStatus();
+    }, TestHelper.WAIT_DURATION);
+
     Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadCounter"), 1);
+    //Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadTotalLatencyCounter"), 1);

Review comment:
       changed to waiting for 2




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r483227594



##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkClient.java
##########
@@ -222,22 +226,16 @@ protected ZkClient(IZkConnection zkConnection, int connectionTimeout, long opera
     _asyncCallRetryThread.start();
     LOG.debug("ZkClient created with _uid {}, _asyncCallRetryThread id {}", _uid, _asyncCallRetryThread.getId());
 
-    connect(connectionTimeout, this);
-
-    // initiate monitor
-    try {
-      if (monitorKey != null && !monitorKey.isEmpty() && monitorType != null && !monitorType
-          .isEmpty()) {
-        _monitor =
-            new ZkClientMonitor(monitorType, monitorKey, monitorInstanceName, monitorRootPathOnly,
-                _eventThread);
-        _monitor.register();
-      } else {
-        LOG.info("ZkClient monitor key or type is not provided. Skip monitoring.");
-      }
-    } catch (JMException e) {
-      LOG.error("Error in creating ZkClientMonitor", e);
+    if (monitorKey != null && !monitorKey.isEmpty() && monitorType != null && !monitorType
+        .isEmpty()) {
+      _monitor =
+          new ZkClientMonitor(monitorType, monitorKey, monitorInstanceName, monitorRootPathOnly,
+              _eventThread);

Review comment:
       1. This is to make the code cleaner. It is a good practice for module the code that the private method does not refer to the class private fields directly.
   
   2. The thread monitor is the only one impacted, so if you do the set there, there should be no problem. Mix everything together is chaotic. Ideally, I would suggest splitting the ZK client monitor and ZK thread monitor. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on pull request #1295:
URL: https://github.com/apache/helix/pull/1295#issuecomment-697061887


   This diff is approved. Please help to merge
   
   >ZkClient connects to ZooKeeper client first before monitor bean
   initialization and registration. This cause race condition that fails
   metrics test.Fix this issue by properly construct the object.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r492474278



##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/metric/ZkClientPathMonitor.java
##########
@@ -34,6 +34,7 @@
 import org.apache.helix.monitoring.mbeans.dynamicMBeans.SimpleDynamicMetric;
 
 
+

Review comment:
       Unnecessary.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r492432360



##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/metric/ZkClientPathMonitor.java
##########
@@ -32,6 +32,9 @@
 import org.apache.helix.monitoring.mbeans.dynamicMBeans.DynamicMetric;
 import org.apache.helix.monitoring.mbeans.dynamicMBeans.HistogramDynamicMetric;
 import org.apache.helix.monitoring.mbeans.dynamicMBeans.SimpleDynamicMetric;
+import org.apache.helix.zookeeper.zkclient.ZkClient;

Review comment:
       removed.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r488401062



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -285,18 +285,26 @@ public void testZkClientMonitor()
     Assert.assertTrue(beanServer.isRegistered(idealStatename));
 
     Assert.assertEquals((long) beanServer.getAttribute(name, "DataChangeEventCounter"), 0);
-    Assert.assertEquals((long) beanServer.getAttribute(name, "StateChangeEventCounter"), 0);
+    Assert.assertEquals((long) beanServer.getAttribute(name, "StateChangeEventCounter"), 1);
     Assert.assertEquals((long) beanServer.getAttribute(name, "ExpiredSessionCounter"), 0);
     Assert.assertEquals((long) beanServer.getAttribute(name, "OutstandingRequestGauge"), 0);
     // account for doAsyncSync()
     Assert.assertEquals((long) beanServer.getAttribute(name, "TotalCallbackCounter"), 1);
 
-    // Test exists
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadCounter"), 0);
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadTotalLatencyCounter"), 0);
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadLatencyGauge.Max"), 0);
-    zkClient.exists(TEST_ROOT);
+    // Note, we need to wait here for the reason that doAsyncSync() blocks only the zkClient event thread. The main
+    // thread of zkClient would issue exits(TEST_ROOT) without blocking. The return of doAsyncSync() would be asyc
+    // to main thread. doAsyncSync() is a source of 1 read and main thread exists(TEST_ROOT) would be another.
+    TestHelper.verify(()->{
+      return ((org.apache.helix.zookeeper.zkclient.ZkClient)zkClient).getSyncStatus();
+    }, TestHelper.WAIT_DURATION);
+
     Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadCounter"), 1);
+    //Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadTotalLatencyCounter"), 1);

Review comment:
       Let's sync up tomorrow. I didn't get why waiting for the read count = 2 won't work.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r492436372



##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/metric/ZkClientMonitor.java
##########
@@ -109,12 +135,8 @@ public DynamicMBeanProvider register() throws JMException {
     doRegister(attributeList, MBEAN_DESCRIPTION,
         getObjectName(_monitorType, _monitorKey, _monitorInstanceName));
     for (ZkClientPathMonitor.PredefinedPath path : ZkClientPathMonitor.PredefinedPath.values()) {
-      // If monitor root path only, check if the current path is Root.
-      // Otherwise, add monitors for every path.
-      if (!_monitorRootOnly || path.equals(ZkClientPathMonitor.PredefinedPath.Root)) {
-        _zkClientPathMonitorMap.put(path,
-            new ZkClientPathMonitor(path, _monitorType, _monitorKey, _monitorInstanceName)
-                .register());
+      if (_zkClientPathMonitorMap.get(path) != null)  {
+        _zkClientPathMonitorMap.get(path).register();

Review comment:
       There are many ways to write a loop and now we have stream. 
   
   Still, I will just change it anyway.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r483319148



##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkClient.java
##########
@@ -222,22 +226,16 @@ protected ZkClient(IZkConnection zkConnection, int connectionTimeout, long opera
     _asyncCallRetryThread.start();
     LOG.debug("ZkClient created with _uid {}, _asyncCallRetryThread id {}", _uid, _asyncCallRetryThread.getId());
 
-    connect(connectionTimeout, this);
-
-    // initiate monitor
-    try {
-      if (monitorKey != null && !monitorKey.isEmpty() && monitorType != null && !monitorType
-          .isEmpty()) {
-        _monitor =
-            new ZkClientMonitor(monitorType, monitorKey, monitorInstanceName, monitorRootPathOnly,
-                _eventThread);
-        _monitor.register();
-      } else {
-        LOG.info("ZkClient monitor key or type is not provided. Skip monitoring.");
-      }
-    } catch (JMException e) {
-      LOG.error("Error in creating ZkClientMonitor", e);
+    if (monitorKey != null && !monitorKey.isEmpty() && monitorType != null && !monitorType
+        .isEmpty()) {
+      _monitor =
+          new ZkClientMonitor(monitorType, monitorKey, monitorInstanceName, monitorRootPathOnly,
+              _eventThread);

Review comment:
       Maybe I mis-understand what do you mean. Can you draw the code a little bit more specific? 
   
   from the above pseduo code
   
   ```
   _monitor = new ZkClientMonitor();
   // Comment that we need to update the _monitor in the event lock to avoid missing any records about event thread.
   connect(connectTimeout, this, _monitor);
   _monitor.register()
   ```
   
   construction of ZkMonitor and ZkThreadMonitor and their registration must be done before the first doAsyncSync() is sent out, or you have race condition of missing some read/write.
   
   In the above pseudo code, `_monitor.register()` is too late, the first sycn() is already out and complete earlier than this register().  This violate the above assertion. 
   
   I may not fully understand your idea. Let us be more specific. As long as it works, I can change it to that way.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r481458782



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -285,18 +285,24 @@ public void testZkClientMonitor()
     Assert.assertTrue(beanServer.isRegistered(idealStatename));
 
     Assert.assertEquals((long) beanServer.getAttribute(name, "DataChangeEventCounter"), 0);
-    Assert.assertEquals((long) beanServer.getAttribute(name, "StateChangeEventCounter"), 0);
+    Assert.assertEquals((long) beanServer.getAttribute(name, "StateChangeEventCounter"), 1);
     Assert.assertEquals((long) beanServer.getAttribute(name, "ExpiredSessionCounter"), 0);
     Assert.assertEquals((long) beanServer.getAttribute(name, "OutstandingRequestGauge"), 0);
     // account for doAsyncSync()
     Assert.assertEquals((long) beanServer.getAttribute(name, "TotalCallbackCounter"), 1);
 
-    // Test exists
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadCounter"), 0);
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadTotalLatencyCounter"), 0);
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadLatencyGauge.Max"), 0);
-    zkClient.exists(TEST_ROOT);
+    // Note, we need to wait here for the reason that doAsyncSync() blocks only the zkClient event thread. The main
+    // thread of zkClient would issue exits(TEST_ROOT) without blocking. The return of doAsyncSync() would be asyc
+    // to main thread. doAsyncSync() is a source of 1 read and main thread exists(TEST_ROOT) would be another.
+    TestHelper.verify(()->{
+      return ((org.apache.helix.zookeeper.zkclient.ZkClient)zkClient).getSyncStatus();
+    }, TestHelper.WAIT_DURATION);
+
     Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadCounter"), 1);
+
+    zkClient.exists(TEST_ROOT);

Review comment:
       This is not to assert exists() return true. This is to make sure the next `Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadCounter"), 2);` would be true. Namely this trigger a read and get recorded by bean.
   
   This is not functionality added by me. It is basically line 298 in previous version.
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r480386466



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/ZkTestBase.java
##########
@@ -143,7 +143,11 @@ protected ZkServer startZkServer(final String zkAddress) {
     int port = Integer.parseInt(zkAddress.substring(zkAddress.lastIndexOf(':') + 1));
     ZkServer zkServer = new ZkServer(dataDir, logDir, defaultNameSpace, port);
     System.out.println("Starting ZK server at " + zkAddress);
-    zkServer.start();
+    try {
+      zkServer.start();
+    } catch (Exception e) {

Review comment:
       Revert this part.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] pkuwm commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
pkuwm commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r481454076



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -285,18 +285,24 @@ public void testZkClientMonitor()
     Assert.assertTrue(beanServer.isRegistered(idealStatename));
 
     Assert.assertEquals((long) beanServer.getAttribute(name, "DataChangeEventCounter"), 0);
-    Assert.assertEquals((long) beanServer.getAttribute(name, "StateChangeEventCounter"), 0);
+    Assert.assertEquals((long) beanServer.getAttribute(name, "StateChangeEventCounter"), 1);
     Assert.assertEquals((long) beanServer.getAttribute(name, "ExpiredSessionCounter"), 0);
     Assert.assertEquals((long) beanServer.getAttribute(name, "OutstandingRequestGauge"), 0);
     // account for doAsyncSync()
     Assert.assertEquals((long) beanServer.getAttribute(name, "TotalCallbackCounter"), 1);
 
-    // Test exists
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadCounter"), 0);
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadTotalLatencyCounter"), 0);
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadLatencyGauge.Max"), 0);
-    zkClient.exists(TEST_ROOT);
+    // Note, we need to wait here for the reason that doAsyncSync() blocks only the zkClient event thread. The main
+    // thread of zkClient would issue exits(TEST_ROOT) without blocking. The return of doAsyncSync() would be asyc
+    // to main thread. doAsyncSync() is a source of 1 read and main thread exists(TEST_ROOT) would be another.
+    TestHelper.verify(()->{
+      return ((org.apache.helix.zookeeper.zkclient.ZkClient)zkClient).getSyncStatus();
+    }, TestHelper.WAIT_DURATION);
+
     Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadCounter"), 1);
+
+    zkClient.exists(TEST_ROOT);

Review comment:
       Nit: If we need to check exits(), we should assert true? I actually don't think it is necessary as below already checks. But if really check, assert true is needed.
   ```
   if (!_zkClient.exists(TEST_ROOT)) {
          _zkClient.createPersistent(TEST_ROOT, true);
   }
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r490637634



##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkClient.java
##########
@@ -31,9 +31,11 @@
 import java.util.concurrent.ConcurrentHashMap;
 import java.util.concurrent.CopyOnWriteArraySet;
 import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicBoolean;
 import java.util.concurrent.atomic.AtomicLong;
 import javax.management.JMException;
 
+import com.google.common.annotations.VisibleForTesting;

Review comment:
       With the latest proposal, this one can be removed too.

##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkClient.java
##########
@@ -31,9 +31,11 @@
 import java.util.concurrent.ConcurrentHashMap;
 import java.util.concurrent.CopyOnWriteArraySet;
 import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicBoolean;

Review comment:
       no need anymore?

##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -285,18 +285,20 @@ public void testZkClientMonitor()
     Assert.assertTrue(beanServer.isRegistered(idealStatename));
 
     Assert.assertEquals((long) beanServer.getAttribute(name, "DataChangeEventCounter"), 0);
-    Assert.assertEquals((long) beanServer.getAttribute(name, "StateChangeEventCounter"), 0);
+    Assert.assertEquals((long) beanServer.getAttribute(name, "StateChangeEventCounter"), 1);
     Assert.assertEquals((long) beanServer.getAttribute(name, "ExpiredSessionCounter"), 0);
     Assert.assertEquals((long) beanServer.getAttribute(name, "OutstandingRequestGauge"), 0);
-    // account for doAsyncSync()
-    Assert.assertEquals((long) beanServer.getAttribute(name, "TotalCallbackCounter"), 1);
 
-    // Test exists
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadCounter"), 0);
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadTotalLatencyCounter"), 0);
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadLatencyGauge.Max"), 0);
     zkClient.exists(TEST_ROOT);
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadCounter"), 1);
+
+    // Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadCounter"), 2);
+    // wait for both doAysncSync() finish and zkClient.exists(Test_ROOT) finish from 2 different threads.
+    // The condition would be ReadCounter is 2.
+    boolean verifyResult = TestHelper.verify(()->{
+       return (long) beanServer.getAttribute(rootname, "ReadCounter") == 2;
+    }, TestHelper.WAIT_DURATION);
+    Assert.assertTrue(verifyResult, " did not see 2 read yet");
+
     Assert.assertTrue((long) beanServer.getAttribute(rootname, "ReadTotalLatencyCounter") >= 0);
     Assert.assertTrue((long) beanServer.getAttribute(rootname, "ReadLatencyGauge.Max") >= 0);

Review comment:
       If we checked "ReadLatencyGauge.Max" before, then this one can be ignored.

##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -285,18 +285,20 @@ public void testZkClientMonitor()
     Assert.assertTrue(beanServer.isRegistered(idealStatename));
 
     Assert.assertEquals((long) beanServer.getAttribute(name, "DataChangeEventCounter"), 0);
-    Assert.assertEquals((long) beanServer.getAttribute(name, "StateChangeEventCounter"), 0);
+    Assert.assertEquals((long) beanServer.getAttribute(name, "StateChangeEventCounter"), 1);
     Assert.assertEquals((long) beanServer.getAttribute(name, "ExpiredSessionCounter"), 0);
     Assert.assertEquals((long) beanServer.getAttribute(name, "OutstandingRequestGauge"), 0);
-    // account for doAsyncSync()
-    Assert.assertEquals((long) beanServer.getAttribute(name, "TotalCallbackCounter"), 1);
 
-    // Test exists
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadCounter"), 0);

Review comment:
       Actually, can we wait here before "zkClient.exists(TEST_ROOT);" for ReadCounter == 1. Then wait here for 2? So we ensure exist call will increase one counter too.

##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -285,18 +285,20 @@ public void testZkClientMonitor()
     Assert.assertTrue(beanServer.isRegistered(idealStatename));
 
     Assert.assertEquals((long) beanServer.getAttribute(name, "DataChangeEventCounter"), 0);
-    Assert.assertEquals((long) beanServer.getAttribute(name, "StateChangeEventCounter"), 0);
+    Assert.assertEquals((long) beanServer.getAttribute(name, "StateChangeEventCounter"), 1);
     Assert.assertEquals((long) beanServer.getAttribute(name, "ExpiredSessionCounter"), 0);
     Assert.assertEquals((long) beanServer.getAttribute(name, "OutstandingRequestGauge"), 0);
-    // account for doAsyncSync()
-    Assert.assertEquals((long) beanServer.getAttribute(name, "TotalCallbackCounter"), 1);
 
-    // Test exists
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadCounter"), 0);
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadTotalLatencyCounter"), 0);
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadLatencyGauge.Max"), 0);

Review comment:
       If you wait for the counter in the previous line, then here we can still check for
   
       Assert.assertTrue((long) beanServer.getAttribute(rootname, "ReadTotalLatencyCounter") >= 0);
       Assert.assertTrue((long) beanServer.getAttribute(rootname, "ReadLatencyGauge.Max") >= 0);	    
   
   right?

##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -285,18 +285,20 @@ public void testZkClientMonitor()
     Assert.assertTrue(beanServer.isRegistered(idealStatename));
 
     Assert.assertEquals((long) beanServer.getAttribute(name, "DataChangeEventCounter"), 0);
-    Assert.assertEquals((long) beanServer.getAttribute(name, "StateChangeEventCounter"), 0);
+    Assert.assertEquals((long) beanServer.getAttribute(name, "StateChangeEventCounter"), 1);
     Assert.assertEquals((long) beanServer.getAttribute(name, "ExpiredSessionCounter"), 0);
     Assert.assertEquals((long) beanServer.getAttribute(name, "OutstandingRequestGauge"), 0);
-    // account for doAsyncSync()
-    Assert.assertEquals((long) beanServer.getAttribute(name, "TotalCallbackCounter"), 1);
 
-    // Test exists
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadCounter"), 0);
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadTotalLatencyCounter"), 0);
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadLatencyGauge.Max"), 0);
     zkClient.exists(TEST_ROOT);
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadCounter"), 1);
+
+    // Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadCounter"), 2);
+    // wait for both doAysncSync() finish and zkClient.exists(Test_ROOT) finish from 2 different threads.
+    // The condition would be ReadCounter is 2.
+    boolean verifyResult = TestHelper.verify(()->{
+       return (long) beanServer.getAttribute(rootname, "ReadCounter") == 2;
+    }, TestHelper.WAIT_DURATION);
+    Assert.assertTrue(verifyResult, " did not see 2 read yet");
+
     Assert.assertTrue((long) beanServer.getAttribute(rootname, "ReadTotalLatencyCounter") >= 0);

Review comment:
       I propose that we record the ReadTotalLatencyCounter that we get before exists() call. Then we check if this counter is equal to the previous value or is increased here.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 edited a comment on pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
kaisun2000 edited a comment on pull request #1295:
URL: https://github.com/apache/helix/pull/1295#issuecomment-697061887


   This diff is approved. Please help to merge
   
   >ZkClient connects to ZooKeeper before monitor bean
   initialization and registration. This causes race condition that fails
   metrics test. Fix this issue by properly construct the object.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r474348566



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -285,18 +285,26 @@ public void testZkClientMonitor()
     Assert.assertTrue(beanServer.isRegistered(idealStatename));
 
     Assert.assertEquals((long) beanServer.getAttribute(name, "DataChangeEventCounter"), 0);
-    Assert.assertEquals((long) beanServer.getAttribute(name, "StateChangeEventCounter"), 0);
+    Assert.assertEquals((long) beanServer.getAttribute(name, "StateChangeEventCounter"), 1);
     Assert.assertEquals((long) beanServer.getAttribute(name, "ExpiredSessionCounter"), 0);
     Assert.assertEquals((long) beanServer.getAttribute(name, "OutstandingRequestGauge"), 0);
     // account for doAsyncSync()
     Assert.assertEquals((long) beanServer.getAttribute(name, "TotalCallbackCounter"), 1);
 
-    // Test exists
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadCounter"), 0);
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadTotalLatencyCounter"), 0);
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadLatencyGauge.Max"), 0);
-    zkClient.exists(TEST_ROOT);
+    // Note, we need to wait here for the reason that doAsyncSync() blocks only the zkClient event thread. The main
+    // thread of zkClient would issue exits(TEST_ROOT) without blocking. The return of doAsyncSync() would be asyc
+    // to main thread. doAsyncSync() is a source of 1 read and main thread exists(TEST_ROOT) would be another.
+    TestHelper.verify(()->{
+      return ((org.apache.helix.zookeeper.zkclient.ZkClient)zkClient).getSyncStatus();
+    }, TestHelper.WAIT_DURATION);
+
     Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadCounter"), 1);
+    //Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadTotalLatencyCounter"), 1);

Review comment:
       You are right. line 302 and 303 can/should be removed.  The time value is indeterministic. 
   
   line 301 is fine.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r476833981



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -285,18 +285,26 @@ public void testZkClientMonitor()
     Assert.assertTrue(beanServer.isRegistered(idealStatename));
 
     Assert.assertEquals((long) beanServer.getAttribute(name, "DataChangeEventCounter"), 0);
-    Assert.assertEquals((long) beanServer.getAttribute(name, "StateChangeEventCounter"), 0);
+    Assert.assertEquals((long) beanServer.getAttribute(name, "StateChangeEventCounter"), 1);
     Assert.assertEquals((long) beanServer.getAttribute(name, "ExpiredSessionCounter"), 0);
     Assert.assertEquals((long) beanServer.getAttribute(name, "OutstandingRequestGauge"), 0);
     // account for doAsyncSync()
     Assert.assertEquals((long) beanServer.getAttribute(name, "TotalCallbackCounter"), 1);
 
-    // Test exists
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadCounter"), 0);
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadTotalLatencyCounter"), 0);
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadLatencyGauge.Max"), 0);
-    zkClient.exists(TEST_ROOT);
+    // Note, we need to wait here for the reason that doAsyncSync() blocks only the zkClient event thread. The main
+    // thread of zkClient would issue exits(TEST_ROOT) without blocking. The return of doAsyncSync() would be asyc
+    // to main thread. doAsyncSync() is a source of 1 read and main thread exists(TEST_ROOT) would be another.
+    TestHelper.verify(()->{
+      return ((org.apache.helix.zookeeper.zkclient.ZkClient)zkClient).getSyncStatus();
+    }, TestHelper.WAIT_DURATION);
+
     Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadCounter"), 1);
+    //Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadTotalLatencyCounter"), 1);

Review comment:
       2 ways are different because you will need to add code for supporting the check. If you think both ways of checking are fine, let's do the simpler one.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r486708884



##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkClient.java
##########
@@ -222,22 +226,16 @@ protected ZkClient(IZkConnection zkConnection, int connectionTimeout, long opera
     _asyncCallRetryThread.start();
     LOG.debug("ZkClient created with _uid {}, _asyncCallRetryThread id {}", _uid, _asyncCallRetryThread.getId());
 
-    connect(connectionTimeout, this);
-
-    // initiate monitor
-    try {
-      if (monitorKey != null && !monitorKey.isEmpty() && monitorType != null && !monitorType
-          .isEmpty()) {
-        _monitor =
-            new ZkClientMonitor(monitorType, monitorKey, monitorInstanceName, monitorRootPathOnly,
-                _eventThread);
-        _monitor.register();
-      } else {
-        LOG.info("ZkClient monitor key or type is not provided. Skip monitoring.");
-      }
-    } catch (JMException e) {
-      LOG.error("Error in creating ZkClientMonitor", e);
+    if (monitorKey != null && !monitorKey.isEmpty() && monitorType != null && !monitorType
+        .isEmpty()) {
+      _monitor =
+          new ZkClientMonitor(monitorType, monitorKey, monitorInstanceName, monitorRootPathOnly,
+              _eventThread);

Review comment:
       The registration does not need to happen prior. Even not registered, the metric can record data.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r474351691



##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkClient.java
##########
@@ -1281,6 +1279,10 @@ protected void doRetry() throws Exception {
         });
   }
 
+  public boolean getSyncStatus() {

Review comment:
       1/ make this one @visibleForTesting. 
   
   2/ See my comment https://github.com/apache/helix/pull/1295/files#r474351453. See if this address the concern.
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r483317080



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/TestHelper.java
##########
@@ -32,7 +32,7 @@
 
 public class TestHelper {
   private static final Logger LOG = LoggerFactory.getLogger(TestHelper.class);
-  public static final long WAIT_DURATION = 20 * 1000L; // 20 seconds
+  public static final long WAIT_DURATION = 60 * 1000L; // 20 seconds

Review comment:
       Good point, will update this one. Let us first resolve the bigger issue about how to initialize and how to make test stable.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r474872812



##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkClient.java
##########
@@ -2156,6 +2160,21 @@ public void connect(final long maxMsToWaitUntilConnected, Watcher watcher)
 
       LOG.debug("ZkClient created with _uid {}, _eventThread {}", _uid, _eventThread.getId());
 
+      // initiate monitor
+      try {
+        if (_monitorKey != null && !_monitorKey.isEmpty() && _monitorType != null && !_monitorType
+            .isEmpty()) {
+          _monitor =
+              new ZkClientMonitor(_monitorType, _monitorKey, _monitorInstanceName, _monitorRootPathOnly,

Review comment:
       Basically, currently code has a race condition. The event of _monitor construction and _monitor first event recording sequence are not deterministic. _monitor construction are in main thread which construct zkclient while statechangecount in _monitor is increased in zookeeper client thread by calling process() of zkclient. 
   
   See the description of the issue #1294 for more info.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r483319148



##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkClient.java
##########
@@ -222,22 +226,16 @@ protected ZkClient(IZkConnection zkConnection, int connectionTimeout, long opera
     _asyncCallRetryThread.start();
     LOG.debug("ZkClient created with _uid {}, _asyncCallRetryThread id {}", _uid, _asyncCallRetryThread.getId());
 
-    connect(connectionTimeout, this);
-
-    // initiate monitor
-    try {
-      if (monitorKey != null && !monitorKey.isEmpty() && monitorType != null && !monitorType
-          .isEmpty()) {
-        _monitor =
-            new ZkClientMonitor(monitorType, monitorKey, monitorInstanceName, monitorRootPathOnly,
-                _eventThread);
-        _monitor.register();
-      } else {
-        LOG.info("ZkClient monitor key or type is not provided. Skip monitoring.");
-      }
-    } catch (JMException e) {
-      LOG.error("Error in creating ZkClientMonitor", e);
+    if (monitorKey != null && !monitorKey.isEmpty() && monitorType != null && !monitorType
+        .isEmpty()) {
+      _monitor =
+          new ZkClientMonitor(monitorType, monitorKey, monitorInstanceName, monitorRootPathOnly,
+              _eventThread);

Review comment:
       Maybe I mis-understand what do you mean. Can you draw the code a little bit more specific? 
   
   from the above pseduo code
   
   ```
   _monitor = new ZkClientMonitor();
   // Comment that we need to update the _monitor in the event lock to avoid missing any records about event thread.
   connect(connectTimeout, this, _monitor);
   _monitor.register()
   ```
   
   Assertion: construction of ZkMonitor and ZkThreadMonitor and their registration must be done before the first doAsyncSync() is sent out, or you have race condition of missing some read/write.
   
   In the above pseudo code, `_monitor.register()` is too late, the first sycn() is already out and it may complete earlier than this register().  This violate the above assertion. You may miss one increase of read counter.
   
   I may not fully understand your idea. Let us be more specific. As long as it works, I can change it to that way.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] lei-xia commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
lei-xia commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r474357113



##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkClient.java
##########
@@ -2156,6 +2160,21 @@ public void connect(final long maxMsToWaitUntilConnected, Watcher watcher)
 
       LOG.debug("ZkClient created with _uid {}, _eventThread {}", _uid, _eventThread.getId());
 
+      // initiate monitor
+      try {
+        if (_monitorKey != null && !_monitorKey.isEmpty() && _monitorType != null && !_monitorType
+            .isEmpty()) {
+          _monitor =
+              new ZkClientMonitor(_monitorType, _monitorKey, _monitorInstanceName, _monitorRootPathOnly,

Review comment:
       Why we move this into connect()? what problem we try to solve here?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] pkuwm commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
pkuwm commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r474338627



##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkClient.java
##########
@@ -2156,6 +2160,21 @@ public void connect(final long maxMsToWaitUntilConnected, Watcher watcher)
 
       LOG.debug("ZkClient created with _uid {}, _eventThread {}", _uid, _eventThread.getId());
 
+      // initiate monitor
+      try {
+        if (_monitorKey != null && !_monitorKey.isEmpty() && _monitorType != null && !_monitorType
+            .isEmpty()) {
+          _monitor =
+              new ZkClientMonitor(_monitorType, _monitorKey, _monitorInstanceName, _monitorRootPathOnly,

Review comment:
       We should consider closing the monitor to avoid leakage. If the later part of code zk connection timeout and throws exception, the monitor should also be closed.

##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/TestHelper.java
##########
@@ -32,7 +32,7 @@
 
 public class TestHelper {
   private static final Logger LOG = LoggerFactory.getLogger(TestHelper.class);
-  public static final long WAIT_DURATION = 20 * 1000L; // 20 seconds
+  public static final long WAIT_DURATION = 60 * 1000L; // 20 seconds

Review comment:
       Does the test really need 60s to poll the result? In my opinion it is too long. It would fail with 20s?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r474347499



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/TestHelper.java
##########
@@ -32,7 +32,7 @@
 
 public class TestHelper {
   private static final Logger LOG = LoggerFactory.getLogger(TestHelper.class);
-  public static final long WAIT_DURATION = 20 * 1000L; // 20 seconds
+  public static final long WAIT_DURATION = 60 * 1000L; // 20 seconds

Review comment:
       Github zookeeper is slow. See #1268 for profiling result. Using 60 secs does not really hurt anything, as we still poll the condition every 500ms. 
   
   Otherwise, the test can be flaky.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r490635112



##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkClient.java
##########
@@ -2152,6 +2150,19 @@ public void connect(final long maxMsToWaitUntilConnected, Watcher watcher)
 
       IZkConnection zkConnection = getConnection();
       _eventThread = new ZkEventThread(zkConnection.getServers());
+
+      if (_monitor != null) {
+        _monitor.setAndInitZkEventThreadMonitor(_eventThread);
+      }
+
+      try {

Review comment:
       I debugged with your code, so the ZkClientPathMonitor objects are currently only created in the register() method. So that's the reason we lose the count if register later.
   
   There is a simple fix, we can instantiate all the ZkClientPathMonitors in the ZkClientMonitor constructor. Like this,
       for (ZkClientPathMonitor.PredefinedPath path : 
   ```
   ZkClientPathMonitor.PredefinedPath.values()) {
         // If monitor root path only, check if the current path is Root.
         // Otherwise, add monitors for every path.
         if (!_monitorRootOnly || path.equals(ZkClientPathMonitor.PredefinedPath.Root)) {
           _zkClientPathMonitorMap.put(path,
               new ZkClientPathMonitor(path, _monitorType, _monitorKey, _monitorInstanceName));
         }
       }
   ```
   Then in the register(), just register all the ZkClientPathMonitor objects that are in the _zkClientPathMonitorMap.
   
   The good thing is that it seems to be the right thing to do.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r490667383



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -285,18 +285,20 @@ public void testZkClientMonitor()
     Assert.assertTrue(beanServer.isRegistered(idealStatename));
 
     Assert.assertEquals((long) beanServer.getAttribute(name, "DataChangeEventCounter"), 0);
-    Assert.assertEquals((long) beanServer.getAttribute(name, "StateChangeEventCounter"), 0);
+    Assert.assertEquals((long) beanServer.getAttribute(name, "StateChangeEventCounter"), 1);
     Assert.assertEquals((long) beanServer.getAttribute(name, "ExpiredSessionCounter"), 0);
     Assert.assertEquals((long) beanServer.getAttribute(name, "OutstandingRequestGauge"), 0);
-    // account for doAsyncSync()
-    Assert.assertEquals((long) beanServer.getAttribute(name, "TotalCallbackCounter"), 1);
 
-    // Test exists
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadCounter"), 0);

Review comment:
       right.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on pull request #1295:
URL: https://github.com/apache/helix/pull/1295#issuecomment-697061887


   This diff is approved. Please help to merge
   
   >ZkClient connects to ZooKeeper client first before monitor bean
   initialization and registration. This cause race condition that fails
   metrics test.Fix this issue by properly construct the object.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r476862904



##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkClient.java
##########
@@ -222,22 +226,16 @@ protected ZkClient(IZkConnection zkConnection, int connectionTimeout, long opera
     _asyncCallRetryThread.start();
     LOG.debug("ZkClient created with _uid {}, _asyncCallRetryThread id {}", _uid, _asyncCallRetryThread.getId());
 
-    connect(connectionTimeout, this);
-
-    // initiate monitor
-    try {
-      if (monitorKey != null && !monitorKey.isEmpty() && monitorType != null && !monitorType
-          .isEmpty()) {
-        _monitor =
-            new ZkClientMonitor(monitorType, monitorKey, monitorInstanceName, monitorRootPathOnly,
-                _eventThread);
-        _monitor.register();
-      } else {
-        LOG.info("ZkClient monitor key or type is not provided. Skip monitoring.");
-      }
-    } catch (JMException e) {
-      LOG.error("Error in creating ZkClientMonitor", e);
+    if (monitorKey != null && !monitorKey.isEmpty() && monitorType != null && !monitorType
+        .isEmpty()) {
+      _monitor =
+          new ZkClientMonitor(monitorType, monitorKey, monitorInstanceName, monitorRootPathOnly,
+              _eventThread);

Review comment:
       Synced with Kai offline, the 2nd point is fine since the register will be postpone to the connect. Given this, I would prefer making the registration logic closer to the monitor construct logic.
   
   So shall we pass the ZkClientMonitor into connect() to explicitly update it in the call. Then register the monitor after the connect(), but still in the constructor.
   
   pseudo code:
   
   _monitor = new ZkClientMonitor();
   // Comment that we need to update the _monitor in the event lock to avoid missing any records about event thread.
   connect(connectTimeout, this, _monitor);
   _monitor.register()




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r490667671



##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkClient.java
##########
@@ -2152,6 +2150,19 @@ public void connect(final long maxMsToWaitUntilConnected, Watcher watcher)
 
       IZkConnection zkConnection = getConnection();
       _eventThread = new ZkEventThread(zkConnection.getServers());
+
+      if (_monitor != null) {
+        _monitor.setAndInitZkEventThreadMonitor(_eventThread);
+      }
+
+      try {

Review comment:
       implemeted this way




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r474870997



##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkClient.java
##########
@@ -124,6 +125,13 @@
   private PathBasedZkSerializer _pathBasedZkSerializer;
   private ZkClientMonitor _monitor;
 
+  final private String _monitorKey;

Review comment:
       Changed.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r483316097



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -285,18 +285,26 @@ public void testZkClientMonitor()
     Assert.assertTrue(beanServer.isRegistered(idealStatename));
 
     Assert.assertEquals((long) beanServer.getAttribute(name, "DataChangeEventCounter"), 0);
-    Assert.assertEquals((long) beanServer.getAttribute(name, "StateChangeEventCounter"), 0);
+    Assert.assertEquals((long) beanServer.getAttribute(name, "StateChangeEventCounter"), 1);
     Assert.assertEquals((long) beanServer.getAttribute(name, "ExpiredSessionCounter"), 0);
     Assert.assertEquals((long) beanServer.getAttribute(name, "OutstandingRequestGauge"), 0);
     // account for doAsyncSync()
     Assert.assertEquals((long) beanServer.getAttribute(name, "TotalCallbackCounter"), 1);
 
-    // Test exists
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadCounter"), 0);
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadTotalLatencyCounter"), 0);
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadLatencyGauge.Max"), 0);
-    zkClient.exists(TEST_ROOT);
+    // Note, we need to wait here for the reason that doAsyncSync() blocks only the zkClient event thread. The main
+    // thread of zkClient would issue exits(TEST_ROOT) without blocking. The return of doAsyncSync() would be asyc
+    // to main thread. doAsyncSync() is a source of 1 read and main thread exists(TEST_ROOT) would be another.
+    TestHelper.verify(()->{
+      return ((org.apache.helix.zookeeper.zkclient.ZkClient)zkClient).getSyncStatus();
+    }, TestHelper.WAIT_DURATION);
+
     Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadCounter"), 1);
+    //Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadTotalLatencyCounter"), 1);

Review comment:
       Unfortunately, you can't or you have race condition in this case, if my understanding is right. Let me illustrate:
   
   doAsycSync(), when the success code returns, it would increase by read counter one from the asyncThread in the ZkClient. 
   
   Note, the main thread running `zkClient.exist(Test_Root )` would increase read counter synchronously from main thread.
   
   The problem is that the doAsynSync() increase to read counter is totally async to the main thread. It is the case that after main thread finish `zkClient.exist(Test_Root )`, the doAsyncSync() would finish for sure. Recent github test actually illustrate this point. Previous we don't realize as it did not happen.
   
   In fact, theoretically, the doAsyncSync() increase can happen a lot later too. 
   
   Let me know if there is other ways you think that we can make the test stable while also no need to use getSyncStatus().
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r483230680



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -285,18 +285,26 @@ public void testZkClientMonitor()
     Assert.assertTrue(beanServer.isRegistered(idealStatename));
 
     Assert.assertEquals((long) beanServer.getAttribute(name, "DataChangeEventCounter"), 0);
-    Assert.assertEquals((long) beanServer.getAttribute(name, "StateChangeEventCounter"), 0);
+    Assert.assertEquals((long) beanServer.getAttribute(name, "StateChangeEventCounter"), 1);
     Assert.assertEquals((long) beanServer.getAttribute(name, "ExpiredSessionCounter"), 0);
     Assert.assertEquals((long) beanServer.getAttribute(name, "OutstandingRequestGauge"), 0);
     // account for doAsyncSync()
     Assert.assertEquals((long) beanServer.getAttribute(name, "TotalCallbackCounter"), 1);
 
-    // Test exists
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadCounter"), 0);
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadTotalLatencyCounter"), 0);
-    Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadLatencyGauge.Max"), 0);
-    zkClient.exists(TEST_ROOT);
+    // Note, we need to wait here for the reason that doAsyncSync() blocks only the zkClient event thread. The main
+    // thread of zkClient would issue exits(TEST_ROOT) without blocking. The return of doAsyncSync() would be asyc
+    // to main thread. doAsyncSync() is a source of 1 read and main thread exists(TEST_ROOT) would be another.
+    TestHelper.verify(()->{
+      return ((org.apache.helix.zookeeper.zkclient.ZkClient)zkClient).getSyncStatus();
+    }, TestHelper.WAIT_DURATION);
+
     Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadCounter"), 1);
+    //Assert.assertEquals((long) beanServer.getAttribute(rootname, "ReadTotalLatencyCounter"), 1);

Review comment:
       "But without TestHelper.verify waiting is not fine." why is that? You can TestHelper.verify to wait until the count is 2.
   My suggestion is still getting rid of the getSyncStatus()




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r483229578



##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkClient.java
##########
@@ -1281,6 +1279,10 @@ protected void doRetry() throws Exception {
         });
   }
 
+  public boolean getSyncStatus() {

Review comment:
       Sorry that I cannot make the link work. Maybe because code has been changed. Could you please copy-paste your comment here? I still think this is not necessary.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1295:
URL: https://github.com/apache/helix/pull/1295#discussion_r490643968



##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkClient.java
##########
@@ -31,9 +31,11 @@
 import java.util.concurrent.ConcurrentHashMap;
 import java.util.concurrent.CopyOnWriteArraySet;
 import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicBoolean;

Review comment:
       fix.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 edited a comment on pull request #1295: fix TestRawZkClient unstableness

Posted by GitBox <gi...@apache.org>.
kaisun2000 edited a comment on pull request #1295:
URL: https://github.com/apache/helix/pull/1295#issuecomment-697061887


   This diff is approved. Please help to merge
   
   >ZkClient connects to ZooKeeper before monitor bean
   initialization and registration. This causes race condition that fails
   metrics test. Fix this issue by properly construct the object.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org