You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@helix.apache.org by GitBox <gi...@apache.org> on 2020/07/31 00:00:54 UTC

[GitHub] [helix] pkuwm opened a new pull request #1194: Fix flaky test testGetChildrenOnLargeNumChildren

pkuwm opened a new pull request #1194:
URL: https://github.com/apache/helix/pull/1194


   ### Issues
   
   - [ ] My PR addresses the following Helix issues and references them in the PR description:
   
   Fixes #1193 
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI changes:
   
   The test testGetChildrenOnLargeNumChildren becomes flaky after more commits are checked in.
   
   ### Tests
   
   - [ ] The following tests are written for this issue:
   
   (List the names of added unit/integration tests)
   
   - [ ] The following is the result of the "mvn test" command on the appropriate module:
   
   (Copy & paste the result of "mvn test")
   
   ### Commits
   
   - [ ] My commits all reference appropriate Apache Helix GitHub issues in their subject lines. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
     1. Subject is separated from body by a blank line
     1. Subject is limited to 50 characters (not including Jira issue reference)
     1. Subject does not end with a period
     1. Subject uses the imperative mood ("add", not "adding")
     1. Body wraps at 72 characters
     1. Body explains "what" and "why", not "how"
   
   ### Documentation (Optional)
   
   - [ ] In case of new functionality, my PR adds documentation in the following wiki page:
   
   (Link the GitHub wiki you added)
   
   ### Code Quality
   
   - [ ] My diff has been formatted using helix-style.xml 
   (helix-style-intellij.xml if IntelliJ IDE is used)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1194: Fix flaky test testGetChildrenOnLargeNumChildren

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1194:
URL: https://github.com/apache/helix/pull/1194#discussion_r463355809



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -868,59 +864,38 @@ public void testAsyncWriteOperations() {
    * Tests getChildren() when there are an excessive number of children and connection loss happens,
    * the operation should terminate and exit retry loop.
    */
-  @Test
+  @Test(timeOut = 30L)
   public void testGetChildrenOnLargeNumChildren() throws Exception {
-    // Default packetLen is 4M. It is static final and initialized
-    // when first zkClient is created.
-    // So we could not just set "jute.maxbuffer" to change the value.
-    // Reflection is needed to change the value.
-    // Remove "final" modifier
-    Field modifiersField = Field.class.getDeclaredField("modifiers");
-    boolean isModifierAccessible = modifiersField.isAccessible();
-    modifiersField.setAccessible(true);
-
-    Field packetLenField = ClientCnxn.class.getDeclaredField("packetLen");
-    Field childrenLimitField =
-        org.apache.helix.zookeeper.zkclient.ZkClient.class.getDeclaredField("NUM_CHILDREN_LIMIT");
-    modifiersField.setInt(packetLenField, packetLenField.getModifiers() & ~Modifier.FINAL);
-    modifiersField.setInt(childrenLimitField, childrenLimitField.getModifiers() & ~Modifier.FINAL);
-
-    boolean isPacketLenAccessible = packetLenField.isAccessible();
-    packetLenField.setAccessible(true);
-    int originPacketLen = packetLenField.getInt(null);
-    // Keep 150 bytes for successfully creating each child node.
-    packetLenField.set(null, 150);
-
-    boolean isChildrenLimitAccessible = childrenLimitField.isAccessible();
-    childrenLimitField.setAccessible(true);
-    int originChildrenLimit = childrenLimitField.getInt(null);
-    childrenLimitField.set(null, 2);
-
-    String path = "/" + TestHelper.getTestMethodName();
-    // Create 5 children to make packet length of children exceed 150 bytes
+    // Create 110K children to make packet length of children exceed 4 MB
     // and cause connection loss for getChildren() operation
-    for (int i = 0; i < 5; i++) {
-      _zkClient.createPersistent(path + "/" + UUID.randomUUID().toString(), true);
+    String path = "/" + TestHelper.getTestMethodName();
+    int toCreateChildrenCount = 110;
+
+    _zkClient.createPersistent(path);
+
+    for (int i = 0; i < toCreateChildrenCount; i++) {
+      List<Op> ops = new ArrayList<>(1000);
+      for (int j = 0; j < 1000; j++) {
+        String child = path + "/" + UUID.randomUUID().toString();
+        ops.add(Op.create(child, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL));
+      }
+      // Reduce total creation time by batch creating znodes
+      _zkClient.multi(ops);
     }
 
     try {
       _zkClient.getChildren(path);
-      Assert.fail("Should not successfully get children.");
+      Assert.fail("Should not successfully get children because of connection loss.");
     } catch (ZkException expected) {
       Assert.assertEquals(expected.getMessage(),
           "org.apache.zookeeper.KeeperException$MarshallingErrorException: "
               + "KeeperErrorCode = MarshallingError");
     } finally {
-      packetLenField.set(null, originPacketLen);
-      packetLenField.setAccessible(isPacketLenAccessible);
-
-      childrenLimitField.set(null, originChildrenLimit);
-      childrenLimitField.setAccessible(isChildrenLimitAccessible);
-
-      modifiersField.setAccessible(isModifierAccessible);
+      _zkClient.close();
+      _zkClient = new ZkClient(ZkTestBase.ZK_ADDR);
 
       Assert.assertTrue(TestHelper.verify(() -> {
-        _zkClient.deleteRecursively(path);
+        _zkClient.delete(path);

Review comment:
       So in this case, it would always throw exception, right? 
   
   Then why bother. Just leave the znode there anyway. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1194: Fix flaky test testGetChildrenOnLargeNumChildren

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1194:
URL: https://github.com/apache/helix/pull/1194#discussion_r463355335



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -868,59 +864,38 @@ public void testAsyncWriteOperations() {
    * Tests getChildren() when there are an excessive number of children and connection loss happens,
    * the operation should terminate and exit retry loop.
    */
-  @Test
+  @Test(timeOut = 30L)

Review comment:
       You laptop runs on SSD. Desktop or TMC machines can be slower. Leave some room for error is safe here IMO.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1194: Fix flaky test testGetChildrenOnLargeNumChildren

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1194:
URL: https://github.com/apache/helix/pull/1194#discussion_r463364616



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -868,59 +864,38 @@ public void testAsyncWriteOperations() {
    * Tests getChildren() when there are an excessive number of children and connection loss happens,
    * the operation should terminate and exit retry loop.
    */
-  @Test
+  @Test(timeOut = 30L)
   public void testGetChildrenOnLargeNumChildren() throws Exception {
-    // Default packetLen is 4M. It is static final and initialized
-    // when first zkClient is created.
-    // So we could not just set "jute.maxbuffer" to change the value.
-    // Reflection is needed to change the value.
-    // Remove "final" modifier
-    Field modifiersField = Field.class.getDeclaredField("modifiers");
-    boolean isModifierAccessible = modifiersField.isAccessible();
-    modifiersField.setAccessible(true);
-
-    Field packetLenField = ClientCnxn.class.getDeclaredField("packetLen");
-    Field childrenLimitField =
-        org.apache.helix.zookeeper.zkclient.ZkClient.class.getDeclaredField("NUM_CHILDREN_LIMIT");
-    modifiersField.setInt(packetLenField, packetLenField.getModifiers() & ~Modifier.FINAL);
-    modifiersField.setInt(childrenLimitField, childrenLimitField.getModifiers() & ~Modifier.FINAL);
-
-    boolean isPacketLenAccessible = packetLenField.isAccessible();
-    packetLenField.setAccessible(true);
-    int originPacketLen = packetLenField.getInt(null);
-    // Keep 150 bytes for successfully creating each child node.
-    packetLenField.set(null, 150);
-
-    boolean isChildrenLimitAccessible = childrenLimitField.isAccessible();
-    childrenLimitField.setAccessible(true);
-    int originChildrenLimit = childrenLimitField.getInt(null);
-    childrenLimitField.set(null, 2);
-
-    String path = "/" + TestHelper.getTestMethodName();
-    // Create 5 children to make packet length of children exceed 150 bytes
+    // Create 110K children to make packet length of children exceed 4 MB
     // and cause connection loss for getChildren() operation
-    for (int i = 0; i < 5; i++) {
-      _zkClient.createPersistent(path + "/" + UUID.randomUUID().toString(), true);
+    String path = "/" + TestHelper.getTestMethodName();
+    int toCreateChildrenCount = 110;
+
+    _zkClient.createPersistent(path);
+
+    for (int i = 0; i < toCreateChildrenCount; i++) {
+      List<Op> ops = new ArrayList<>(1000);
+      for (int j = 0; j < 1000; j++) {
+        String child = path + "/" + UUID.randomUUID().toString();
+        ops.add(Op.create(child, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL));
+      }
+      // Reduce total creation time by batch creating znodes
+      _zkClient.multi(ops);
     }
 
     try {
       _zkClient.getChildren(path);
-      Assert.fail("Should not successfully get children.");
+      Assert.fail("Should not successfully get children because of connection loss.");
     } catch (ZkException expected) {
       Assert.assertEquals(expected.getMessage(),
           "org.apache.zookeeper.KeeperException$MarshallingErrorException: "
               + "KeeperErrorCode = MarshallingError");
     } finally {
-      packetLenField.set(null, originPacketLen);
-      packetLenField.setAccessible(isPacketLenAccessible);
-
-      childrenLimitField.set(null, originChildrenLimit);
-      childrenLimitField.setAccessible(isChildrenLimitAccessible);
-
-      modifiersField.setAccessible(isModifierAccessible);
+      _zkClient.close();
+      _zkClient = new ZkClient(ZkTestBase.ZK_ADDR);
 
       Assert.assertTrue(TestHelper.verify(() -> {
-        _zkClient.deleteRecursively(path);
+        _zkClient.delete(path);

Review comment:
       My question is that is this exception always thrown? I believe it would always be thrown. Can you verify?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1194: Fix flaky test testGetChildrenOnLargeNumChildren

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1194:
URL: https://github.com/apache/helix/pull/1194#discussion_r463349131



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -868,59 +864,38 @@ public void testAsyncWriteOperations() {
    * Tests getChildren() when there are an excessive number of children and connection loss happens,
    * the operation should terminate and exit retry loop.
    */
-  @Test
+  @Test(timeOut = 30 * 1000L)
   public void testGetChildrenOnLargeNumChildren() throws Exception {
-    // Default packetLen is 4M. It is static final and initialized
-    // when first zkClient is created.
-    // So we could not just set "jute.maxbuffer" to change the value.
-    // Reflection is needed to change the value.
-    // Remove "final" modifier
-    Field modifiersField = Field.class.getDeclaredField("modifiers");
-    boolean isModifierAccessible = modifiersField.isAccessible();
-    modifiersField.setAccessible(true);
-
-    Field packetLenField = ClientCnxn.class.getDeclaredField("packetLen");
-    Field childrenLimitField =
-        org.apache.helix.zookeeper.zkclient.ZkClient.class.getDeclaredField("NUM_CHILDREN_LIMIT");
-    modifiersField.setInt(packetLenField, packetLenField.getModifiers() & ~Modifier.FINAL);
-    modifiersField.setInt(childrenLimitField, childrenLimitField.getModifiers() & ~Modifier.FINAL);
-
-    boolean isPacketLenAccessible = packetLenField.isAccessible();
-    packetLenField.setAccessible(true);
-    int originPacketLen = packetLenField.getInt(null);
-    // Keep 150 bytes for successfully creating each child node.
-    packetLenField.set(null, 150);
-
-    boolean isChildrenLimitAccessible = childrenLimitField.isAccessible();
-    childrenLimitField.setAccessible(true);
-    int originChildrenLimit = childrenLimitField.getInt(null);
-    childrenLimitField.set(null, 2);
-
-    String path = "/" + TestHelper.getTestMethodName();
-    // Create 5 children to make packet length of children exceed 150 bytes
+    // Create 110K children to make packet length of children exceed 4 MB
     // and cause connection loss for getChildren() operation
-    for (int i = 0; i < 5; i++) {
-      _zkClient.createPersistent(path + "/" + UUID.randomUUID().toString(), true);
+    String path = "/" + TestHelper.getTestMethodName();
+    int toCreateChildrenCount = 110;
+
+    _zkClient.createPersistent(path);

Review comment:
       How about let us construct a new zkclient here. Is it possible _zkClient would be session timeout by another test or thread?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1194: Fix flaky test testGetChildrenOnLargeNumChildren

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1194:
URL: https://github.com/apache/helix/pull/1194#discussion_r463348517



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -868,59 +864,38 @@ public void testAsyncWriteOperations() {
    * Tests getChildren() when there are an excessive number of children and connection loss happens,
    * the operation should terminate and exit retry loop.
    */
-  @Test
+  @Test(timeOut = 30L)
   public void testGetChildrenOnLargeNumChildren() throws Exception {
-    // Default packetLen is 4M. It is static final and initialized
-    // when first zkClient is created.
-    // So we could not just set "jute.maxbuffer" to change the value.
-    // Reflection is needed to change the value.
-    // Remove "final" modifier
-    Field modifiersField = Field.class.getDeclaredField("modifiers");
-    boolean isModifierAccessible = modifiersField.isAccessible();
-    modifiersField.setAccessible(true);
-
-    Field packetLenField = ClientCnxn.class.getDeclaredField("packetLen");
-    Field childrenLimitField =
-        org.apache.helix.zookeeper.zkclient.ZkClient.class.getDeclaredField("NUM_CHILDREN_LIMIT");
-    modifiersField.setInt(packetLenField, packetLenField.getModifiers() & ~Modifier.FINAL);
-    modifiersField.setInt(childrenLimitField, childrenLimitField.getModifiers() & ~Modifier.FINAL);
-
-    boolean isPacketLenAccessible = packetLenField.isAccessible();
-    packetLenField.setAccessible(true);
-    int originPacketLen = packetLenField.getInt(null);
-    // Keep 150 bytes for successfully creating each child node.
-    packetLenField.set(null, 150);
-
-    boolean isChildrenLimitAccessible = childrenLimitField.isAccessible();
-    childrenLimitField.setAccessible(true);
-    int originChildrenLimit = childrenLimitField.getInt(null);
-    childrenLimitField.set(null, 2);
-
-    String path = "/" + TestHelper.getTestMethodName();
-    // Create 5 children to make packet length of children exceed 150 bytes
+    // Create 110K children to make packet length of children exceed 4 MB
     // and cause connection loss for getChildren() operation
-    for (int i = 0; i < 5; i++) {
-      _zkClient.createPersistent(path + "/" + UUID.randomUUID().toString(), true);
+    String path = "/" + TestHelper.getTestMethodName();
+    int toCreateChildrenCount = 110;
+
+    _zkClient.createPersistent(path);
+
+    for (int i = 0; i < toCreateChildrenCount; i++) {
+      List<Op> ops = new ArrayList<>(1000);
+      for (int j = 0; j < 1000; j++) {
+        String child = path + "/" + UUID.randomUUID().toString();
+        ops.add(Op.create(child, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL));
+      }
+      // Reduce total creation time by batch creating znodes
+      _zkClient.multi(ops);
     }
 
     try {
       _zkClient.getChildren(path);
-      Assert.fail("Should not successfully get children.");
+      Assert.fail("Should not successfully get children because of connection loss.");
     } catch (ZkException expected) {
       Assert.assertEquals(expected.getMessage(),
           "org.apache.zookeeper.KeeperException$MarshallingErrorException: "
               + "KeeperErrorCode = MarshallingError");
     } finally {
-      packetLenField.set(null, originPacketLen);
-      packetLenField.setAccessible(isPacketLenAccessible);
-
-      childrenLimitField.set(null, originChildrenLimit);
-      childrenLimitField.setAccessible(isChildrenLimitAccessible);
-
-      modifiersField.setAccessible(isModifierAccessible);
+      _zkClient.close();
+      _zkClient = new ZkClient(ZkTestBase.ZK_ADDR);
 
       Assert.assertTrue(TestHelper.verify(() -> {
-        _zkClient.deleteRecursively(path);
+        _zkClient.delete(path);

Review comment:
       My concern is this:
   
   if using deleteRecursively(), it won't work because list all the children znode name would cause "marshalling" error as what happened in GCN, right?
   
   if using delete as you do here, I think it would throw another exception as not empty. So this does not work here. Or am I missing something?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] pkuwm commented on a change in pull request #1194: Fix flaky test testGetChildrenOnLargeNumChildren

Posted by GitBox <gi...@apache.org>.
pkuwm commented on a change in pull request #1194:
URL: https://github.com/apache/helix/pull/1194#discussion_r463351044



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -868,59 +864,38 @@ public void testAsyncWriteOperations() {
    * Tests getChildren() when there are an excessive number of children and connection loss happens,
    * the operation should terminate and exit retry loop.
    */
-  @Test
+  @Test(timeOut = 30L)

Review comment:
       This _zkClient is created within this class. While this test is running, there is no other thread referencing this _zkClient to close it.
   
   I considered using another zkClient. But it just adds one more connection to so for this case there will be two connections at the same time. I believe reuse this one is good enough.
   
   I ran it in my laptop as well, 30s is good enough. If it's more than 30s, something is wrong and it should timeout to fail.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang merged pull request #1194: Fix flaky test testGetChildrenOnLargeNumChildren

Posted by GitBox <gi...@apache.org>.
jiajunwang merged pull request #1194:
URL: https://github.com/apache/helix/pull/1194


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1194: Fix flaky test testGetChildrenOnLargeNumChildren

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1194:
URL: https://github.com/apache/helix/pull/1194#discussion_r463414008



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -868,59 +864,38 @@ public void testAsyncWriteOperations() {
    * Tests getChildren() when there are an excessive number of children and connection loss happens,
    * the operation should terminate and exit retry loop.
    */
-  @Test
+  @Test(timeOut = 30L)

Review comment:
       Yes, then remove this timeout. The test has a default timeout 300sec. In my opinion, relying on 300sec default is much safer. Testing under your own laptop may not fit a lot other environment.
   
   See this:
   
   [ERROR] testLostZkConnection(org.apache.helix.integration.TestZkConnectionLost) Time elapsed: 300.015 s <<< FAILURE!
   org.testng.internal.thread.ThreadTimeoutException: Method org.testng.internal.TestNGMethod.testLostZkConnection() didn't finish within the time-out 300000
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1194: Fix flaky test testGetChildrenOnLargeNumChildren

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1194:
URL: https://github.com/apache/helix/pull/1194#discussion_r463364780



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -868,59 +864,38 @@ public void testAsyncWriteOperations() {
    * Tests getChildren() when there are an excessive number of children and connection loss happens,
    * the operation should terminate and exit retry loop.
    */
-  @Test
+  @Test(timeOut = 30L)

Review comment:
       Can you the whole test under helix in desktop to make sure this 30s is safe?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] pkuwm commented on a change in pull request #1194: Fix flaky test testGetChildrenOnLargeNumChildren

Posted by GitBox <gi...@apache.org>.
pkuwm commented on a change in pull request #1194:
URL: https://github.com/apache/helix/pull/1194#discussion_r463345100



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -868,59 +864,38 @@ public void testAsyncWriteOperations() {
    * Tests getChildren() when there are an excessive number of children and connection loss happens,
    * the operation should terminate and exit retry loop.
    */
-  @Test
+  @Test(timeOut = 30L)
   public void testGetChildrenOnLargeNumChildren() throws Exception {
-    // Default packetLen is 4M. It is static final and initialized
-    // when first zkClient is created.
-    // So we could not just set "jute.maxbuffer" to change the value.
-    // Reflection is needed to change the value.
-    // Remove "final" modifier
-    Field modifiersField = Field.class.getDeclaredField("modifiers");
-    boolean isModifierAccessible = modifiersField.isAccessible();
-    modifiersField.setAccessible(true);
-
-    Field packetLenField = ClientCnxn.class.getDeclaredField("packetLen");
-    Field childrenLimitField =
-        org.apache.helix.zookeeper.zkclient.ZkClient.class.getDeclaredField("NUM_CHILDREN_LIMIT");
-    modifiersField.setInt(packetLenField, packetLenField.getModifiers() & ~Modifier.FINAL);
-    modifiersField.setInt(childrenLimitField, childrenLimitField.getModifiers() & ~Modifier.FINAL);
-
-    boolean isPacketLenAccessible = packetLenField.isAccessible();
-    packetLenField.setAccessible(true);
-    int originPacketLen = packetLenField.getInt(null);
-    // Keep 150 bytes for successfully creating each child node.
-    packetLenField.set(null, 150);
-
-    boolean isChildrenLimitAccessible = childrenLimitField.isAccessible();
-    childrenLimitField.setAccessible(true);
-    int originChildrenLimit = childrenLimitField.getInt(null);
-    childrenLimitField.set(null, 2);
-
-    String path = "/" + TestHelper.getTestMethodName();
-    // Create 5 children to make packet length of children exceed 150 bytes
+    // Create 110K children to make packet length of children exceed 4 MB
     // and cause connection loss for getChildren() operation
-    for (int i = 0; i < 5; i++) {
-      _zkClient.createPersistent(path + "/" + UUID.randomUUID().toString(), true);
+    String path = "/" + TestHelper.getTestMethodName();
+    int toCreateChildrenCount = 110;
+
+    _zkClient.createPersistent(path);
+
+    for (int i = 0; i < toCreateChildrenCount; i++) {
+      List<Op> ops = new ArrayList<>(1000);
+      for (int j = 0; j < 1000; j++) {
+        String child = path + "/" + UUID.randomUUID().toString();
+        ops.add(Op.create(child, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL));
+      }
+      // Reduce total creation time by batch creating znodes
+      _zkClient.multi(ops);
     }
 
     try {
       _zkClient.getChildren(path);
-      Assert.fail("Should not successfully get children.");
+      Assert.fail("Should not successfully get children because of connection loss.");
     } catch (ZkException expected) {
       Assert.assertEquals(expected.getMessage(),
           "org.apache.zookeeper.KeeperException$MarshallingErrorException: "
               + "KeeperErrorCode = MarshallingError");
     } finally {
-      packetLenField.set(null, originPacketLen);
-      packetLenField.setAccessible(isPacketLenAccessible);
-
-      childrenLimitField.set(null, originChildrenLimit);
-      childrenLimitField.setAccessible(isChildrenLimitAccessible);
-
-      modifiersField.setAccessible(isModifierAccessible);
+      _zkClient.close();
+      _zkClient = new ZkClient(ZkTestBase.ZK_ADDR);
 
       Assert.assertTrue(TestHelper.verify(() -> {
-        _zkClient.deleteRecursively(path);
+        _zkClient.delete(path);

Review comment:
       No, we cannot because there would be node not empty exception.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on pull request #1194: Fix flaky test testGetChildrenOnLargeNumChildren

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on pull request #1194:
URL: https://github.com/apache/helix/pull/1194#issuecomment-668196454


   Other than timeout which may not be conservative enough, it looks good. The idea of using transaction and ephemeral node is brilliant. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] pkuwm commented on a change in pull request #1194: Fix flaky test testGetChildrenOnLargeNumChildren

Posted by GitBox <gi...@apache.org>.
pkuwm commented on a change in pull request #1194:
URL: https://github.com/apache/helix/pull/1194#discussion_r464608286



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -868,60 +864,41 @@ public void testAsyncWriteOperations() {
    * Tests getChildren() when there are an excessive number of children and connection loss happens,
    * the operation should terminate and exit retry loop.
    */
-  @Test
+  @Test(timeOut = 30 * 1000L)
   public void testGetChildrenOnLargeNumChildren() throws Exception {
-    // Default packetLen is 4M. It is static final and initialized
-    // when first zkClient is created.
-    // So we could not just set "jute.maxbuffer" to change the value.
-    // Reflection is needed to change the value.
-    // Remove "final" modifier
-    Field modifiersField = Field.class.getDeclaredField("modifiers");
-    boolean isModifierAccessible = modifiersField.isAccessible();
-    modifiersField.setAccessible(true);
-
-    Field packetLenField = ClientCnxn.class.getDeclaredField("packetLen");
-    Field childrenLimitField =
-        org.apache.helix.zookeeper.zkclient.ZkClient.class.getDeclaredField("NUM_CHILDREN_LIMIT");
-    modifiersField.setInt(packetLenField, packetLenField.getModifiers() & ~Modifier.FINAL);
-    modifiersField.setInt(childrenLimitField, childrenLimitField.getModifiers() & ~Modifier.FINAL);
-
-    boolean isPacketLenAccessible = packetLenField.isAccessible();
-    packetLenField.setAccessible(true);
-    int originPacketLen = packetLenField.getInt(null);
-    // Keep 150 bytes for successfully creating each child node.
-    packetLenField.set(null, 150);
-
-    boolean isChildrenLimitAccessible = childrenLimitField.isAccessible();
-    childrenLimitField.setAccessible(true);
-    int originChildrenLimit = childrenLimitField.getInt(null);
-    childrenLimitField.set(null, 2);
-
-    String path = "/" + TestHelper.getTestMethodName();
-    // Create 5 children to make packet length of children exceed 150 bytes
+    // Create 110K children to make packet length of children exceed 4 MB
     // and cause connection loss for getChildren() operation
-    for (int i = 0; i < 5; i++) {
-      _zkClient.createPersistent(path + "/" + UUID.randomUUID().toString(), true);
+    String path = "/" + TestHelper.getTestMethodName();
+
+    _zkClient.createPersistent(path);
+
+    for (int i = 0; i < 110; i++) {
+      List<Op> ops = new ArrayList<>(1000);
+      for (int j = 0; j < 1000; j++) {
+        String child = path + "/" + UUID.randomUUID().toString();
+        ops.add(Op.create(child, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL));
+      }
+      // Reduce total creation time by batch creating znodes
+      _zkClient.multi(ops);
     }
 
     try {
       _zkClient.getChildren(path);
-      Assert.fail("Should not successfully get children.");
+      Assert.fail("Should not successfully get children because of connection loss.");
     } catch (ZkException expected) {
       Assert.assertEquals(expected.getMessage(),
           "org.apache.zookeeper.KeeperException$MarshallingErrorException: "
               + "KeeperErrorCode = MarshallingError");
     } finally {
-      packetLenField.set(null, originPacketLen);
-      packetLenField.setAccessible(isPacketLenAccessible);
-
-      childrenLimitField.set(null, originChildrenLimit);
-      childrenLimitField.setAccessible(isChildrenLimitAccessible);
-
-      modifiersField.setAccessible(isModifierAccessible);
+      _zkClient.close();
+      _zkClient = new ZkClient(ZkTestBase.ZK_ADDR);
 
       Assert.assertTrue(TestHelper.verify(() -> {
-        _zkClient.deleteRecursively(path);
-        return !_zkClient.exists(path);
+        try {
+          return _zkClient.delete(path);

Review comment:
       They are ephemeral nodes and closing the zkClient deletes the ephemeral nodes for cleanup.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1194: Fix flaky test testGetChildrenOnLargeNumChildren

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1194:
URL: https://github.com/apache/helix/pull/1194#discussion_r463369643



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -868,59 +864,38 @@ public void testAsyncWriteOperations() {
    * Tests getChildren() when there are an excessive number of children and connection loss happens,
    * the operation should terminate and exit retry loop.
    */
-  @Test
+  @Test(timeOut = 30L)

Review comment:
       No, run all test from helix. That is what TMC is doing I guess. This is to simulate that test environment. We don't want that your fix working here but later it turned out this test still not working TMC. Note the cost over there is way higher than changing it here. 
   
   In fact, I noticed in our test, there are failed test due to Timed out from time to time. Those failure are likely before the machine may not be able to schedule the thread to run while the timer is ticking. I don't think it is a good idea to add a time out here. 
   
   By the way, can you re-open this comment thread.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] narendly commented on a change in pull request #1194: Fix flaky test testGetChildrenOnLargeNumChildren

Posted by GitBox <gi...@apache.org>.
narendly commented on a change in pull request #1194:
URL: https://github.com/apache/helix/pull/1194#discussion_r464771973



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -868,60 +864,41 @@ public void testAsyncWriteOperations() {
    * Tests getChildren() when there are an excessive number of children and connection loss happens,
    * the operation should terminate and exit retry loop.
    */
-  @Test
+  @Test(timeOut = 30 * 1000L)
   public void testGetChildrenOnLargeNumChildren() throws Exception {
-    // Default packetLen is 4M. It is static final and initialized
-    // when first zkClient is created.
-    // So we could not just set "jute.maxbuffer" to change the value.
-    // Reflection is needed to change the value.
-    // Remove "final" modifier
-    Field modifiersField = Field.class.getDeclaredField("modifiers");
-    boolean isModifierAccessible = modifiersField.isAccessible();
-    modifiersField.setAccessible(true);
-
-    Field packetLenField = ClientCnxn.class.getDeclaredField("packetLen");
-    Field childrenLimitField =
-        org.apache.helix.zookeeper.zkclient.ZkClient.class.getDeclaredField("NUM_CHILDREN_LIMIT");
-    modifiersField.setInt(packetLenField, packetLenField.getModifiers() & ~Modifier.FINAL);
-    modifiersField.setInt(childrenLimitField, childrenLimitField.getModifiers() & ~Modifier.FINAL);
-
-    boolean isPacketLenAccessible = packetLenField.isAccessible();
-    packetLenField.setAccessible(true);
-    int originPacketLen = packetLenField.getInt(null);
-    // Keep 150 bytes for successfully creating each child node.
-    packetLenField.set(null, 150);
-
-    boolean isChildrenLimitAccessible = childrenLimitField.isAccessible();
-    childrenLimitField.setAccessible(true);
-    int originChildrenLimit = childrenLimitField.getInt(null);
-    childrenLimitField.set(null, 2);
-
-    String path = "/" + TestHelper.getTestMethodName();
-    // Create 5 children to make packet length of children exceed 150 bytes
+    // Create 110K children to make packet length of children exceed 4 MB
     // and cause connection loss for getChildren() operation
-    for (int i = 0; i < 5; i++) {
-      _zkClient.createPersistent(path + "/" + UUID.randomUUID().toString(), true);
+    String path = "/" + TestHelper.getTestMethodName();
+
+    _zkClient.createPersistent(path);
+
+    for (int i = 0; i < 110; i++) {
+      List<Op> ops = new ArrayList<>(1000);
+      for (int j = 0; j < 1000; j++) {
+        String child = path + "/" + UUID.randomUUID().toString();
+        ops.add(Op.create(child, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL));
+      }
+      // Reduce total creation time by batch creating znodes
+      _zkClient.multi(ops);
     }
 
     try {
       _zkClient.getChildren(path);
-      Assert.fail("Should not successfully get children.");
+      Assert.fail("Should not successfully get children because of connection loss.");
     } catch (ZkException expected) {
       Assert.assertEquals(expected.getMessage(),
           "org.apache.zookeeper.KeeperException$MarshallingErrorException: "
               + "KeeperErrorCode = MarshallingError");
     } finally {
-      packetLenField.set(null, originPacketLen);
-      packetLenField.setAccessible(isPacketLenAccessible);
-
-      childrenLimitField.set(null, originChildrenLimit);
-      childrenLimitField.setAccessible(isChildrenLimitAccessible);
-
-      modifiersField.setAccessible(isModifierAccessible);
+      _zkClient.close();
+      _zkClient = new ZkClient(ZkTestBase.ZK_ADDR);
 
       Assert.assertTrue(TestHelper.verify(() -> {
-        _zkClient.deleteRecursively(path);
-        return !_zkClient.exists(path);
+        try {
+          return _zkClient.delete(path);

Review comment:
       I think this is fine for testing purposes, but here are some scenarios we can think about that gets us from, say, 95% to 99% stability:
   
   1. doesn't the packet len error cause the session to be terminated? Or does your fix change that behavior? In this case if my assumption is true, the client that created the ephemeral nodes will lose its session and the ephemerals tied to the session will be gone anyway, so the nodes will be gone before zkclient.close()?
   
   2. this is a rare scenario, but sometimes we observe in-memory local clients lose connection for some unknown reason. What impact would that have on the test?
   
   If we really wanted to be right and thorough, I would write persistent nodes, and check that the numChildren field comes out to 110K from Stats with an Assert statement, and then do the explicit deletion at the end. This would guard against any potential client-side glitches and failures.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1194: Fix flaky test testGetChildrenOnLargeNumChildren

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1194:
URL: https://github.com/apache/helix/pull/1194#discussion_r463347638



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -868,59 +864,38 @@ public void testAsyncWriteOperations() {
    * Tests getChildren() when there are an excessive number of children and connection loss happens,
    * the operation should terminate and exit retry loop.
    */
-  @Test
+  @Test(timeOut = 30L)

Review comment:
       also for safety, make it 60secs?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] pkuwm commented on a change in pull request #1194: Fix flaky test testGetChildrenOnLargeNumChildren

Posted by GitBox <gi...@apache.org>.
pkuwm commented on a change in pull request #1194:
URL: https://github.com/apache/helix/pull/1194#discussion_r463368082



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -868,59 +864,38 @@ public void testAsyncWriteOperations() {
    * Tests getChildren() when there are an excessive number of children and connection loss happens,
    * the operation should terminate and exit retry loop.
    */
-  @Test
+  @Test(timeOut = 30L)

Review comment:
       I did zookeeper-api in both desktop and laptop. Is this what you meant?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] pkuwm commented on pull request #1194: Fix flaky test testGetChildrenOnLargeNumChildren

Posted by GitBox <gi...@apache.org>.
pkuwm commented on pull request #1194:
URL: https://github.com/apache/helix/pull/1194#issuecomment-668228746


   This PR is ready to be merged, approved by @kaisun2000 
   ```
   testGetChildrenOnLargeNumChildren becomes flaky after more commits are checked in because of reflection doesn't work as expected. This commit fixes it by replacing reflection with creating 110K children for the test.
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] pkuwm commented on a change in pull request #1194: Fix flaky test testGetChildrenOnLargeNumChildren

Posted by GitBox <gi...@apache.org>.
pkuwm commented on a change in pull request #1194:
URL: https://github.com/apache/helix/pull/1194#discussion_r463387892



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -868,59 +864,38 @@ public void testAsyncWriteOperations() {
    * Tests getChildren() when there are an excessive number of children and connection loss happens,
    * the operation should terminate and exit retry loop.
    */
-  @Test
+  @Test(timeOut = 30L)

Review comment:
       I actually didn't resolve this thread... I intended to keep it open for discussion.
   
   Running all helix tests is not necessary. This is a separate module. Even running in the test machine, tests in different modules are independent.
   
   Timeout is good practice. Just an example. https://github.com/apache/zookeeper/blob/fe940cdd8fb23ba09684cefb73233d570f4a20fa/zookeeper-server/src/test/java/org/apache/zookeeper/server/ZooKeeperServerMainTest.java#L165
   
   How could you ensure 60s would not timeout then? Then how about 10 minutes, 1 hour?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1194: Fix flaky test testGetChildrenOnLargeNumChildren

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1194:
URL: https://github.com/apache/helix/pull/1194#discussion_r463342772



##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkClient.java
##########
@@ -119,19 +115,6 @@
   // ZkEventThread. Otherwise the retry request might block the normal event processing.
   protected final ZkAsyncRetryThread _asyncCallRetryThread;
 
-  static {

Review comment:
       what is the difference of using this static block vs initalization at declaration?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1194: Fix flaky test testGetChildrenOnLargeNumChildren

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1194:
URL: https://github.com/apache/helix/pull/1194#discussion_r463370434



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -868,59 +864,38 @@ public void testAsyncWriteOperations() {
    * Tests getChildren() when there are an excessive number of children and connection loss happens,
    * the operation should terminate and exit retry loop.
    */
-  @Test
+  @Test(timeOut = 30L)
   public void testGetChildrenOnLargeNumChildren() throws Exception {
-    // Default packetLen is 4M. It is static final and initialized
-    // when first zkClient is created.
-    // So we could not just set "jute.maxbuffer" to change the value.
-    // Reflection is needed to change the value.
-    // Remove "final" modifier
-    Field modifiersField = Field.class.getDeclaredField("modifiers");
-    boolean isModifierAccessible = modifiersField.isAccessible();
-    modifiersField.setAccessible(true);
-
-    Field packetLenField = ClientCnxn.class.getDeclaredField("packetLen");
-    Field childrenLimitField =
-        org.apache.helix.zookeeper.zkclient.ZkClient.class.getDeclaredField("NUM_CHILDREN_LIMIT");
-    modifiersField.setInt(packetLenField, packetLenField.getModifiers() & ~Modifier.FINAL);
-    modifiersField.setInt(childrenLimitField, childrenLimitField.getModifiers() & ~Modifier.FINAL);
-
-    boolean isPacketLenAccessible = packetLenField.isAccessible();
-    packetLenField.setAccessible(true);
-    int originPacketLen = packetLenField.getInt(null);
-    // Keep 150 bytes for successfully creating each child node.
-    packetLenField.set(null, 150);
-
-    boolean isChildrenLimitAccessible = childrenLimitField.isAccessible();
-    childrenLimitField.setAccessible(true);
-    int originChildrenLimit = childrenLimitField.getInt(null);
-    childrenLimitField.set(null, 2);
-
-    String path = "/" + TestHelper.getTestMethodName();
-    // Create 5 children to make packet length of children exceed 150 bytes
+    // Create 110K children to make packet length of children exceed 4 MB
     // and cause connection loss for getChildren() operation
-    for (int i = 0; i < 5; i++) {
-      _zkClient.createPersistent(path + "/" + UUID.randomUUID().toString(), true);
+    String path = "/" + TestHelper.getTestMethodName();
+    int toCreateChildrenCount = 110;
+
+    _zkClient.createPersistent(path);
+
+    for (int i = 0; i < toCreateChildrenCount; i++) {
+      List<Op> ops = new ArrayList<>(1000);
+      for (int j = 0; j < 1000; j++) {
+        String child = path + "/" + UUID.randomUUID().toString();
+        ops.add(Op.create(child, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL));
+      }
+      // Reduce total creation time by batch creating znodes
+      _zkClient.multi(ops);
     }
 
     try {
       _zkClient.getChildren(path);
-      Assert.fail("Should not successfully get children.");
+      Assert.fail("Should not successfully get children because of connection loss.");
     } catch (ZkException expected) {
       Assert.assertEquals(expected.getMessage(),
           "org.apache.zookeeper.KeeperException$MarshallingErrorException: "
               + "KeeperErrorCode = MarshallingError");
     } finally {
-      packetLenField.set(null, originPacketLen);
-      packetLenField.setAccessible(isPacketLenAccessible);
-
-      childrenLimitField.set(null, originChildrenLimit);
-      childrenLimitField.setAccessible(isChildrenLimitAccessible);
-
-      modifiersField.setAccessible(isModifierAccessible);
+      _zkClient.close();
+      _zkClient = new ZkClient(ZkTestBase.ZK_ADDR);
 
       Assert.assertTrue(TestHelper.verify(() -> {
-        _zkClient.deleteRecursively(path);
+        _zkClient.delete(path);

Review comment:
       Can we also re-open this comment thread?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1194: Fix flaky test testGetChildrenOnLargeNumChildren

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1194:
URL: https://github.com/apache/helix/pull/1194#discussion_r463413069



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -868,59 +864,38 @@ public void testAsyncWriteOperations() {
    * Tests getChildren() when there are an excessive number of children and connection loss happens,
    * the operation should terminate and exit retry loop.
    */
-  @Test
+  @Test(timeOut = 30L)
   public void testGetChildrenOnLargeNumChildren() throws Exception {
-    // Default packetLen is 4M. It is static final and initialized
-    // when first zkClient is created.
-    // So we could not just set "jute.maxbuffer" to change the value.
-    // Reflection is needed to change the value.
-    // Remove "final" modifier
-    Field modifiersField = Field.class.getDeclaredField("modifiers");
-    boolean isModifierAccessible = modifiersField.isAccessible();
-    modifiersField.setAccessible(true);
-
-    Field packetLenField = ClientCnxn.class.getDeclaredField("packetLen");
-    Field childrenLimitField =
-        org.apache.helix.zookeeper.zkclient.ZkClient.class.getDeclaredField("NUM_CHILDREN_LIMIT");
-    modifiersField.setInt(packetLenField, packetLenField.getModifiers() & ~Modifier.FINAL);
-    modifiersField.setInt(childrenLimitField, childrenLimitField.getModifiers() & ~Modifier.FINAL);
-
-    boolean isPacketLenAccessible = packetLenField.isAccessible();
-    packetLenField.setAccessible(true);
-    int originPacketLen = packetLenField.getInt(null);
-    // Keep 150 bytes for successfully creating each child node.
-    packetLenField.set(null, 150);
-
-    boolean isChildrenLimitAccessible = childrenLimitField.isAccessible();
-    childrenLimitField.setAccessible(true);
-    int originChildrenLimit = childrenLimitField.getInt(null);
-    childrenLimitField.set(null, 2);
-
-    String path = "/" + TestHelper.getTestMethodName();
-    // Create 5 children to make packet length of children exceed 150 bytes
+    // Create 110K children to make packet length of children exceed 4 MB
     // and cause connection loss for getChildren() operation
-    for (int i = 0; i < 5; i++) {
-      _zkClient.createPersistent(path + "/" + UUID.randomUUID().toString(), true);
+    String path = "/" + TestHelper.getTestMethodName();
+    int toCreateChildrenCount = 110;
+
+    _zkClient.createPersistent(path);
+
+    for (int i = 0; i < toCreateChildrenCount; i++) {
+      List<Op> ops = new ArrayList<>(1000);
+      for (int j = 0; j < 1000; j++) {
+        String child = path + "/" + UUID.randomUUID().toString();
+        ops.add(Op.create(child, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL));
+      }
+      // Reduce total creation time by batch creating znodes
+      _zkClient.multi(ops);
     }
 
     try {
       _zkClient.getChildren(path);
-      Assert.fail("Should not successfully get children.");
+      Assert.fail("Should not successfully get children because of connection loss.");
     } catch (ZkException expected) {
       Assert.assertEquals(expected.getMessage(),
           "org.apache.zookeeper.KeeperException$MarshallingErrorException: "
               + "KeeperErrorCode = MarshallingError");
     } finally {
-      packetLenField.set(null, originPacketLen);
-      packetLenField.setAccessible(isPacketLenAccessible);
-
-      childrenLimitField.set(null, originChildrenLimit);
-      childrenLimitField.setAccessible(isChildrenLimitAccessible);
-
-      modifiersField.setAccessible(isModifierAccessible);
+      _zkClient.close();
+      _zkClient = new ZkClient(ZkTestBase.ZK_ADDR);
 
       Assert.assertTrue(TestHelper.verify(() -> {
-        _zkClient.deleteRecursively(path);
+        _zkClient.delete(path);

Review comment:
       `String path = "/" + TestHelper.getTestMethodName()` is path.
   
   Now at the time this line of code `_zkClient.delete(path);` , there are a lot of znode underlying this `path`. So the delete should always fail, right?  Or am I understanding this wrong?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] pkuwm commented on a change in pull request #1194: Fix flaky test testGetChildrenOnLargeNumChildren

Posted by GitBox <gi...@apache.org>.
pkuwm commented on a change in pull request #1194:
URL: https://github.com/apache/helix/pull/1194#discussion_r463934991



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -868,59 +864,38 @@ public void testAsyncWriteOperations() {
    * Tests getChildren() when there are an excessive number of children and connection loss happens,
    * the operation should terminate and exit retry loop.
    */
-  @Test
+  @Test(timeOut = 30L)
   public void testGetChildrenOnLargeNumChildren() throws Exception {
-    // Default packetLen is 4M. It is static final and initialized
-    // when first zkClient is created.
-    // So we could not just set "jute.maxbuffer" to change the value.
-    // Reflection is needed to change the value.
-    // Remove "final" modifier
-    Field modifiersField = Field.class.getDeclaredField("modifiers");
-    boolean isModifierAccessible = modifiersField.isAccessible();
-    modifiersField.setAccessible(true);
-
-    Field packetLenField = ClientCnxn.class.getDeclaredField("packetLen");
-    Field childrenLimitField =
-        org.apache.helix.zookeeper.zkclient.ZkClient.class.getDeclaredField("NUM_CHILDREN_LIMIT");
-    modifiersField.setInt(packetLenField, packetLenField.getModifiers() & ~Modifier.FINAL);
-    modifiersField.setInt(childrenLimitField, childrenLimitField.getModifiers() & ~Modifier.FINAL);
-
-    boolean isPacketLenAccessible = packetLenField.isAccessible();
-    packetLenField.setAccessible(true);
-    int originPacketLen = packetLenField.getInt(null);
-    // Keep 150 bytes for successfully creating each child node.
-    packetLenField.set(null, 150);
-
-    boolean isChildrenLimitAccessible = childrenLimitField.isAccessible();
-    childrenLimitField.setAccessible(true);
-    int originChildrenLimit = childrenLimitField.getInt(null);
-    childrenLimitField.set(null, 2);
-
-    String path = "/" + TestHelper.getTestMethodName();
-    // Create 5 children to make packet length of children exceed 150 bytes
+    // Create 110K children to make packet length of children exceed 4 MB
     // and cause connection loss for getChildren() operation
-    for (int i = 0; i < 5; i++) {
-      _zkClient.createPersistent(path + "/" + UUID.randomUUID().toString(), true);
+    String path = "/" + TestHelper.getTestMethodName();
+    int toCreateChildrenCount = 110;
+
+    _zkClient.createPersistent(path);
+
+    for (int i = 0; i < toCreateChildrenCount; i++) {
+      List<Op> ops = new ArrayList<>(1000);
+      for (int j = 0; j < 1000; j++) {
+        String child = path + "/" + UUID.randomUUID().toString();
+        ops.add(Op.create(child, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL));
+      }
+      // Reduce total creation time by batch creating znodes
+      _zkClient.multi(ops);
     }
 
     try {
       _zkClient.getChildren(path);
-      Assert.fail("Should not successfully get children.");
+      Assert.fail("Should not successfully get children because of connection loss.");
     } catch (ZkException expected) {
       Assert.assertEquals(expected.getMessage(),
           "org.apache.zookeeper.KeeperException$MarshallingErrorException: "
               + "KeeperErrorCode = MarshallingError");
     } finally {
-      packetLenField.set(null, originPacketLen);
-      packetLenField.setAccessible(isPacketLenAccessible);
-
-      childrenLimitField.set(null, originChildrenLimit);
-      childrenLimitField.setAccessible(isChildrenLimitAccessible);
-
-      modifiersField.setAccessible(isModifierAccessible);
+      _zkClient.close();
+      _zkClient = new ZkClient(ZkTestBase.ZK_ADDR);
 
       Assert.assertTrue(TestHelper.verify(() -> {
-        _zkClient.deleteRecursively(path);
+        _zkClient.delete(path);

Review comment:
       @kaisun2000 I've replied you: they are ephemeral znodes.
   > It won't always throw exception. If you carefully understand the code, you'll find the tips of using ephemeral nodes and closing the zkClient to delete the ephemeral nodes for cleanup. The exception handling here is just in case zk server doesn't remove ephemeral nodes yet.
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] pkuwm commented on a change in pull request #1194: Fix flaky test testGetChildrenOnLargeNumChildren

Posted by GitBox <gi...@apache.org>.
pkuwm commented on a change in pull request #1194:
URL: https://github.com/apache/helix/pull/1194#discussion_r464617669



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -868,59 +864,38 @@ public void testAsyncWriteOperations() {
    * Tests getChildren() when there are an excessive number of children and connection loss happens,
    * the operation should terminate and exit retry loop.
    */
-  @Test
+  @Test(timeOut = 30L)

Review comment:
       It is in test config file: `testng.xml`
   
   Configuring the timeout for test suit for zookeeper-api is out of scope of this PR. If we want to do that, I would prefer to do it in another PR.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] pkuwm commented on a change in pull request #1194: Fix flaky test testGetChildrenOnLargeNumChildren

Posted by GitBox <gi...@apache.org>.
pkuwm commented on a change in pull request #1194:
URL: https://github.com/apache/helix/pull/1194#discussion_r463358352



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -868,59 +864,38 @@ public void testAsyncWriteOperations() {
    * Tests getChildren() when there are an excessive number of children and connection loss happens,
    * the operation should terminate and exit retry loop.
    */
-  @Test
+  @Test(timeOut = 30L)
   public void testGetChildrenOnLargeNumChildren() throws Exception {
-    // Default packetLen is 4M. It is static final and initialized
-    // when first zkClient is created.
-    // So we could not just set "jute.maxbuffer" to change the value.
-    // Reflection is needed to change the value.
-    // Remove "final" modifier
-    Field modifiersField = Field.class.getDeclaredField("modifiers");
-    boolean isModifierAccessible = modifiersField.isAccessible();
-    modifiersField.setAccessible(true);
-
-    Field packetLenField = ClientCnxn.class.getDeclaredField("packetLen");
-    Field childrenLimitField =
-        org.apache.helix.zookeeper.zkclient.ZkClient.class.getDeclaredField("NUM_CHILDREN_LIMIT");
-    modifiersField.setInt(packetLenField, packetLenField.getModifiers() & ~Modifier.FINAL);
-    modifiersField.setInt(childrenLimitField, childrenLimitField.getModifiers() & ~Modifier.FINAL);
-
-    boolean isPacketLenAccessible = packetLenField.isAccessible();
-    packetLenField.setAccessible(true);
-    int originPacketLen = packetLenField.getInt(null);
-    // Keep 150 bytes for successfully creating each child node.
-    packetLenField.set(null, 150);
-
-    boolean isChildrenLimitAccessible = childrenLimitField.isAccessible();
-    childrenLimitField.setAccessible(true);
-    int originChildrenLimit = childrenLimitField.getInt(null);
-    childrenLimitField.set(null, 2);
-
-    String path = "/" + TestHelper.getTestMethodName();
-    // Create 5 children to make packet length of children exceed 150 bytes
+    // Create 110K children to make packet length of children exceed 4 MB
     // and cause connection loss for getChildren() operation
-    for (int i = 0; i < 5; i++) {
-      _zkClient.createPersistent(path + "/" + UUID.randomUUID().toString(), true);
+    String path = "/" + TestHelper.getTestMethodName();
+    int toCreateChildrenCount = 110;
+
+    _zkClient.createPersistent(path);
+
+    for (int i = 0; i < toCreateChildrenCount; i++) {
+      List<Op> ops = new ArrayList<>(1000);
+      for (int j = 0; j < 1000; j++) {
+        String child = path + "/" + UUID.randomUUID().toString();
+        ops.add(Op.create(child, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL));
+      }
+      // Reduce total creation time by batch creating znodes
+      _zkClient.multi(ops);
     }
 
     try {
       _zkClient.getChildren(path);
-      Assert.fail("Should not successfully get children.");
+      Assert.fail("Should not successfully get children because of connection loss.");
     } catch (ZkException expected) {
       Assert.assertEquals(expected.getMessage(),
           "org.apache.zookeeper.KeeperException$MarshallingErrorException: "
               + "KeeperErrorCode = MarshallingError");
     } finally {
-      packetLenField.set(null, originPacketLen);
-      packetLenField.setAccessible(isPacketLenAccessible);
-
-      childrenLimitField.set(null, originChildrenLimit);
-      childrenLimitField.setAccessible(isChildrenLimitAccessible);
-
-      modifiersField.setAccessible(isModifierAccessible);
+      _zkClient.close();
+      _zkClient = new ZkClient(ZkTestBase.ZK_ADDR);
 
       Assert.assertTrue(TestHelper.verify(() -> {
-        _zkClient.deleteRecursively(path);
+        _zkClient.delete(path);

Review comment:
       As a good practice, a unit test should cleanup itself and avoid other tests.
   It won't always close. If you carefully understand the code, you'll find the tips of using ephemeral nodes and closing the zkClient to delete the ephemeral nodes for cleanup. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] pkuwm commented on a change in pull request #1194: Fix flaky test testGetChildrenOnLargeNumChildren

Posted by GitBox <gi...@apache.org>.
pkuwm commented on a change in pull request #1194:
URL: https://github.com/apache/helix/pull/1194#discussion_r463390049



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -868,59 +864,38 @@ public void testAsyncWriteOperations() {
    * Tests getChildren() when there are an excessive number of children and connection loss happens,
    * the operation should terminate and exit retry loop.
    */
-  @Test
+  @Test(timeOut = 30L)
   public void testGetChildrenOnLargeNumChildren() throws Exception {
-    // Default packetLen is 4M. It is static final and initialized
-    // when first zkClient is created.
-    // So we could not just set "jute.maxbuffer" to change the value.
-    // Reflection is needed to change the value.
-    // Remove "final" modifier
-    Field modifiersField = Field.class.getDeclaredField("modifiers");
-    boolean isModifierAccessible = modifiersField.isAccessible();
-    modifiersField.setAccessible(true);
-
-    Field packetLenField = ClientCnxn.class.getDeclaredField("packetLen");
-    Field childrenLimitField =
-        org.apache.helix.zookeeper.zkclient.ZkClient.class.getDeclaredField("NUM_CHILDREN_LIMIT");
-    modifiersField.setInt(packetLenField, packetLenField.getModifiers() & ~Modifier.FINAL);
-    modifiersField.setInt(childrenLimitField, childrenLimitField.getModifiers() & ~Modifier.FINAL);
-
-    boolean isPacketLenAccessible = packetLenField.isAccessible();
-    packetLenField.setAccessible(true);
-    int originPacketLen = packetLenField.getInt(null);
-    // Keep 150 bytes for successfully creating each child node.
-    packetLenField.set(null, 150);
-
-    boolean isChildrenLimitAccessible = childrenLimitField.isAccessible();
-    childrenLimitField.setAccessible(true);
-    int originChildrenLimit = childrenLimitField.getInt(null);
-    childrenLimitField.set(null, 2);
-
-    String path = "/" + TestHelper.getTestMethodName();
-    // Create 5 children to make packet length of children exceed 150 bytes
+    // Create 110K children to make packet length of children exceed 4 MB
     // and cause connection loss for getChildren() operation
-    for (int i = 0; i < 5; i++) {
-      _zkClient.createPersistent(path + "/" + UUID.randomUUID().toString(), true);
+    String path = "/" + TestHelper.getTestMethodName();
+    int toCreateChildrenCount = 110;
+
+    _zkClient.createPersistent(path);
+
+    for (int i = 0; i < toCreateChildrenCount; i++) {
+      List<Op> ops = new ArrayList<>(1000);
+      for (int j = 0; j < 1000; j++) {
+        String child = path + "/" + UUID.randomUUID().toString();
+        ops.add(Op.create(child, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL));
+      }
+      // Reduce total creation time by batch creating znodes
+      _zkClient.multi(ops);
     }
 
     try {
       _zkClient.getChildren(path);
-      Assert.fail("Should not successfully get children.");
+      Assert.fail("Should not successfully get children because of connection loss.");
     } catch (ZkException expected) {
       Assert.assertEquals(expected.getMessage(),
           "org.apache.zookeeper.KeeperException$MarshallingErrorException: "
               + "KeeperErrorCode = MarshallingError");
     } finally {
-      packetLenField.set(null, originPacketLen);
-      packetLenField.setAccessible(isPacketLenAccessible);
-
-      childrenLimitField.set(null, originChildrenLimit);
-      childrenLimitField.setAccessible(isChildrenLimitAccessible);
-
-      modifiersField.setAccessible(isModifierAccessible);
+      _zkClient.close();
+      _zkClient = new ZkClient(ZkTestBase.ZK_ADDR);
 
       Assert.assertTrue(TestHelper.verify(() -> {
-        _zkClient.deleteRecursively(path);
+        _zkClient.delete(path);

Review comment:
       What do you mean by `always thrown`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1194: Fix flaky test testGetChildrenOnLargeNumChildren

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1194:
URL: https://github.com/apache/helix/pull/1194#discussion_r463347551



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -868,59 +864,38 @@ public void testAsyncWriteOperations() {
    * Tests getChildren() when there are an excessive number of children and connection loss happens,
    * the operation should terminate and exit retry loop.
    */
-  @Test
+  @Test(timeOut = 30L)

Review comment:
       Another thing is that in these 4 secs, is it possible another test would time out the session of _zkClient. Shall we use a dedicated ZkClient by creating one and later close it. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1194: Fix flaky test testGetChildrenOnLargeNumChildren

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1194:
URL: https://github.com/apache/helix/pull/1194#discussion_r463342175



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -868,59 +864,38 @@ public void testAsyncWriteOperations() {
    * Tests getChildren() when there are an excessive number of children and connection loss happens,
    * the operation should terminate and exit retry loop.
    */
-  @Test
+  @Test(timeOut = 30L)
   public void testGetChildrenOnLargeNumChildren() throws Exception {
-    // Default packetLen is 4M. It is static final and initialized
-    // when first zkClient is created.
-    // So we could not just set "jute.maxbuffer" to change the value.
-    // Reflection is needed to change the value.
-    // Remove "final" modifier
-    Field modifiersField = Field.class.getDeclaredField("modifiers");
-    boolean isModifierAccessible = modifiersField.isAccessible();
-    modifiersField.setAccessible(true);
-
-    Field packetLenField = ClientCnxn.class.getDeclaredField("packetLen");
-    Field childrenLimitField =
-        org.apache.helix.zookeeper.zkclient.ZkClient.class.getDeclaredField("NUM_CHILDREN_LIMIT");
-    modifiersField.setInt(packetLenField, packetLenField.getModifiers() & ~Modifier.FINAL);
-    modifiersField.setInt(childrenLimitField, childrenLimitField.getModifiers() & ~Modifier.FINAL);
-
-    boolean isPacketLenAccessible = packetLenField.isAccessible();
-    packetLenField.setAccessible(true);
-    int originPacketLen = packetLenField.getInt(null);
-    // Keep 150 bytes for successfully creating each child node.
-    packetLenField.set(null, 150);
-
-    boolean isChildrenLimitAccessible = childrenLimitField.isAccessible();
-    childrenLimitField.setAccessible(true);
-    int originChildrenLimit = childrenLimitField.getInt(null);
-    childrenLimitField.set(null, 2);
-
-    String path = "/" + TestHelper.getTestMethodName();
-    // Create 5 children to make packet length of children exceed 150 bytes
+    // Create 110K children to make packet length of children exceed 4 MB
     // and cause connection loss for getChildren() operation
-    for (int i = 0; i < 5; i++) {
-      _zkClient.createPersistent(path + "/" + UUID.randomUUID().toString(), true);
+    String path = "/" + TestHelper.getTestMethodName();
+    int toCreateChildrenCount = 110;
+
+    _zkClient.createPersistent(path);
+
+    for (int i = 0; i < toCreateChildrenCount; i++) {
+      List<Op> ops = new ArrayList<>(1000);
+      for (int j = 0; j < 1000; j++) {
+        String child = path + "/" + UUID.randomUUID().toString();
+        ops.add(Op.create(child, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL));
+      }
+      // Reduce total creation time by batch creating znodes
+      _zkClient.multi(ops);
     }
 
     try {
       _zkClient.getChildren(path);
-      Assert.fail("Should not successfully get children.");
+      Assert.fail("Should not successfully get children because of connection loss.");
     } catch (ZkException expected) {
       Assert.assertEquals(expected.getMessage(),
           "org.apache.zookeeper.KeeperException$MarshallingErrorException: "
               + "KeeperErrorCode = MarshallingError");
     } finally {
-      packetLenField.set(null, originPacketLen);
-      packetLenField.setAccessible(isPacketLenAccessible);
-
-      childrenLimitField.set(null, originChildrenLimit);
-      childrenLimitField.setAccessible(isChildrenLimitAccessible);
-
-      modifiersField.setAccessible(isModifierAccessible);
+      _zkClient.close();
+      _zkClient = new ZkClient(ZkTestBase.ZK_ADDR);
 
       Assert.assertTrue(TestHelper.verify(() -> {
-        _zkClient.deleteRecursively(path);
+        _zkClient.delete(path);

Review comment:
       can we delete parent path here without child node deleted?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] pkuwm commented on a change in pull request #1194: Fix flaky test testGetChildrenOnLargeNumChildren

Posted by GitBox <gi...@apache.org>.
pkuwm commented on a change in pull request #1194:
URL: https://github.com/apache/helix/pull/1194#discussion_r463934872



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -868,59 +864,38 @@ public void testAsyncWriteOperations() {
    * Tests getChildren() when there are an excessive number of children and connection loss happens,
    * the operation should terminate and exit retry loop.
    */
-  @Test
+  @Test(timeOut = 30L)

Review comment:
       Please let me remind you that the default timeout 300sec is set only for helix-core's tests, not in zookeeper-api.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] pkuwm commented on a change in pull request #1194: Fix flaky test testGetChildrenOnLargeNumChildren

Posted by GitBox <gi...@apache.org>.
pkuwm commented on a change in pull request #1194:
URL: https://github.com/apache/helix/pull/1194#discussion_r463359412



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -868,59 +864,38 @@ public void testAsyncWriteOperations() {
    * Tests getChildren() when there are an excessive number of children and connection loss happens,
    * the operation should terminate and exit retry loop.
    */
-  @Test
+  @Test(timeOut = 30L)

Review comment:
       30s has already good room for this test. In my tests, it completes in ~5s. If it really exceeds 30s, maybe 60s don't even help. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1194: Fix flaky test testGetChildrenOnLargeNumChildren

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1194:
URL: https://github.com/apache/helix/pull/1194#discussion_r463369643



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -868,59 +864,38 @@ public void testAsyncWriteOperations() {
    * Tests getChildren() when there are an excessive number of children and connection loss happens,
    * the operation should terminate and exit retry loop.
    */
-  @Test
+  @Test(timeOut = 30L)

Review comment:
       No, run all test from helix. That is what TMC is doing I guess. This is to simulate that test environment. We don't want that you fix this one but later it turned out this test still not working over there. Note the cost over there is way higher than changing it here. 
   
   In fact, I noticed in our test, there are failed test due to Timed out from time to time. I don't think it is a good idea to add a time out here. 
   
   By the way, can you re-open this comment thread.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] pkuwm commented on a change in pull request #1194: Fix flaky test testGetChildrenOnLargeNumChildren

Posted by GitBox <gi...@apache.org>.
pkuwm commented on a change in pull request #1194:
URL: https://github.com/apache/helix/pull/1194#discussion_r463345847



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -868,59 +864,38 @@ public void testAsyncWriteOperations() {
    * Tests getChildren() when there are an excessive number of children and connection loss happens,
    * the operation should terminate and exit retry loop.
    */
-  @Test
+  @Test(timeOut = 30L)

Review comment:
       `30 * 1000L ms`.
   
   A good practice would be: if the test relies on an external system(zk here), it'd be good to have a timeout. Otherwise, if the connection hangs, the test also hangs. Eg., if the operation `multi()` hangs since it is sync, it hangs there forever?
   
   Timeout makes the test process continue for other tests.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] narendly commented on a change in pull request #1194: Fix flaky test testGetChildrenOnLargeNumChildren

Posted by GitBox <gi...@apache.org>.
narendly commented on a change in pull request #1194:
URL: https://github.com/apache/helix/pull/1194#discussion_r464215069



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -868,60 +864,41 @@ public void testAsyncWriteOperations() {
    * Tests getChildren() when there are an excessive number of children and connection loss happens,
    * the operation should terminate and exit retry loop.
    */
-  @Test
+  @Test(timeOut = 30 * 1000L)
   public void testGetChildrenOnLargeNumChildren() throws Exception {
-    // Default packetLen is 4M. It is static final and initialized
-    // when first zkClient is created.
-    // So we could not just set "jute.maxbuffer" to change the value.
-    // Reflection is needed to change the value.
-    // Remove "final" modifier
-    Field modifiersField = Field.class.getDeclaredField("modifiers");
-    boolean isModifierAccessible = modifiersField.isAccessible();
-    modifiersField.setAccessible(true);
-
-    Field packetLenField = ClientCnxn.class.getDeclaredField("packetLen");
-    Field childrenLimitField =
-        org.apache.helix.zookeeper.zkclient.ZkClient.class.getDeclaredField("NUM_CHILDREN_LIMIT");
-    modifiersField.setInt(packetLenField, packetLenField.getModifiers() & ~Modifier.FINAL);
-    modifiersField.setInt(childrenLimitField, childrenLimitField.getModifiers() & ~Modifier.FINAL);
-
-    boolean isPacketLenAccessible = packetLenField.isAccessible();
-    packetLenField.setAccessible(true);
-    int originPacketLen = packetLenField.getInt(null);
-    // Keep 150 bytes for successfully creating each child node.
-    packetLenField.set(null, 150);
-
-    boolean isChildrenLimitAccessible = childrenLimitField.isAccessible();
-    childrenLimitField.setAccessible(true);
-    int originChildrenLimit = childrenLimitField.getInt(null);
-    childrenLimitField.set(null, 2);
-
-    String path = "/" + TestHelper.getTestMethodName();
-    // Create 5 children to make packet length of children exceed 150 bytes
+    // Create 110K children to make packet length of children exceed 4 MB
     // and cause connection loss for getChildren() operation
-    for (int i = 0; i < 5; i++) {
-      _zkClient.createPersistent(path + "/" + UUID.randomUUID().toString(), true);
+    String path = "/" + TestHelper.getTestMethodName();
+
+    _zkClient.createPersistent(path);
+
+    for (int i = 0; i < 110; i++) {
+      List<Op> ops = new ArrayList<>(1000);
+      for (int j = 0; j < 1000; j++) {
+        String child = path + "/" + UUID.randomUUID().toString();
+        ops.add(Op.create(child, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL));
+      }
+      // Reduce total creation time by batch creating znodes
+      _zkClient.multi(ops);
     }
 
     try {
       _zkClient.getChildren(path);
-      Assert.fail("Should not successfully get children.");
+      Assert.fail("Should not successfully get children because of connection loss.");

Review comment:
       Nit: let's say this a little more clearly:
   
   Something on the lines of "Should fail because listing 110K znodes with UUID names should cause packet length exceeded error, causing a connection loss."




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] pkuwm commented on a change in pull request #1194: Fix flaky test testGetChildrenOnLargeNumChildren

Posted by GitBox <gi...@apache.org>.
pkuwm commented on a change in pull request #1194:
URL: https://github.com/apache/helix/pull/1194#discussion_r463353667



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -868,59 +864,38 @@ public void testAsyncWriteOperations() {
    * Tests getChildren() when there are an excessive number of children and connection loss happens,
    * the operation should terminate and exit retry loop.
    */
-  @Test
+  @Test(timeOut = 30L)
   public void testGetChildrenOnLargeNumChildren() throws Exception {
-    // Default packetLen is 4M. It is static final and initialized
-    // when first zkClient is created.
-    // So we could not just set "jute.maxbuffer" to change the value.
-    // Reflection is needed to change the value.
-    // Remove "final" modifier
-    Field modifiersField = Field.class.getDeclaredField("modifiers");
-    boolean isModifierAccessible = modifiersField.isAccessible();
-    modifiersField.setAccessible(true);
-
-    Field packetLenField = ClientCnxn.class.getDeclaredField("packetLen");
-    Field childrenLimitField =
-        org.apache.helix.zookeeper.zkclient.ZkClient.class.getDeclaredField("NUM_CHILDREN_LIMIT");
-    modifiersField.setInt(packetLenField, packetLenField.getModifiers() & ~Modifier.FINAL);
-    modifiersField.setInt(childrenLimitField, childrenLimitField.getModifiers() & ~Modifier.FINAL);
-
-    boolean isPacketLenAccessible = packetLenField.isAccessible();
-    packetLenField.setAccessible(true);
-    int originPacketLen = packetLenField.getInt(null);
-    // Keep 150 bytes for successfully creating each child node.
-    packetLenField.set(null, 150);
-
-    boolean isChildrenLimitAccessible = childrenLimitField.isAccessible();
-    childrenLimitField.setAccessible(true);
-    int originChildrenLimit = childrenLimitField.getInt(null);
-    childrenLimitField.set(null, 2);
-
-    String path = "/" + TestHelper.getTestMethodName();
-    // Create 5 children to make packet length of children exceed 150 bytes
+    // Create 110K children to make packet length of children exceed 4 MB
     // and cause connection loss for getChildren() operation
-    for (int i = 0; i < 5; i++) {
-      _zkClient.createPersistent(path + "/" + UUID.randomUUID().toString(), true);
+    String path = "/" + TestHelper.getTestMethodName();
+    int toCreateChildrenCount = 110;
+
+    _zkClient.createPersistent(path);
+
+    for (int i = 0; i < toCreateChildrenCount; i++) {
+      List<Op> ops = new ArrayList<>(1000);
+      for (int j = 0; j < 1000; j++) {
+        String child = path + "/" + UUID.randomUUID().toString();
+        ops.add(Op.create(child, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL));
+      }
+      // Reduce total creation time by batch creating znodes
+      _zkClient.multi(ops);
     }
 
     try {
       _zkClient.getChildren(path);
-      Assert.fail("Should not successfully get children.");
+      Assert.fail("Should not successfully get children because of connection loss.");
     } catch (ZkException expected) {
       Assert.assertEquals(expected.getMessage(),
           "org.apache.zookeeper.KeeperException$MarshallingErrorException: "
               + "KeeperErrorCode = MarshallingError");
     } finally {
-      packetLenField.set(null, originPacketLen);
-      packetLenField.setAccessible(isPacketLenAccessible);
-
-      childrenLimitField.set(null, originChildrenLimit);
-      childrenLimitField.setAccessible(isChildrenLimitAccessible);
-
-      modifiersField.setAccessible(isModifierAccessible);
+      _zkClient.close();
+      _zkClient = new ZkClient(ZkTestBase.ZK_ADDR);
 
       Assert.assertTrue(TestHelper.verify(() -> {
-        _zkClient.deleteRecursively(path);
+        _zkClient.delete(path);

Review comment:
       delete() returns boolean. Should handle `ZkException` here as I've removed deleteRecursively().




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] narendly commented on a change in pull request #1194: Fix flaky test testGetChildrenOnLargeNumChildren

Posted by GitBox <gi...@apache.org>.
narendly commented on a change in pull request #1194:
URL: https://github.com/apache/helix/pull/1194#discussion_r464771973



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -868,60 +864,41 @@ public void testAsyncWriteOperations() {
    * Tests getChildren() when there are an excessive number of children and connection loss happens,
    * the operation should terminate and exit retry loop.
    */
-  @Test
+  @Test(timeOut = 30 * 1000L)
   public void testGetChildrenOnLargeNumChildren() throws Exception {
-    // Default packetLen is 4M. It is static final and initialized
-    // when first zkClient is created.
-    // So we could not just set "jute.maxbuffer" to change the value.
-    // Reflection is needed to change the value.
-    // Remove "final" modifier
-    Field modifiersField = Field.class.getDeclaredField("modifiers");
-    boolean isModifierAccessible = modifiersField.isAccessible();
-    modifiersField.setAccessible(true);
-
-    Field packetLenField = ClientCnxn.class.getDeclaredField("packetLen");
-    Field childrenLimitField =
-        org.apache.helix.zookeeper.zkclient.ZkClient.class.getDeclaredField("NUM_CHILDREN_LIMIT");
-    modifiersField.setInt(packetLenField, packetLenField.getModifiers() & ~Modifier.FINAL);
-    modifiersField.setInt(childrenLimitField, childrenLimitField.getModifiers() & ~Modifier.FINAL);
-
-    boolean isPacketLenAccessible = packetLenField.isAccessible();
-    packetLenField.setAccessible(true);
-    int originPacketLen = packetLenField.getInt(null);
-    // Keep 150 bytes for successfully creating each child node.
-    packetLenField.set(null, 150);
-
-    boolean isChildrenLimitAccessible = childrenLimitField.isAccessible();
-    childrenLimitField.setAccessible(true);
-    int originChildrenLimit = childrenLimitField.getInt(null);
-    childrenLimitField.set(null, 2);
-
-    String path = "/" + TestHelper.getTestMethodName();
-    // Create 5 children to make packet length of children exceed 150 bytes
+    // Create 110K children to make packet length of children exceed 4 MB
     // and cause connection loss for getChildren() operation
-    for (int i = 0; i < 5; i++) {
-      _zkClient.createPersistent(path + "/" + UUID.randomUUID().toString(), true);
+    String path = "/" + TestHelper.getTestMethodName();
+
+    _zkClient.createPersistent(path);
+
+    for (int i = 0; i < 110; i++) {
+      List<Op> ops = new ArrayList<>(1000);
+      for (int j = 0; j < 1000; j++) {
+        String child = path + "/" + UUID.randomUUID().toString();
+        ops.add(Op.create(child, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL));
+      }
+      // Reduce total creation time by batch creating znodes
+      _zkClient.multi(ops);
     }
 
     try {
       _zkClient.getChildren(path);
-      Assert.fail("Should not successfully get children.");
+      Assert.fail("Should not successfully get children because of connection loss.");
     } catch (ZkException expected) {
       Assert.assertEquals(expected.getMessage(),
           "org.apache.zookeeper.KeeperException$MarshallingErrorException: "
               + "KeeperErrorCode = MarshallingError");
     } finally {
-      packetLenField.set(null, originPacketLen);
-      packetLenField.setAccessible(isPacketLenAccessible);
-
-      childrenLimitField.set(null, originChildrenLimit);
-      childrenLimitField.setAccessible(isChildrenLimitAccessible);
-
-      modifiersField.setAccessible(isModifierAccessible);
+      _zkClient.close();
+      _zkClient = new ZkClient(ZkTestBase.ZK_ADDR);
 
       Assert.assertTrue(TestHelper.verify(() -> {
-        _zkClient.deleteRecursively(path);
-        return !_zkClient.exists(path);
+        try {
+          return _zkClient.delete(path);

Review comment:
       I think this is fine for testing purposes, but here are some scenarios we can think about:
   
   1. doesn't the packet len error cause the session to be terminated? Or does your fix change that behavior? In this case if my assumption is true, the client that created the ephemeral nodes will lose its session and the ephemerals tied to the session will be gone anyway, so the nodes will be gone before zkclient.close()?
   
   2. this is a rare scenario, but sometimes we observe in-memory local clients lose connection for some unknown reason. What impact would that have on the test?
   
   If we really wanted to be right and thorough, I would write persistent nodes, and check that the numChildren field comes out to 110K from Stats with an Assert statement, and then do the explicit deletion at the end. This would guard against any potential client-side glitches and failures.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1194: Fix flaky test testGetChildrenOnLargeNumChildren

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1194:
URL: https://github.com/apache/helix/pull/1194#discussion_r463413069



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -868,59 +864,38 @@ public void testAsyncWriteOperations() {
    * Tests getChildren() when there are an excessive number of children and connection loss happens,
    * the operation should terminate and exit retry loop.
    */
-  @Test
+  @Test(timeOut = 30L)
   public void testGetChildrenOnLargeNumChildren() throws Exception {
-    // Default packetLen is 4M. It is static final and initialized
-    // when first zkClient is created.
-    // So we could not just set "jute.maxbuffer" to change the value.
-    // Reflection is needed to change the value.
-    // Remove "final" modifier
-    Field modifiersField = Field.class.getDeclaredField("modifiers");
-    boolean isModifierAccessible = modifiersField.isAccessible();
-    modifiersField.setAccessible(true);
-
-    Field packetLenField = ClientCnxn.class.getDeclaredField("packetLen");
-    Field childrenLimitField =
-        org.apache.helix.zookeeper.zkclient.ZkClient.class.getDeclaredField("NUM_CHILDREN_LIMIT");
-    modifiersField.setInt(packetLenField, packetLenField.getModifiers() & ~Modifier.FINAL);
-    modifiersField.setInt(childrenLimitField, childrenLimitField.getModifiers() & ~Modifier.FINAL);
-
-    boolean isPacketLenAccessible = packetLenField.isAccessible();
-    packetLenField.setAccessible(true);
-    int originPacketLen = packetLenField.getInt(null);
-    // Keep 150 bytes for successfully creating each child node.
-    packetLenField.set(null, 150);
-
-    boolean isChildrenLimitAccessible = childrenLimitField.isAccessible();
-    childrenLimitField.setAccessible(true);
-    int originChildrenLimit = childrenLimitField.getInt(null);
-    childrenLimitField.set(null, 2);
-
-    String path = "/" + TestHelper.getTestMethodName();
-    // Create 5 children to make packet length of children exceed 150 bytes
+    // Create 110K children to make packet length of children exceed 4 MB
     // and cause connection loss for getChildren() operation
-    for (int i = 0; i < 5; i++) {
-      _zkClient.createPersistent(path + "/" + UUID.randomUUID().toString(), true);
+    String path = "/" + TestHelper.getTestMethodName();
+    int toCreateChildrenCount = 110;
+
+    _zkClient.createPersistent(path);
+
+    for (int i = 0; i < toCreateChildrenCount; i++) {
+      List<Op> ops = new ArrayList<>(1000);
+      for (int j = 0; j < 1000; j++) {
+        String child = path + "/" + UUID.randomUUID().toString();
+        ops.add(Op.create(child, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL));
+      }
+      // Reduce total creation time by batch creating znodes
+      _zkClient.multi(ops);
     }
 
     try {
       _zkClient.getChildren(path);
-      Assert.fail("Should not successfully get children.");
+      Assert.fail("Should not successfully get children because of connection loss.");
     } catch (ZkException expected) {
       Assert.assertEquals(expected.getMessage(),
           "org.apache.zookeeper.KeeperException$MarshallingErrorException: "
               + "KeeperErrorCode = MarshallingError");
     } finally {
-      packetLenField.set(null, originPacketLen);
-      packetLenField.setAccessible(isPacketLenAccessible);
-
-      childrenLimitField.set(null, originChildrenLimit);
-      childrenLimitField.setAccessible(isChildrenLimitAccessible);
-
-      modifiersField.setAccessible(isModifierAccessible);
+      _zkClient.close();
+      _zkClient = new ZkClient(ZkTestBase.ZK_ADDR);
 
       Assert.assertTrue(TestHelper.verify(() -> {
-        _zkClient.deleteRecursively(path);
+        _zkClient.delete(path);

Review comment:
       `String path = "/" + TestHelper.getTestMethodName()` is path value.
   
   Now at the time this line of code `_zkClient.delete(path);` , there are a lot of znodes underlying this `path`. So the delete should always fail, with exception such as NoEmpty right?  Or am I understanding this wrong?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1194: Fix flaky test testGetChildrenOnLargeNumChildren

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1194:
URL: https://github.com/apache/helix/pull/1194#discussion_r464609533



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -868,59 +864,38 @@ public void testAsyncWriteOperations() {
    * Tests getChildren() when there are an excessive number of children and connection loss happens,
    * the operation should terminate and exit retry loop.
    */
-  @Test
+  @Test(timeOut = 30L)

Review comment:
       where is the place to config the 300secs. Shall we just configure the same in zooscalability-api?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1194: Fix flaky test testGetChildrenOnLargeNumChildren

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1194:
URL: https://github.com/apache/helix/pull/1194#discussion_r463347039



##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkClient.java
##########
@@ -119,19 +115,6 @@
   // ZkEventThread. Otherwise the retry request might block the normal event processing.
   protected final ZkAsyncRetryThread _asyncCallRetryThread;
 
-  static {

Review comment:
       I see.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] pkuwm commented on a change in pull request #1194: Fix flaky test testGetChildrenOnLargeNumChildren

Posted by GitBox <gi...@apache.org>.
pkuwm commented on a change in pull request #1194:
URL: https://github.com/apache/helix/pull/1194#discussion_r463346000



##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkClient.java
##########
@@ -119,19 +115,6 @@
   // ZkEventThread. Otherwise the retry request might block the normal event processing.
   protected final ZkAsyncRetryThread _asyncCallRetryThread;
 
-  static {

Review comment:
       If you read the comment you'll know :)
   // Set it here for unit test to use reflection to change value	
   // because compilers optimize constants by replacing them inline.
   
   Since we don't need reflection, we change it.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] pkuwm commented on a change in pull request #1194: Fix flaky test testGetChildrenOnLargeNumChildren

Posted by GitBox <gi...@apache.org>.
pkuwm commented on a change in pull request #1194:
URL: https://github.com/apache/helix/pull/1194#discussion_r463351288



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -868,59 +864,38 @@ public void testAsyncWriteOperations() {
    * Tests getChildren() when there are an excessive number of children and connection loss happens,
    * the operation should terminate and exit retry loop.
    */
-  @Test
+  @Test(timeOut = 30 * 1000L)
   public void testGetChildrenOnLargeNumChildren() throws Exception {
-    // Default packetLen is 4M. It is static final and initialized
-    // when first zkClient is created.
-    // So we could not just set "jute.maxbuffer" to change the value.
-    // Reflection is needed to change the value.
-    // Remove "final" modifier
-    Field modifiersField = Field.class.getDeclaredField("modifiers");
-    boolean isModifierAccessible = modifiersField.isAccessible();
-    modifiersField.setAccessible(true);
-
-    Field packetLenField = ClientCnxn.class.getDeclaredField("packetLen");
-    Field childrenLimitField =
-        org.apache.helix.zookeeper.zkclient.ZkClient.class.getDeclaredField("NUM_CHILDREN_LIMIT");
-    modifiersField.setInt(packetLenField, packetLenField.getModifiers() & ~Modifier.FINAL);
-    modifiersField.setInt(childrenLimitField, childrenLimitField.getModifiers() & ~Modifier.FINAL);
-
-    boolean isPacketLenAccessible = packetLenField.isAccessible();
-    packetLenField.setAccessible(true);
-    int originPacketLen = packetLenField.getInt(null);
-    // Keep 150 bytes for successfully creating each child node.
-    packetLenField.set(null, 150);
-
-    boolean isChildrenLimitAccessible = childrenLimitField.isAccessible();
-    childrenLimitField.setAccessible(true);
-    int originChildrenLimit = childrenLimitField.getInt(null);
-    childrenLimitField.set(null, 2);
-
-    String path = "/" + TestHelper.getTestMethodName();
-    // Create 5 children to make packet length of children exceed 150 bytes
+    // Create 110K children to make packet length of children exceed 4 MB
     // and cause connection loss for getChildren() operation
-    for (int i = 0; i < 5; i++) {
-      _zkClient.createPersistent(path + "/" + UUID.randomUUID().toString(), true);
+    String path = "/" + TestHelper.getTestMethodName();
+    int toCreateChildrenCount = 110;
+
+    _zkClient.createPersistent(path);

Review comment:
       Replied above. Won't be closed by another thread. It is created in this class, not from a base class.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1194: Fix flaky test testGetChildrenOnLargeNumChildren

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1194:
URL: https://github.com/apache/helix/pull/1194#discussion_r463342435



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -868,59 +864,38 @@ public void testAsyncWriteOperations() {
    * Tests getChildren() when there are an excessive number of children and connection loss happens,
    * the operation should terminate and exit retry loop.
    */
-  @Test
+  @Test(timeOut = 30L)

Review comment:
       why do we need this? is it 30sec?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1194: Fix flaky test testGetChildrenOnLargeNumChildren

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1194:
URL: https://github.com/apache/helix/pull/1194#discussion_r463349245



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -868,59 +864,38 @@ public void testAsyncWriteOperations() {
    * Tests getChildren() when there are an excessive number of children and connection loss happens,
    * the operation should terminate and exit retry loop.
    */
-  @Test
+  @Test(timeOut = 30 * 1000L)

Review comment:
       60 secs?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] pkuwm commented on a change in pull request #1194: Fix flaky test testGetChildrenOnLargeNumChildren

Posted by GitBox <gi...@apache.org>.
pkuwm commented on a change in pull request #1194:
URL: https://github.com/apache/helix/pull/1194#discussion_r463346000



##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkClient.java
##########
@@ -119,19 +115,6 @@
   // ZkEventThread. Otherwise the retry request might block the normal event processing.
   protected final ZkAsyncRetryThread _asyncCallRetryThread;
 
-  static {

Review comment:
       If you read the comment: // Set it here for unit test to use reflection to change value	
       // because compilers optimize constants by replacing them inline.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] narendly commented on a change in pull request #1194: Fix flaky test testGetChildrenOnLargeNumChildren

Posted by GitBox <gi...@apache.org>.
narendly commented on a change in pull request #1194:
URL: https://github.com/apache/helix/pull/1194#discussion_r464771973



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -868,60 +864,41 @@ public void testAsyncWriteOperations() {
    * Tests getChildren() when there are an excessive number of children and connection loss happens,
    * the operation should terminate and exit retry loop.
    */
-  @Test
+  @Test(timeOut = 30 * 1000L)
   public void testGetChildrenOnLargeNumChildren() throws Exception {
-    // Default packetLen is 4M. It is static final and initialized
-    // when first zkClient is created.
-    // So we could not just set "jute.maxbuffer" to change the value.
-    // Reflection is needed to change the value.
-    // Remove "final" modifier
-    Field modifiersField = Field.class.getDeclaredField("modifiers");
-    boolean isModifierAccessible = modifiersField.isAccessible();
-    modifiersField.setAccessible(true);
-
-    Field packetLenField = ClientCnxn.class.getDeclaredField("packetLen");
-    Field childrenLimitField =
-        org.apache.helix.zookeeper.zkclient.ZkClient.class.getDeclaredField("NUM_CHILDREN_LIMIT");
-    modifiersField.setInt(packetLenField, packetLenField.getModifiers() & ~Modifier.FINAL);
-    modifiersField.setInt(childrenLimitField, childrenLimitField.getModifiers() & ~Modifier.FINAL);
-
-    boolean isPacketLenAccessible = packetLenField.isAccessible();
-    packetLenField.setAccessible(true);
-    int originPacketLen = packetLenField.getInt(null);
-    // Keep 150 bytes for successfully creating each child node.
-    packetLenField.set(null, 150);
-
-    boolean isChildrenLimitAccessible = childrenLimitField.isAccessible();
-    childrenLimitField.setAccessible(true);
-    int originChildrenLimit = childrenLimitField.getInt(null);
-    childrenLimitField.set(null, 2);
-
-    String path = "/" + TestHelper.getTestMethodName();
-    // Create 5 children to make packet length of children exceed 150 bytes
+    // Create 110K children to make packet length of children exceed 4 MB
     // and cause connection loss for getChildren() operation
-    for (int i = 0; i < 5; i++) {
-      _zkClient.createPersistent(path + "/" + UUID.randomUUID().toString(), true);
+    String path = "/" + TestHelper.getTestMethodName();
+
+    _zkClient.createPersistent(path);
+
+    for (int i = 0; i < 110; i++) {
+      List<Op> ops = new ArrayList<>(1000);
+      for (int j = 0; j < 1000; j++) {
+        String child = path + "/" + UUID.randomUUID().toString();
+        ops.add(Op.create(child, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL));
+      }
+      // Reduce total creation time by batch creating znodes
+      _zkClient.multi(ops);
     }
 
     try {
       _zkClient.getChildren(path);
-      Assert.fail("Should not successfully get children.");
+      Assert.fail("Should not successfully get children because of connection loss.");
     } catch (ZkException expected) {
       Assert.assertEquals(expected.getMessage(),
           "org.apache.zookeeper.KeeperException$MarshallingErrorException: "
               + "KeeperErrorCode = MarshallingError");
     } finally {
-      packetLenField.set(null, originPacketLen);
-      packetLenField.setAccessible(isPacketLenAccessible);
-
-      childrenLimitField.set(null, originChildrenLimit);
-      childrenLimitField.setAccessible(isChildrenLimitAccessible);
-
-      modifiersField.setAccessible(isModifierAccessible);
+      _zkClient.close();
+      _zkClient = new ZkClient(ZkTestBase.ZK_ADDR);
 
       Assert.assertTrue(TestHelper.verify(() -> {
-        _zkClient.deleteRecursively(path);
-        return !_zkClient.exists(path);
+        try {
+          return _zkClient.delete(path);

Review comment:
       I think this is fine for testing purposes, but here are some scenarios we can think about that gets us from, say, 95% to 99% stability:
   
   1. doesn't the packet len error cause the session to be terminated? Or does your fix change that behavior? In this case if my assumption is true, the client that created the ephemeral nodes will lose its session and the ephemerals tied to the session will be gone anyway, so the nodes will be gone before zkclient.close()?
   
   2. this is a rare scenario and probably being extra nitpicky, but sometimes we observe in-memory local clients lose connection for some unknown reason. What impact would that have on the test since that would cause ephemerals to be gone prematurely?
   
   If we really wanted to be right and thorough, I would write persistent nodes, and check that the numChildren field comes out to 110K from Stats with an Assert statement, and then do the explicit deletion at the end. This would guard against any potential client-side glitches and failures.

##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -868,60 +864,41 @@ public void testAsyncWriteOperations() {
    * Tests getChildren() when there are an excessive number of children and connection loss happens,
    * the operation should terminate and exit retry loop.
    */
-  @Test
+  @Test(timeOut = 30 * 1000L)
   public void testGetChildrenOnLargeNumChildren() throws Exception {
-    // Default packetLen is 4M. It is static final and initialized
-    // when first zkClient is created.
-    // So we could not just set "jute.maxbuffer" to change the value.
-    // Reflection is needed to change the value.
-    // Remove "final" modifier
-    Field modifiersField = Field.class.getDeclaredField("modifiers");
-    boolean isModifierAccessible = modifiersField.isAccessible();
-    modifiersField.setAccessible(true);
-
-    Field packetLenField = ClientCnxn.class.getDeclaredField("packetLen");
-    Field childrenLimitField =
-        org.apache.helix.zookeeper.zkclient.ZkClient.class.getDeclaredField("NUM_CHILDREN_LIMIT");
-    modifiersField.setInt(packetLenField, packetLenField.getModifiers() & ~Modifier.FINAL);
-    modifiersField.setInt(childrenLimitField, childrenLimitField.getModifiers() & ~Modifier.FINAL);
-
-    boolean isPacketLenAccessible = packetLenField.isAccessible();
-    packetLenField.setAccessible(true);
-    int originPacketLen = packetLenField.getInt(null);
-    // Keep 150 bytes for successfully creating each child node.
-    packetLenField.set(null, 150);
-
-    boolean isChildrenLimitAccessible = childrenLimitField.isAccessible();
-    childrenLimitField.setAccessible(true);
-    int originChildrenLimit = childrenLimitField.getInt(null);
-    childrenLimitField.set(null, 2);
-
-    String path = "/" + TestHelper.getTestMethodName();
-    // Create 5 children to make packet length of children exceed 150 bytes
+    // Create 110K children to make packet length of children exceed 4 MB
     // and cause connection loss for getChildren() operation
-    for (int i = 0; i < 5; i++) {
-      _zkClient.createPersistent(path + "/" + UUID.randomUUID().toString(), true);
+    String path = "/" + TestHelper.getTestMethodName();
+
+    _zkClient.createPersistent(path);
+
+    for (int i = 0; i < 110; i++) {
+      List<Op> ops = new ArrayList<>(1000);
+      for (int j = 0; j < 1000; j++) {
+        String child = path + "/" + UUID.randomUUID().toString();
+        ops.add(Op.create(child, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL));
+      }
+      // Reduce total creation time by batch creating znodes
+      _zkClient.multi(ops);
     }
 
     try {
       _zkClient.getChildren(path);
-      Assert.fail("Should not successfully get children.");
+      Assert.fail("Should not successfully get children because of connection loss.");
     } catch (ZkException expected) {
       Assert.assertEquals(expected.getMessage(),
           "org.apache.zookeeper.KeeperException$MarshallingErrorException: "
               + "KeeperErrorCode = MarshallingError");
     } finally {
-      packetLenField.set(null, originPacketLen);
-      packetLenField.setAccessible(isPacketLenAccessible);
-
-      childrenLimitField.set(null, originChildrenLimit);
-      childrenLimitField.setAccessible(isChildrenLimitAccessible);
-
-      modifiersField.setAccessible(isModifierAccessible);
+      _zkClient.close();
+      _zkClient = new ZkClient(ZkTestBase.ZK_ADDR);
 
       Assert.assertTrue(TestHelper.verify(() -> {
-        _zkClient.deleteRecursively(path);
-        return !_zkClient.exists(path);
+        try {
+          return _zkClient.delete(path);

Review comment:
       I think this is fine for testing purposes, but here are some scenarios we can think about that gets us from, say, 95% to 99% stability:
   
   1. doesn't the packet len error cause the session to be terminated? Or does your fix change that behavior? In this case if my assumption is true, the client that created the ephemeral nodes will lose its session and the ephemerals tied to the session will be gone anyway, so the nodes will be gone before zkclient.close()?
   
   2. this is a rare scenario and probably being extra nitpicky, but sometimes we observe in-memory local clients lose connection for some unknown reason. What impact would that have on the test since that would cause ephemerals to be gone prematurely?
   
   If we really wanted to be right and thorough, I would write persistent nodes, and check that the numChildren field comes out to 110K from Stats with an Assert statement, and then do the explicit deletion at the end. This would guard against any potential client-side and server-side glitches and failures.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] pkuwm commented on a change in pull request #1194: Fix flaky test testGetChildrenOnLargeNumChildren

Posted by GitBox <gi...@apache.org>.
pkuwm commented on a change in pull request #1194:
URL: https://github.com/apache/helix/pull/1194#discussion_r463358352



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -868,59 +864,38 @@ public void testAsyncWriteOperations() {
    * Tests getChildren() when there are an excessive number of children and connection loss happens,
    * the operation should terminate and exit retry loop.
    */
-  @Test
+  @Test(timeOut = 30L)
   public void testGetChildrenOnLargeNumChildren() throws Exception {
-    // Default packetLen is 4M. It is static final and initialized
-    // when first zkClient is created.
-    // So we could not just set "jute.maxbuffer" to change the value.
-    // Reflection is needed to change the value.
-    // Remove "final" modifier
-    Field modifiersField = Field.class.getDeclaredField("modifiers");
-    boolean isModifierAccessible = modifiersField.isAccessible();
-    modifiersField.setAccessible(true);
-
-    Field packetLenField = ClientCnxn.class.getDeclaredField("packetLen");
-    Field childrenLimitField =
-        org.apache.helix.zookeeper.zkclient.ZkClient.class.getDeclaredField("NUM_CHILDREN_LIMIT");
-    modifiersField.setInt(packetLenField, packetLenField.getModifiers() & ~Modifier.FINAL);
-    modifiersField.setInt(childrenLimitField, childrenLimitField.getModifiers() & ~Modifier.FINAL);
-
-    boolean isPacketLenAccessible = packetLenField.isAccessible();
-    packetLenField.setAccessible(true);
-    int originPacketLen = packetLenField.getInt(null);
-    // Keep 150 bytes for successfully creating each child node.
-    packetLenField.set(null, 150);
-
-    boolean isChildrenLimitAccessible = childrenLimitField.isAccessible();
-    childrenLimitField.setAccessible(true);
-    int originChildrenLimit = childrenLimitField.getInt(null);
-    childrenLimitField.set(null, 2);
-
-    String path = "/" + TestHelper.getTestMethodName();
-    // Create 5 children to make packet length of children exceed 150 bytes
+    // Create 110K children to make packet length of children exceed 4 MB
     // and cause connection loss for getChildren() operation
-    for (int i = 0; i < 5; i++) {
-      _zkClient.createPersistent(path + "/" + UUID.randomUUID().toString(), true);
+    String path = "/" + TestHelper.getTestMethodName();
+    int toCreateChildrenCount = 110;
+
+    _zkClient.createPersistent(path);
+
+    for (int i = 0; i < toCreateChildrenCount; i++) {
+      List<Op> ops = new ArrayList<>(1000);
+      for (int j = 0; j < 1000; j++) {
+        String child = path + "/" + UUID.randomUUID().toString();
+        ops.add(Op.create(child, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL));
+      }
+      // Reduce total creation time by batch creating znodes
+      _zkClient.multi(ops);
     }
 
     try {
       _zkClient.getChildren(path);
-      Assert.fail("Should not successfully get children.");
+      Assert.fail("Should not successfully get children because of connection loss.");
     } catch (ZkException expected) {
       Assert.assertEquals(expected.getMessage(),
           "org.apache.zookeeper.KeeperException$MarshallingErrorException: "
               + "KeeperErrorCode = MarshallingError");
     } finally {
-      packetLenField.set(null, originPacketLen);
-      packetLenField.setAccessible(isPacketLenAccessible);
-
-      childrenLimitField.set(null, originChildrenLimit);
-      childrenLimitField.setAccessible(isChildrenLimitAccessible);
-
-      modifiersField.setAccessible(isModifierAccessible);
+      _zkClient.close();
+      _zkClient = new ZkClient(ZkTestBase.ZK_ADDR);
 
       Assert.assertTrue(TestHelper.verify(() -> {
-        _zkClient.deleteRecursively(path);
+        _zkClient.delete(path);

Review comment:
       As a good practice, a unit test should cleanup itself and avoid other tests.
   It won't always throw exception. If you carefully understand the code, you'll find the tips of using ephemeral nodes and closing the zkClient to delete the ephemeral nodes for cleanup. The exception handling here is just in case zk server doesn't remove ephemeral nodes yet.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1194: Fix flaky test testGetChildrenOnLargeNumChildren

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1194:
URL: https://github.com/apache/helix/pull/1194#discussion_r463369643



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -868,59 +864,38 @@ public void testAsyncWriteOperations() {
    * Tests getChildren() when there are an excessive number of children and connection loss happens,
    * the operation should terminate and exit retry loop.
    */
-  @Test
+  @Test(timeOut = 30L)

Review comment:
       No, run all test from helix. That is what TMC is doing I guess. This is to simulate that test environment. We don't want that your fix working here but later it turned out this test still not working TMC. Note the cost over there is way higher than changing it here. 
   
   In fact, I noticed in our test, there are failed test due to Timed out from time to time. I don't think it is a good idea to add a time out here. 
   
   By the way, can you re-open this comment thread.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] narendly commented on a change in pull request #1194: Fix flaky test testGetChildrenOnLargeNumChildren

Posted by GitBox <gi...@apache.org>.
narendly commented on a change in pull request #1194:
URL: https://github.com/apache/helix/pull/1194#discussion_r464214252



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -868,60 +864,41 @@ public void testAsyncWriteOperations() {
    * Tests getChildren() when there are an excessive number of children and connection loss happens,
    * the operation should terminate and exit retry loop.
    */
-  @Test
+  @Test(timeOut = 30 * 1000L)
   public void testGetChildrenOnLargeNumChildren() throws Exception {
-    // Default packetLen is 4M. It is static final and initialized
-    // when first zkClient is created.
-    // So we could not just set "jute.maxbuffer" to change the value.
-    // Reflection is needed to change the value.
-    // Remove "final" modifier
-    Field modifiersField = Field.class.getDeclaredField("modifiers");
-    boolean isModifierAccessible = modifiersField.isAccessible();
-    modifiersField.setAccessible(true);
-
-    Field packetLenField = ClientCnxn.class.getDeclaredField("packetLen");
-    Field childrenLimitField =
-        org.apache.helix.zookeeper.zkclient.ZkClient.class.getDeclaredField("NUM_CHILDREN_LIMIT");
-    modifiersField.setInt(packetLenField, packetLenField.getModifiers() & ~Modifier.FINAL);
-    modifiersField.setInt(childrenLimitField, childrenLimitField.getModifiers() & ~Modifier.FINAL);
-
-    boolean isPacketLenAccessible = packetLenField.isAccessible();
-    packetLenField.setAccessible(true);
-    int originPacketLen = packetLenField.getInt(null);
-    // Keep 150 bytes for successfully creating each child node.
-    packetLenField.set(null, 150);
-
-    boolean isChildrenLimitAccessible = childrenLimitField.isAccessible();
-    childrenLimitField.setAccessible(true);
-    int originChildrenLimit = childrenLimitField.getInt(null);
-    childrenLimitField.set(null, 2);
-
-    String path = "/" + TestHelper.getTestMethodName();
-    // Create 5 children to make packet length of children exceed 150 bytes
+    // Create 110K children to make packet length of children exceed 4 MB
     // and cause connection loss for getChildren() operation
-    for (int i = 0; i < 5; i++) {
-      _zkClient.createPersistent(path + "/" + UUID.randomUUID().toString(), true);
+    String path = "/" + TestHelper.getTestMethodName();
+
+    _zkClient.createPersistent(path);
+
+    for (int i = 0; i < 110; i++) {
+      List<Op> ops = new ArrayList<>(1000);
+      for (int j = 0; j < 1000; j++) {
+        String child = path + "/" + UUID.randomUUID().toString();

Review comment:
       Nit: childPath




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1194: Fix flaky test testGetChildrenOnLargeNumChildren

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1194:
URL: https://github.com/apache/helix/pull/1194#discussion_r464604562



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -868,59 +864,38 @@ public void testAsyncWriteOperations() {
    * Tests getChildren() when there are an excessive number of children and connection loss happens,
    * the operation should terminate and exit retry loop.
    */
-  @Test
+  @Test(timeOut = 30L)
   public void testGetChildrenOnLargeNumChildren() throws Exception {
-    // Default packetLen is 4M. It is static final and initialized
-    // when first zkClient is created.
-    // So we could not just set "jute.maxbuffer" to change the value.
-    // Reflection is needed to change the value.
-    // Remove "final" modifier
-    Field modifiersField = Field.class.getDeclaredField("modifiers");
-    boolean isModifierAccessible = modifiersField.isAccessible();
-    modifiersField.setAccessible(true);
-
-    Field packetLenField = ClientCnxn.class.getDeclaredField("packetLen");
-    Field childrenLimitField =
-        org.apache.helix.zookeeper.zkclient.ZkClient.class.getDeclaredField("NUM_CHILDREN_LIMIT");
-    modifiersField.setInt(packetLenField, packetLenField.getModifiers() & ~Modifier.FINAL);
-    modifiersField.setInt(childrenLimitField, childrenLimitField.getModifiers() & ~Modifier.FINAL);
-
-    boolean isPacketLenAccessible = packetLenField.isAccessible();
-    packetLenField.setAccessible(true);
-    int originPacketLen = packetLenField.getInt(null);
-    // Keep 150 bytes for successfully creating each child node.
-    packetLenField.set(null, 150);
-
-    boolean isChildrenLimitAccessible = childrenLimitField.isAccessible();
-    childrenLimitField.setAccessible(true);
-    int originChildrenLimit = childrenLimitField.getInt(null);
-    childrenLimitField.set(null, 2);
-
-    String path = "/" + TestHelper.getTestMethodName();
-    // Create 5 children to make packet length of children exceed 150 bytes
+    // Create 110K children to make packet length of children exceed 4 MB
     // and cause connection loss for getChildren() operation
-    for (int i = 0; i < 5; i++) {
-      _zkClient.createPersistent(path + "/" + UUID.randomUUID().toString(), true);
+    String path = "/" + TestHelper.getTestMethodName();
+    int toCreateChildrenCount = 110;
+
+    _zkClient.createPersistent(path);
+
+    for (int i = 0; i < toCreateChildrenCount; i++) {
+      List<Op> ops = new ArrayList<>(1000);
+      for (int j = 0; j < 1000; j++) {
+        String child = path + "/" + UUID.randomUUID().toString();
+        ops.add(Op.create(child, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL));
+      }
+      // Reduce total creation time by batch creating znodes
+      _zkClient.multi(ops);
     }
 
     try {
       _zkClient.getChildren(path);
-      Assert.fail("Should not successfully get children.");
+      Assert.fail("Should not successfully get children because of connection loss.");
     } catch (ZkException expected) {
       Assert.assertEquals(expected.getMessage(),
           "org.apache.zookeeper.KeeperException$MarshallingErrorException: "
               + "KeeperErrorCode = MarshallingError");
     } finally {
-      packetLenField.set(null, originPacketLen);
-      packetLenField.setAccessible(isPacketLenAccessible);
-
-      childrenLimitField.set(null, originChildrenLimit);
-      childrenLimitField.setAccessible(isChildrenLimitAccessible);
-
-      modifiersField.setAccessible(isModifierAccessible);
+      _zkClient.close();
+      _zkClient = new ZkClient(ZkTestBase.ZK_ADDR);
 
       Assert.assertTrue(TestHelper.verify(() -> {
-        _zkClient.deleteRecursively(path);
+        _zkClient.delete(path);

Review comment:
       I see. Did not notice this ephemeral node. So closing the _zkClient delete the nodes. The second delete is just to delete the parent path. This looks good. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] narendly commented on a change in pull request #1194: Fix flaky test testGetChildrenOnLargeNumChildren

Posted by GitBox <gi...@apache.org>.
narendly commented on a change in pull request #1194:
URL: https://github.com/apache/helix/pull/1194#discussion_r464216050



##########
File path: zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
##########
@@ -868,60 +864,41 @@ public void testAsyncWriteOperations() {
    * Tests getChildren() when there are an excessive number of children and connection loss happens,
    * the operation should terminate and exit retry loop.
    */
-  @Test
+  @Test(timeOut = 30 * 1000L)
   public void testGetChildrenOnLargeNumChildren() throws Exception {
-    // Default packetLen is 4M. It is static final and initialized
-    // when first zkClient is created.
-    // So we could not just set "jute.maxbuffer" to change the value.
-    // Reflection is needed to change the value.
-    // Remove "final" modifier
-    Field modifiersField = Field.class.getDeclaredField("modifiers");
-    boolean isModifierAccessible = modifiersField.isAccessible();
-    modifiersField.setAccessible(true);
-
-    Field packetLenField = ClientCnxn.class.getDeclaredField("packetLen");
-    Field childrenLimitField =
-        org.apache.helix.zookeeper.zkclient.ZkClient.class.getDeclaredField("NUM_CHILDREN_LIMIT");
-    modifiersField.setInt(packetLenField, packetLenField.getModifiers() & ~Modifier.FINAL);
-    modifiersField.setInt(childrenLimitField, childrenLimitField.getModifiers() & ~Modifier.FINAL);
-
-    boolean isPacketLenAccessible = packetLenField.isAccessible();
-    packetLenField.setAccessible(true);
-    int originPacketLen = packetLenField.getInt(null);
-    // Keep 150 bytes for successfully creating each child node.
-    packetLenField.set(null, 150);
-
-    boolean isChildrenLimitAccessible = childrenLimitField.isAccessible();
-    childrenLimitField.setAccessible(true);
-    int originChildrenLimit = childrenLimitField.getInt(null);
-    childrenLimitField.set(null, 2);
-
-    String path = "/" + TestHelper.getTestMethodName();
-    // Create 5 children to make packet length of children exceed 150 bytes
+    // Create 110K children to make packet length of children exceed 4 MB
     // and cause connection loss for getChildren() operation
-    for (int i = 0; i < 5; i++) {
-      _zkClient.createPersistent(path + "/" + UUID.randomUUID().toString(), true);
+    String path = "/" + TestHelper.getTestMethodName();
+
+    _zkClient.createPersistent(path);
+
+    for (int i = 0; i < 110; i++) {
+      List<Op> ops = new ArrayList<>(1000);
+      for (int j = 0; j < 1000; j++) {
+        String child = path + "/" + UUID.randomUUID().toString();
+        ops.add(Op.create(child, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL));
+      }
+      // Reduce total creation time by batch creating znodes
+      _zkClient.multi(ops);
     }
 
     try {
       _zkClient.getChildren(path);
-      Assert.fail("Should not successfully get children.");
+      Assert.fail("Should not successfully get children because of connection loss.");
     } catch (ZkException expected) {
       Assert.assertEquals(expected.getMessage(),
           "org.apache.zookeeper.KeeperException$MarshallingErrorException: "
               + "KeeperErrorCode = MarshallingError");
     } finally {
-      packetLenField.set(null, originPacketLen);
-      packetLenField.setAccessible(isPacketLenAccessible);
-
-      childrenLimitField.set(null, originChildrenLimit);
-      childrenLimitField.setAccessible(isChildrenLimitAccessible);
-
-      modifiersField.setAccessible(isModifierAccessible);
+      _zkClient.close();
+      _zkClient = new ZkClient(ZkTestBase.ZK_ADDR);
 
       Assert.assertTrue(TestHelper.verify(() -> {
-        _zkClient.deleteRecursively(path);
-        return !_zkClient.exists(path);
+        try {
+          return _zkClient.delete(path);

Review comment:
       Shouldn't you delete recursively? Wouldn't this fail because of children nodes?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org