You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2022/01/15 11:26:56 UTC

[GitHub] [ozone] JyotinderSingh opened a new pull request #2991: HDDS-6183. Intermittent failure in TestKeyDeletingService.checkIfDeleteServiceWithFailingSCM.

JyotinderSingh opened a new pull request #2991:
URL: https://github.com/apache/ozone/pull/2991


   ## What changes were proposed in this pull request?
   
   `TestKeyDeletingService.checkIfDeleteServiceWithFailingSCM` seems to be intermittently failing frequently on both CI and local setups.
   
   Attached logs archive from failed CI run.
   https://github.com/apache/ozone/runs/4821162701?check_suite_focus=true
   
   Resolution:
   KeyDeletingService can sometimes take more than a few milliseconds to pick up the keys queued up for deletion. If the assertion runs before this is allowed to happen it can cause a test failure.
   We have added an artificial sleep wait before running the assertion to circumvent this.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-6183
   
   ## How was this patch tested?
   
   Unit Tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] adoroszlai commented on pull request #2991: HDDS-6183. Intermittent failure in TestKeyDeletingService.checkIfDeleteServiceWithFailingSCM.

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on pull request #2991:
URL: https://github.com/apache/ozone/pull/2991#issuecomment-1013837338


   Thanks @JyotinderSingh for the patch.  Since it is a unit test-only change, the intermittent failure in integration test is unrelated.  I don't think it's worth retrying further.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] JyotinderSingh commented on a change in pull request #2991: HDDS-6183. Intermittent failure in TestKeyDeletingService.checkIfDeleteServiceWithFailingSCM.

Posted by GitBox <gi...@apache.org>.
JyotinderSingh commented on a change in pull request #2991:
URL: https://github.com/apache/ozone/pull/2991#discussion_r785323523



##########
File path: hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/TestKeyDeletingService.java
##########
@@ -139,8 +143,22 @@ public void checkIfDeleteServiceWithFailingSCM()
     createAndDeleteKeys(keyManager, keyCount, 1);
     KeyDeletingService keyDeletingService =
         (KeyDeletingService) keyManager.getDeletingService();
-    Assert.assertEquals(
-        keyManager.getPendingDeletionKeys(Integer.MAX_VALUE).size(), keyCount);
+    GenericTestUtils.waitFor(
+        () -> {
+          try {
+            int numPendingDeletionKeys =
+                keyManager.getPendingDeletionKeys(Integer.MAX_VALUE).size();
+            if (numPendingDeletionKeys != keyCount) {
+              LOG.error("Expected {} keys to be pending deletion, but got {}",
+                  keyCount, numPendingDeletionKeys);
+              return false;
+            }
+            return true;
+          } catch (IOException e) {
+            LOG.error("Error while getting pending deletion keys.");

Review comment:
       Done ✅ 

##########
File path: hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/TestKeyDeletingService.java
##########
@@ -139,8 +143,22 @@ public void checkIfDeleteServiceWithFailingSCM()
     createAndDeleteKeys(keyManager, keyCount, 1);
     KeyDeletingService keyDeletingService =
         (KeyDeletingService) keyManager.getDeletingService();
-    Assert.assertEquals(
-        keyManager.getPendingDeletionKeys(Integer.MAX_VALUE).size(), keyCount);
+    GenericTestUtils.waitFor(
+        () -> {
+          try {
+            int numPendingDeletionKeys =
+                keyManager.getPendingDeletionKeys(Integer.MAX_VALUE).size();
+            if (numPendingDeletionKeys != keyCount) {
+              LOG.error("Expected {} keys to be pending deletion, but got {}",

Review comment:
       Done ✅ 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] adoroszlai merged pull request #2991: HDDS-6183. Intermittent failure in TestKeyDeletingService.checkIfDeleteServiceWithFailingSCM.

Posted by GitBox <gi...@apache.org>.
adoroszlai merged pull request #2991:
URL: https://github.com/apache/ozone/pull/2991


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] JyotinderSingh commented on a change in pull request #2991: HDDS-6183. Intermittent failure in TestKeyDeletingService.checkIfDeleteServiceWithFailingSCM.

Posted by GitBox <gi...@apache.org>.
JyotinderSingh commented on a change in pull request #2991:
URL: https://github.com/apache/ozone/pull/2991#discussion_r785306753



##########
File path: hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/TestKeyDeletingService.java
##########
@@ -137,6 +137,10 @@ public void checkIfDeleteServiceWithFailingSCM()
 
     final int keyCount = 100;
     createAndDeleteKeys(keyManager, keyCount, 1);
+    // Wait for the KeyDeletingService to pick up all the keys.
+    // There isn't any time-consuming code after the deleteKey call - which can
+    // cause assertion failures.
+    Thread.sleep(2000);

Review comment:
       Thanks for the correction @adoroszlai, I have fixed this in my latest patch




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] adoroszlai commented on a change in pull request #2991: HDDS-6183. Intermittent failure in TestKeyDeletingService.checkIfDeleteServiceWithFailingSCM.

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on a change in pull request #2991:
URL: https://github.com/apache/ozone/pull/2991#discussion_r785320155



##########
File path: hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/TestKeyDeletingService.java
##########
@@ -139,8 +143,22 @@ public void checkIfDeleteServiceWithFailingSCM()
     createAndDeleteKeys(keyManager, keyCount, 1);
     KeyDeletingService keyDeletingService =
         (KeyDeletingService) keyManager.getDeletingService();
-    Assert.assertEquals(
-        keyManager.getPendingDeletionKeys(Integer.MAX_VALUE).size(), keyCount);
+    GenericTestUtils.waitFor(
+        () -> {
+          try {
+            int numPendingDeletionKeys =
+                keyManager.getPendingDeletionKeys(Integer.MAX_VALUE).size();
+            if (numPendingDeletionKeys != keyCount) {
+              LOG.error("Expected {} keys to be pending deletion, but got {}",
+                  keyCount, numPendingDeletionKeys);
+              return false;
+            }
+            return true;
+          } catch (IOException e) {
+            LOG.error("Error while getting pending deletion keys.");

Review comment:
       Logging the exception may be helpful to debug test failures.
   
   ```suggestion
               LOG.error("Error while getting pending deletion keys.", e);
   ```

##########
File path: hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/TestKeyDeletingService.java
##########
@@ -139,8 +143,22 @@ public void checkIfDeleteServiceWithFailingSCM()
     createAndDeleteKeys(keyManager, keyCount, 1);
     KeyDeletingService keyDeletingService =
         (KeyDeletingService) keyManager.getDeletingService();
-    Assert.assertEquals(
-        keyManager.getPendingDeletionKeys(Integer.MAX_VALUE).size(), keyCount);
+    GenericTestUtils.waitFor(
+        () -> {
+          try {
+            int numPendingDeletionKeys =
+                keyManager.getPendingDeletionKeys(Integer.MAX_VALUE).size();
+            if (numPendingDeletionKeys != keyCount) {
+              LOG.error("Expected {} keys to be pending deletion, but got {}",

Review comment:
       I don't think a small delay is an error here.
   
   ```suggestion
                 LOG.info("Expected {} keys to be pending deletion, but got {}",
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] JyotinderSingh commented on pull request #2991: HDDS-6183. Intermittent failure in TestKeyDeletingService.checkIfDeleteServiceWithFailingSCM.

Posted by GitBox <gi...@apache.org>.
JyotinderSingh commented on pull request #2991:
URL: https://github.com/apache/ozone/pull/2991#issuecomment-1013841523


   > Thanks @JyotinderSingh for the patch.  Since it is a unit test-only change, the intermittent failure in integration test is unrelated.  I don't think it's worth retrying further.
   
   Yes, that makes sense.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] JyotinderSingh commented on a change in pull request #2991: HDDS-6183. Intermittent failure in TestKeyDeletingService.checkIfDeleteServiceWithFailingSCM.

Posted by GitBox <gi...@apache.org>.
JyotinderSingh commented on a change in pull request #2991:
URL: https://github.com/apache/ozone/pull/2991#discussion_r785306753



##########
File path: hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/TestKeyDeletingService.java
##########
@@ -137,6 +137,10 @@ public void checkIfDeleteServiceWithFailingSCM()
 
     final int keyCount = 100;
     createAndDeleteKeys(keyManager, keyCount, 1);
+    // Wait for the KeyDeletingService to pick up all the keys.
+    // There isn't any time-consuming code after the deleteKey call - which can
+    // cause assertion failures.
+    Thread.sleep(2000);

Review comment:
       Thanks for the correction @adoroszlai, I have fixed this is in my latest patch




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] adoroszlai commented on a change in pull request #2991: HDDS-6183. Intermittent failure in TestKeyDeletingService.checkIfDeleteServiceWithFailingSCM.

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on a change in pull request #2991:
URL: https://github.com/apache/ozone/pull/2991#discussion_r785303916



##########
File path: hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/TestKeyDeletingService.java
##########
@@ -137,6 +137,10 @@ public void checkIfDeleteServiceWithFailingSCM()
 
     final int keyCount = 100;
     createAndDeleteKeys(keyManager, keyCount, 1);
+    // Wait for the KeyDeletingService to pick up all the keys.
+    // There isn't any time-consuming code after the deleteKey call - which can
+    // cause assertion failures.
+    Thread.sleep(2000);

Review comment:
       Instead of adding fix sleep, it is better to check repeatedly with `GenericTestUtils.waitFor`, which lets the test pass sooner if the condition is met.  This allows increasing max. wait time without affecting usual run time.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org