Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2022/05/25 16:26:49 UTC

[GitHub] [ozone] symious opened a new pull request, #3456: HDDS-6801. Intermittent failure in TestOzoneManagerHAWithData

symious opened a new pull request, #3456:
URL: https://github.com/apache/ozone/pull/3456

   ## What changes were proposed in this pull request?
   
   The test failed in the following run:
   
   https://github.com/apache/ozone/runs/6585903265
   
   The failure starts in testFileOperationsAndDelete and causes all subsequent tests to fail as well.
   
   ## What is the link to the Apache JIRA?
   
   https://issues.apache.org/jira/browse/HDDS-6801
   
   ## How was this patch tested?
   
   Ran the test.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] adoroszlai commented on a diff in pull request #3456: HDDS-6801. Intermittent failure in TestOzoneManagerHAWithData

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on code in PR #3456:
URL: https://github.com/apache/ozone/pull/3456#discussion_r881958634


##########
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/om/TestOzoneManagerHAWithData.java:
##########
@@ -109,6 +109,7 @@ public void testFileOperationsAndDelete() throws Exception {
     testFileOperationsWithRecursive();
     testFileOperationsWithNonRecursive();
     testKeysDelete();
+    Thread.sleep(1000);

Review Comment:
   Can you please explain why sleep is needed?  What do we need to wait for?  Would it be possible to explicitly wait for that condition?
   
   https://github.com/apache/ozone/blob/5ed0e0a9b4c8355f4cde064cc605bc419dacd9c3/hadoop-hdds/test-utils/src/main/java/org/apache/ozone/test/GenericTestUtils.java#L192-L228
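
As a rough illustration of the explicit wait the reviewer asks about, here is a minimal, self-contained sketch in plain Java. It is not the `GenericTestUtils` code linked above and not the fix for this test: `ExplicitWaitSketch`, its `waitFor` helper, and the `deletesApplied` flag are hypothetical names, and the thread does not say which condition the test actually needs to wait for.

```java
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

// Hypothetical helper, written only to illustrate polling for a condition
// instead of sleeping for a fixed time.
final class ExplicitWaitSketch {

  /**
   * Polls {@code check} every {@code intervalMillis} until it returns true.
   * Throws TimeoutException if it is still false after {@code timeoutMillis}.
   */
  static void waitFor(BooleanSupplier check, long intervalMillis, long timeoutMillis)
      throws InterruptedException, TimeoutException {
    long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
    while (!check.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException(
            "Condition not met within " + timeoutMillis + " ms");
      }
      Thread.sleep(intervalMillis);
    }
  }

  public static void main(String[] args) throws Exception {
    // Stand-in for whatever background work the test depends on (assumption).
    AtomicBoolean deletesApplied = new AtomicBoolean(false);
    Thread background = new Thread(() -> {
      try {
        Thread.sleep(200);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      deletesApplied.set(true);
    });
    background.start();

    // Returns as soon as the flag flips; fails loudly if it never does.
    waitFor(deletesApplied::get, 50, 5_000);
    background.join();
    System.out.println("condition met");
  }
}
```

Compared with `Thread.sleep(1000)`, a wait of this shape returns as soon as the condition holds and fails with a clear timeout when it never does, which is why it tends to be less flaky.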





[GitHub] [ozone] adoroszlai commented on pull request #3456: HDDS-6801. Intermittent failure in TestOzoneManagerHAWithData

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on PR #3456:
URL: https://github.com/apache/ozone/pull/3456#issuecomment-1172043990

   > I think "the sleep" is not a good way to solve this flaky test.
   
   Thanks.  Let's close this until we come up with a better fix.




[GitHub] [ozone] symious commented on a diff in pull request #3456: HDDS-6801. Intermittent failure in TestOzoneManagerHAWithData

Posted by GitBox <gi...@apache.org>.
symious commented on code in PR #3456:
URL: https://github.com/apache/ozone/pull/3456#discussion_r903604582


##########
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/om/TestOzoneManagerHAWithData.java:
##########
@@ -109,6 +109,7 @@ public void testFileOperationsAndDelete() throws Exception {
     testFileOperationsWithRecursive();
     testFileOperationsWithNonRecursive();
     testKeysDelete();
+    Thread.sleep(1000);

Review Comment:
   I think "the sleep" is not a good way to solve this flaky test.





[GitHub] [ozone] neils-dev commented on a diff in pull request #3456: HDDS-6801. Intermittent failure in TestOzoneManagerHAWithData

Posted by GitBox <gi...@apache.org>.
neils-dev commented on code in PR #3456:
URL: https://github.com/apache/ozone/pull/3456#discussion_r882163369


##########
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/om/TestOzoneManagerHAWithData.java:
##########
@@ -109,6 +109,7 @@ public void testFileOperationsAndDelete() throws Exception {
     testFileOperationsWithRecursive();
     testFileOperationsWithNonRecursive();
     testKeysDelete();
+    Thread.sleep(1000);

Review Comment:
   @symious, another util you can use to check a condition at intervals until a timeout is `LambdaTestUtils.await`,
   e.g. `LambdaTestUtils.await(WAIT_TIMEOUT_MILLIS, 1000, () -> {...})`





[GitHub] [ozone] symious commented on a diff in pull request #3456: HDDS-6801. Intermittent failure in TestOzoneManagerHAWithData

Posted by GitBox <gi...@apache.org>.
symious commented on code in PR #3456:
URL: https://github.com/apache/ozone/pull/3456#discussion_r903603990


##########
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/om/TestOzoneManagerHAWithData.java:
##########
@@ -109,6 +109,7 @@ public void testFileOperationsAndDelete() throws Exception {
     testFileOperationsWithRecursive();
     testFileOperationsWithNonRecursive();
     testKeysDelete();
+    Thread.sleep(1000);

Review Comment:
   I tried to solve the timeout issue caused by HDDS-6685, but the test still fails. The error message is no longer "port already in use"; it is now "java.lang.IllegalStateException: gap between start index 131 and first entry to append 218".
   
   I think this is caused by ratis-2.3.0. It seems ratis-2.2.0 handles errors when loading the SNAPSHOT differently, so the fix worked with ratis-2.2.0, but with ratis-2.3.0 it produces new errors.





[GitHub] [ozone] adoroszlai closed pull request #3456: HDDS-6801. Intermittent failure in TestOzoneManagerHAWithData

Posted by GitBox <gi...@apache.org>.
adoroszlai closed pull request #3456: HDDS-6801. Intermittent failure in TestOzoneManagerHAWithData
URL: https://github.com/apache/ozone/pull/3456




[GitHub] [ozone] adoroszlai commented on a diff in pull request #3456: HDDS-6801. Intermittent failure in TestOzoneManagerHAWithData

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on code in PR #3456:
URL: https://github.com/apache/ozone/pull/3456#discussion_r902521934


##########
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/om/TestOzoneManagerHAWithData.java:
##########
@@ -109,6 +109,7 @@ public void testFileOperationsAndDelete() throws Exception {
     testFileOperationsWithRecursive();
     testFileOperationsWithNonRecursive();
     testKeysDelete();
+    Thread.sleep(1000);

Review Comment:
   Thanks @symious for the reply, but this still does not explain why an explicit wait for some condition is not possible instead of a sleep.


