Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2020/04/01 20:17:47 UTC

[GitHub] [hadoop-ozone] smengcl opened a new pull request #750: HDDS-3309. Add timeout to freon integration tests

smengcl opened a new pull request #750: HDDS-3309. Add timeout to freon integration tests
URL: https://github.com/apache/hadoop-ozone/pull/750
 
 
   ## What changes were proposed in this pull request?
   
   Add timeout to freon integration tests.
   
   This helps rule out flaky, long-running tests (e.g. `TestRandomKeyGenerator.bigFileThan2GB`) that take a very long time to run in GitHub Actions.
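A per-test timeout fails an individual test method once it exceeds its budget. Without one, the whole forked JVM keeps running until surefire kills it. With one, the report names the method that hung. In JUnit 4 this is normally done with the `org.junit.rules.Timeout` rule. The JDK-only sketch below only illustrates the mechanism. `PerTestTimeout`, `runWithTimeout` and the `Result` values are hypothetical names, not Ozone or JUnit code.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/**
 * Hypothetical sketch: run one test body with its own deadline, so a hang is
 * reported against the test method name instead of the whole forked JVM.
 */
final class PerTestTimeout {

  /** Outcome of running a single test body. */
  enum Result { PASSED, FAILED, TIMED_OUT }

  private PerTestTimeout() { }

  static Result runWithTimeout(String testName, Runnable body, long timeoutMillis) {
    // Run the body on a daemon thread so a stuck test cannot keep the JVM alive.
    ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
      Thread t = new Thread(r, "test-" + testName);
      t.setDaemon(true);
      return t;
    });
    try {
      Future<?> future = executor.submit(body);
      future.get(timeoutMillis, TimeUnit.MILLISECONDS);
      return Result.PASSED;
    } catch (TimeoutException e) {
      // The failure names the method that exceeded its budget.
      System.err.println("Test " + testName + " timed out after " + timeoutMillis + " ms");
      return Result.TIMED_OUT;
    } catch (ExecutionException e) {
      // Assertion errors and exceptions thrown by the body end up here.
      System.err.println("Test " + testName + " failed: " + e.getCause());
      return Result.FAILED;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return Result.FAILED;
    } finally {
      // Interrupts the test thread; a body that ignores interrupts keeps running.
      executor.shutdownNow();
    }
  }
}
```

`shutdownNow()` only interrupts the test thread. A body that ignores interrupts keeps running in the background, and thread-based timeouts generally share this limitation.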
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-3309
   
   ## How was this patch tested?
   
   Running.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] smengcl commented on issue #750: HDDS-3309. Add timeout to all integration tests

Posted by GitBox <gi...@apache.org>.
smengcl commented on issue #750: HDDS-3309. Add timeout to all integration tests
URL: https://github.com/apache/hadoop-ozone/pull/750#issuecomment-610051463
 
 
   > @smengcl `it-freon` is currently disabled globally, so changes to those tests are not being validated, i.e. we can't tell whether they work OK with the new timeouts or not. I think we should avoid changing them, to avoid unexpected failures later when they are enabled again.
   > 
   > `forkedProcessTimeoutInSeconds` applies to each test class. Test output is available in the artifact (e.g. it-client.zip), although test results are not.
   > 
   > I think it would be more useful to change tests that time out with the current limit to use `GenericTestUtils.waitFor`, which prints a thread dump in case of timeout.
   
   Hmm I see `it-freon` is running in master branch? e.g. https://github.com/apache/hadoop-ozone/runs/564223039
   
   I can remove the modifications to the ignored freon tests, i.e. `TestOzoneClientKeyGenerator` and `TestRandomKeyGenerator`.
   
   I agree `GenericTestUtils.waitFor` is useful, but at a lower level, for example once we have pinpointed a timeout to exact lines, like the last assertion in `TestRandomKeyGenerator.bigFileThan2GB`. The goal of this JIRA, however, is just to add a class-global timeout, which should let us collect logs and display friendlier output.
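`GenericTestUtils.waitFor` polls a condition until a deadline and, as mentioned in the quoted comment, prints a thread dump when it times out. The dump shows where the code under test was stuck. The sketch below reproduces that idea with the JDK only. `WaitUtil.waitFor` and its parameter names are hypothetical and only loosely modeled on the Hadoop helper. It is not a copy of that helper's implementation.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

/** Hypothetical helper, loosely modeled on the idea behind GenericTestUtils.waitFor. */
final class WaitUtil {
  private WaitUtil() { }

  /**
   * Polls {@code check} every {@code checkEveryMillis} until it returns true, or
   * throws a TimeoutException carrying a thread dump after {@code waitForMillis}.
   */
  static void waitFor(BooleanSupplier check, long checkEveryMillis, long waitForMillis)
      throws TimeoutException, InterruptedException {
    long deadline = System.nanoTime() + waitForMillis * 1_000_000L;
    while (!check.getAsBoolean()) {
      // Overflow-safe comparison of nanoTime values.
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException("Timed out waiting for condition after "
            + waitForMillis + " ms. Thread dump:\n" + threadDump());
      }
      Thread.sleep(checkEveryMillis);
    }
  }

  /** Builds a compact dump of all live threads via the platform ThreadMXBean. */
  static String threadDump() {
    StringBuilder sb = new StringBuilder();
    for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(false, false)) {
      // ThreadInfo.toString() includes the thread state and its top stack frames.
      sb.append(info);
    }
    return sb.toString();
  }
}
```

A caller might write `WaitUtil.waitFor(() -> cluster.isReady(), 100, 60_000)`, where `cluster.isReady()` is a placeholder for the condition the test is actually waiting on.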


[GitHub] [hadoop-ozone] adoroszlai commented on issue #750: HDDS-3309. Add timeout to all integration tests

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on issue #750: HDDS-3309. Add timeout to all integration tests
URL: https://github.com/apache/hadoop-ozone/pull/750#issuecomment-610319545
 
 
   > Hmm I see `it-freon` is running in master branch? e.g. https://github.com/apache/hadoop-ozone/runs/564223039
   
   Yes, it seems a66aae8613c20171fa78cfa434282bcaf28691ed only disabled it for PRs, not for post-commit builds. Thanks for the info.
   
   > I can remove the modifications to the ignored freon tests, i.e. TestOzoneClientKeyGenerator and TestRandomKeyGenerator.
   
   Thanks. What about `TestDataValidate*` and `TestFreonWith*`?


[GitHub] [hadoop-ozone] smengcl edited a comment on issue #750: HDDS-3309. Add timeout to all integration tests

Posted by GitBox <gi...@apache.org>.
smengcl edited a comment on issue #750: HDDS-3309. Add timeout to all integration tests
URL: https://github.com/apache/hadoop-ozone/pull/750#issuecomment-608409107
 
 
   > Thanks for the patch, @smengcl. These timeouts (and especially their root cause) are the biggest problem with the integration tests right now...
   > 
   > > This helps rule out flaky, long-running tests (e.g. TestRandomKeyGenerator.bigFileThan2GB) that take a very long time to run in GitHub Actions.
   > 
   > Did you see any long-running integration tests?
   > 
   > It seems that we have a global timeout:
   > 
   > In the main `pom.xml`:
   > 
   > ```
   >   <surefire.fork.timeout>900</surefire.fork.timeout>
   > ...
   >   <forkedProcessTimeoutInSeconds>${surefire.fork.timeout}</forkedProcessTimeoutInSeconds>
   > ```
   > 
   > Isn't it easier to decrease this number? (If I understand correctly, it does the same thing, since we fork the JVM.)
   > 
   > Did you check the current execution time of all the integration tests? Is the proposed value significantly bigger than the expected time of the slowest test?
   
   Hey Marton, thanks for the comment.
   
   `TestRandomKeyGenerator.bigFileThan2GB` failed in [this](https://github.com/apache/hadoop-ozone/runs/540098578) run. This isn't really a timeout, just a flaky test. I should change the description to use a different example.
   ```
   [ERROR] Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 333.028 s <<< FAILURE! - in org.apache.hadoop.ozone.freon.TestRandomKeyGenerator
   [ERROR] bigFileThan2GB(org.apache.hadoop.ozone.freon.TestRandomKeyGenerator)  Time elapsed: 267.469 s  <<< FAILURE!
   java.lang.AssertionError: expected:<1> but was:<0>
   	at org.junit.Assert.fail(Assert.java:88)
   ```
   
   But `TestOzoneRpcClient` does look like a problem [here](https://github.com/apache/hadoop-ozone/runs/540098466). When it fails, the log doesn't seem to contain any useful information, and it runs for far too long:
   ```
   [INFO] Apache Hadoop Ozone Integration Tests .............. FAILURE [53:03 min]
   [INFO] Apache Hadoop Ozone Mini Ozone Chaos Tests ......... SKIPPED
   [INFO] ------------------------------------------------------------------------
   [INFO] BUILD FAILURE
   [INFO] ------------------------------------------------------------------------
   [INFO] Total time:  53:04 min
   [INFO] Finished at: 2020-03-27T18:32:50Z
   [INFO] ------------------------------------------------------------------------
   [ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M1:test (default-test) on project hadoop-ozone-integration-test: There was a timeout or other error in the fork -> [Help 1]
   [ERROR] 
   [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
   [ERROR] Re-run Maven using the -X switch to enable full debug logging.
   [ERROR] 
   [ERROR] For more information about the errors and possible solutions, please read the following articles:
   [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
   org.apache.hadoop.ozone.client.rpc.TestOzoneRpcClient
   ```
   
   [Here's](https://github.com/apache/hadoop-ozone/pull/726/checks?check_run_id=537696913) another one with `TestBlockOutputStreamWithFailures`, where `it-client` ran for 39:48 min and reported a failure. And [this](https://github.com/apache/hadoop-ozone/runs/557875782) one, also `TestBlockOutputStreamWithFailures`, ran for 26:06 min on the master branch.
   
   I'm not sure if those issues are already fixed or not. Anyway, the idea of this JIRA is to add a class-global timeout that applies to EACH test function in those test classes. Does `forkedProcessTimeoutInSeconds` achieve the same thing?
   
   If it does, the `TestOzoneRpcClient` example above suggests it doesn't work well either, as the whole `it-client` run took ~53 min (about 3180 s).
   
   It seems that, as a result of the timeout, we are not getting useful logs to diagnose the flakiness. This is the main reason @arp7 asked me to add the timeout to all the tests.
   
   We might want to lower the timeout further for some known flaky (intermittently timing out) tests later.
   What do you think?



[GitHub] [hadoop-ozone] smengcl closed pull request #750: HDDS-3309. Add timeout to all integration tests

Posted by GitBox <gi...@apache.org>.
smengcl closed pull request #750: HDDS-3309. Add timeout to all integration tests
URL: https://github.com/apache/hadoop-ozone/pull/750
 
 
   


[GitHub] [hadoop-ozone] smengcl commented on issue #750: HDDS-3309. Add timeout to all integration tests

Posted by GitBox <gi...@apache.org>.
smengcl commented on issue #750: HDDS-3309. Add timeout to all integration tests
URL: https://github.com/apache/hadoop-ozone/pull/750#issuecomment-610648964
 
 
   > > Hmm I see `it-freon` is running in master branch? e.g. https://github.com/apache/hadoop-ozone/runs/564223039
   > 
   > Yes, it seems [a66aae8](https://github.com/apache/hadoop-ozone/commit/a66aae8613c20171fa78cfa434282bcaf28691ed) only disabled it for PRs, not for post-commit build. Thanks for the info.
   > 
   > > I can remove the modifications to the ignored freon tests. i.e. TestOzoneClientKeyGenerator and TestRandomKeyGenerator.
   > 
   > Thanks. What about `TestDataValidate*` and `TestFreonWith*`?
   
   Cool.
   
   Should we keep the modifications to `TestDataValidate*` and `TestFreonWith*`, then, since they still run in post-commit builds?



[GitHub] [hadoop-ozone] smengcl commented on issue #750: HDDS-3309. Add timeout to all integration tests

Posted by GitBox <gi...@apache.org>.
smengcl commented on issue #750: HDDS-3309. Add timeout to all integration tests
URL: https://github.com/apache/hadoop-ozone/pull/750#issuecomment-612714073
 
 
   I will open a new PR for `TimedOutTestsListener` for the sake of clarity.


[GitHub] [hadoop-ozone] elek commented on issue #750: HDDS-3309. Add timeout to all integration tests

Posted by GitBox <gi...@apache.org>.
elek commented on issue #750: HDDS-3309. Add timeout to all integration tests
URL: https://github.com/apache/hadoop-ozone/pull/750#issuecomment-610937389
 
 
   Sorry for the late comment, but today I learned that Hadoop has a [TimedOutTestsListener](https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/TimedOutTestsListener.java).
   
   It's configured in the `pom.xml` files, for example here: https://github.com/apache/hadoop/blob/1189af4746919774035f5d64ccb4d2ce21905aaa/hadoop-hdfs-project/hadoop-hdfs/pom.xml#L236
   
   Wouldn't it be more effective to use a similar listener? (If so, I would prefer to fork it rather than add one more Hadoop dependency, especially after HDDS-3353 and HDDS-3312.)
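Roughly speaking, a listener like Hadoop's `TimedOutTestsListener` reacts to test failures. When a failure looks like a timeout, it prints a dump of all threads, plus any deadlocked threads, so the log shows what the test was waiting on. Below is a JDK-only sketch of that reporting step, with several illustrative assumptions that are not taken from the actual listener code:

- the class name `TimeoutReporter`;
- the output format;
- the check that timeout failures start with JUnit 4's usual "test timed out after" message.

Hooking it into surefire would still need a real JUnit `RunListener`, configured in `pom.xml` as in the linked Hadoop example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Hypothetical reporting step of a timed-out-test listener. */
final class TimeoutReporter {
  // Assumption: JUnit 4 per-test timeout failures usually start with this text.
  static final String TIMEOUT_PREFIX = "test timed out after";

  private TimeoutReporter() { }

  /** Returns true if a failure message looks like a per-test timeout. */
  static boolean isTimeout(String failureMessage) {
    return failureMessage != null && failureMessage.startsWith(TIMEOUT_PREFIX);
  }

  /** Builds the diagnostics to print for a timed-out test; empty for other failures. */
  static String report(String testName, String failureMessage) {
    if (!isTimeout(failureMessage)) {
      return "";
    }
    ThreadMXBean mx = ManagementFactory.getThreadMXBean();
    StringBuilder sb = new StringBuilder();
    sb.append("====> TEST TIMED OUT: ").append(testName).append('\n');
    for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
      sb.append(info);
    }
    // findDeadlockedThreads() returns null when no threads are deadlocked.
    long[] deadlocked = mx.findDeadlockedThreads();
    if (deadlocked != null) {
      sb.append("====> DEADLOCKED THREADS\n");
      for (ThreadInfo info : mx.getThreadInfo(deadlocked, true, true)) {
        if (info != null) { // a thread may have exited in the meantime
          sb.append(info);
        }
      }
    }
    return sb.toString();
  }
}
```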



[GitHub] [hadoop-ozone] smengcl commented on issue #750: HDDS-3309. Add timeout to all integration tests

Posted by GitBox <gi...@apache.org>.
smengcl commented on issue #750: HDDS-3309. Add timeout to all integration tests
URL: https://github.com/apache/hadoop-ozone/pull/750#issuecomment-611802367
 
 
   > Sorry for the late comment, but today I learned that Hadoop has a [TimedOutTestsListener](https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/TimedOutTestsListener.java).
   > 
   > It's configured in the `pom.xml` files, for example here: https://github.com/apache/hadoop/blob/1189af4746919774035f5d64ccb4d2ce21905aaa/hadoop-hdfs-project/hadoop-hdfs/pom.xml#L236
   > 
   > Wouldn't it be more effective to use a similar listener? (If so, I would prefer to fork it rather than add one more Hadoop dependency, especially after HDDS-3353 and HDDS-3312.)
   
   Good find. I'll take a look into that.


[GitHub] [hadoop-ozone] elek commented on issue #750: HDDS-3309. Add timeout to all integration tests

Posted by GitBox <gi...@apache.org>.
elek commented on issue #750: HDDS-3309. Add timeout to all integration tests
URL: https://github.com/apache/hadoop-ozone/pull/750#issuecomment-608398011
 
 
   Thanks for the patch, @smengcl. These timeouts (and especially their root cause) are the biggest problem with the integration tests right now...
   
   > This helps rule out flaky, long-running tests (e.g. TestRandomKeyGenerator.bigFileThan2GB) that take a very long time to run in GitHub Actions.
   
   Did you see any long-running integration tests?
   
   It seems that we have a global timeout:
   
   In the main `pom.xml`:
   ```
     <surefire.fork.timeout>900</surefire.fork.timeout>
   ...
     <forkedProcessTimeoutInSeconds>${surefire.fork.timeout}</forkedProcessTimeoutInSeconds>
   ```
   
   Isn't it easier to decrease this number? (If I understand correctly, it does the same thing, since we fork the JVM.)
   
   Did you check the current execution time of all the integration tests? Is the proposed value significantly bigger than the expected time of the slowest test?
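The question about the slowest test can be answered from surefire output. One option is to scan it for per-method `Time elapsed` values, such as the `bigFileThan2GB` line quoted in this thread, and compare the maximum with a proposed per-test timeout. A rough sketch follows. The regular expression assumes the line shape from that log excerpt. `SlowestTest` and the safety-margin parameter are hypothetical.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: find the slowest test method in surefire log lines. */
final class SlowestTest {
  /** A test method (as "Class#method") and its elapsed time in seconds. */
  static final class Timing {
    final String name;
    final double seconds;

    Timing(String name, double seconds) {
      this.name = name;
      this.seconds = seconds;
    }
  }

  // Assumed line shape: "method(fully.qualified.Class)  Time elapsed: 267.469 s".
  // Class-summary lines ("Tests run: ..., Time elapsed: ...") do not match.
  private static final Pattern METHOD_LINE =
      Pattern.compile("(\\w+)\\(([\\w.$]+)\\)\\s+Time elapsed: (\\d+(?:\\.\\d+)?) s");

  private SlowestTest() { }

  /** Returns the slowest per-method timing found in the given log lines, if any. */
  static Optional<Timing> slowest(List<String> logLines) {
    Timing best = null;
    for (String line : logLines) {
      Matcher m = METHOD_LINE.matcher(line);
      if (m.find()) {
        double secs = Double.parseDouble(m.group(3));
        if (best == null || secs > best.seconds) {
          best = new Timing(m.group(2) + "#" + m.group(1), secs);
        }
      }
    }
    return Optional.ofNullable(best);
  }

  /** True if the proposed timeout is at least {@code margin} times the slowest test. */
  static boolean timeoutHasHeadroom(double proposedSeconds, Timing slowest, double margin) {
    return proposedSeconds >= slowest.seconds * margin;
  }
}
```

For the quoted `bigFileThan2GB` time of 267.469 s, a 2x margin would mean a per-test timeout of at least about 535 s.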


[GitHub] [hadoop-ozone] elek commented on issue #750: HDDS-3309. Add timeout to all integration tests

Posted by GitBox <gi...@apache.org>.
elek commented on issue #750: HDDS-3309. Add timeout to all integration tests
URL: https://github.com/apache/hadoop-ozone/pull/750#issuecomment-609710694
 
 
   > I'm not sure if those issues are already fixed or not. Anyway, the idea of this JIRA is to add a class-global timeout that applies to EACH test function in those test classes.
   
   Thanks for explaining it. So with this approach we will get the name of the problematic method instead of only the name of the test class. Very nice.
