Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2020/04/03 12:37:18 UTC

[GitHub] [hadoop-ozone] smengcl edited a comment on issue #750: HDDS-3309. Add timeout to all integration tests

URL: https://github.com/apache/hadoop-ozone/pull/750#issuecomment-608409107
 
 
   > Thanks for the patch, @smengcl. These timeouts (and especially their root cause) are the biggest problem with the integration tests right now...
   > 
   > > This helps rule out flaky long-running tests (e.g. TestRandomKeyGenerator.bigFileThan2GB) that are taking a very long time to run in GitHub Actions.
   > 
   > Did you see any long running integration tests?
   > 
   > It seems that we have a global timeout:
   > 
   > In the main `pom.xml`:
   > 
   > ```
   >   <surefire.fork.timeout>900</surefire.fork.timeout>
   > ...
   >   <forkedProcessTimeoutInSeconds>${surefire.fork.timeout}</forkedProcessTimeoutInSeconds>
   > ```
   > 
   > Isn't it easier to decrease this number? (If I understand correctly, it does the same thing, since we fork the JVM.)
   > 
   > Did you check the current execution time of all the integration tests? Is the proposed value significantly bigger than the expected time of the slowest test?
   
   Hey Marton, thanks for the comment.
   
   `TestRandomKeyGenerator.bigFileThan2GB` failed in [this](https://github.com/apache/hadoop-ozone/runs/540098578) run. That failure isn't really a timeout; the test is just flaky. I should update the description to use a different example.
   ```
   [ERROR] Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 333.028 s <<< FAILURE! - in org.apache.hadoop.ozone.freon.TestRandomKeyGenerator
   [ERROR] bigFileThan2GB(org.apache.hadoop.ozone.freon.TestRandomKeyGenerator)  Time elapsed: 267.469 s  <<< FAILURE!
   java.lang.AssertionError: expected:<1> but was:<0>
   	at org.junit.Assert.fail(Assert.java:88)
   ```
   
   But `TestOzoneRpcClient` does look like a problem [here](https://github.com/apache/hadoop-ozone/runs/540098466). When it fails, the log doesn't seem to contain any useful information, and it runs for far too long:
   ```
   [INFO] Apache Hadoop Ozone Integration Tests .............. FAILURE [53:03 min]
   [INFO] Apache Hadoop Ozone Mini Ozone Chaos Tests ......... SKIPPED
   [INFO] ------------------------------------------------------------------------
   [INFO] BUILD FAILURE
   [INFO] ------------------------------------------------------------------------
   [INFO] Total time:  53:04 min
   [INFO] Finished at: 2020-03-27T18:32:50Z
   [INFO] ------------------------------------------------------------------------
   [ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M1:test (default-test) on project hadoop-ozone-integration-test: There was a timeout or other error in the fork -> [Help 1]
   [ERROR] 
   [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
   [ERROR] Re-run Maven using the -X switch to enable full debug logging.
   [ERROR] 
   [ERROR] For more information about the errors and possible solutions, please read the following articles:
   [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
   org.apache.hadoop.ozone.client.rpc.TestOzoneRpcClient
   ```
   
   I'm not sure whether those issues have already been fixed. Anyway, the idea of this jira is to add a class-level timeout that applies to EACH test method in those test classes. Does `forkedProcessTimeoutInSeconds` achieve the same thing?
   If it does, the `TestOzoneRpcClient` example above suggests it doesn't work well either: the whole it-client test run took ~53 min = 3180 sec, far longer than the 900-second `surefire.fork.timeout`.
   It seems that as a result of the timeout, we are not getting useful logs to diagnose the flakiness. <- This is the main reason @arp7 asked me to add the timeout to all the tests.
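
   To make the per-method idea concrete, here is a minimal sketch in plain Java. It is not the code from this patch: the class name `PerTestTimeoutSketch`, the helper `runWithTimeout`, and the 200 ms limit are all made up for illustration. In JUnit 4 the same effect is normally obtained with the `org.junit.rules.Timeout` rule rather than hand-written code. The point is only that each test body gets its own deadline, so a hang is reported against the specific test instead of killing the whole forked JVM.

   ```java
   import java.util.LinkedHashMap;
   import java.util.Map;
   import java.util.concurrent.ExecutionException;
   import java.util.concurrent.ExecutorService;
   import java.util.concurrent.Executors;
   import java.util.concurrent.Future;
   import java.util.concurrent.TimeUnit;
   import java.util.concurrent.TimeoutException;

   /** Hypothetical sketch of a per-test-method time limit; not the actual patch. */
   public class PerTestTimeoutSketch {

     /** Runs one test body with its own deadline and reports the outcome. */
     static String runWithTimeout(Runnable testBody, long timeoutMillis) {
       ExecutorService executor = Executors.newSingleThreadExecutor();
       Future<?> future = executor.submit(testBody);
       try {
         future.get(timeoutMillis, TimeUnit.MILLISECONDS);
         return "PASSED";
       } catch (TimeoutException e) {
         future.cancel(true); // interrupt the stuck test body
         return "TIMED_OUT";
       } catch (ExecutionException e) {
         // The test body threw; the cause carries the original failure.
         return "FAILED: " + e.getCause();
       } catch (InterruptedException e) {
         Thread.currentThread().interrupt();
         return "INTERRUPTED";
       } finally {
         executor.shutdownNow();
       }
     }

     public static void main(String[] args) {
       // Illustrative test bodies only; the names are assumptions.
       Map<String, Runnable> tests = new LinkedHashMap<>();
       tests.put("fastTest", () -> { });
       tests.put("hangingTest", () -> {
         try {
           Thread.sleep(60_000); // simulates a test that never finishes in time
         } catch (InterruptedException e) {
           Thread.currentThread().interrupt();
         }
       });
       tests.put("failingTest", () -> {
         throw new AssertionError("expected:<1> but was:<0>");
       });

       // Every test gets the same limit, but the limit is enforced per test.
       for (Map.Entry<String, Runnable> test : tests.entrySet()) {
         System.out.println(test.getKey() + " -> " + runWithTimeout(test.getValue(), 200));
       }
     }
   }
   ```

   With this shape, a hanging test is reported by name and the remaining tests still run, which is the kind of diagnostic we don't get when the fork-level timeout fires.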
   
   Later, we might want to further lower the timeout for some tests that are known to be flaky (timing out intermittently).
   What do you think?
