Posted to issues@flink.apache.org by "Chesnay Schepler (Jira)" <ji...@apache.org> on 2022/07/11 13:30:00 UTC

[jira] [Comment Edited] (FLINK-28319) test_ci tests times out after/during running org.apache.flink.test.streaming.experimental

    [ https://issues.apache.org/jira/browse/FLINK-28319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17564989#comment-17564989 ] 

Chesnay Schepler edited comment on FLINK-28319 at 7/11/22 1:29 PM:
-------------------------------------------------------------------

[~martijnvisser] The test execution order isn't _really_ deterministic, because one of the test threads (there are 2 or 4) can get stuck while the others continue with the remaining tests.

 

From the thread dump we can see that the ResumeCheckpointManuallyITCase is stuck:
{code:java}
   java.lang.Thread.State: TIMED_WAITING (sleeping)
	at java.lang.Thread.sleep(Native Method)
	at org.apache.flink.test.util.TestUtils.waitUntilJobCanceled(TestUtils.java:174)
	at org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.runJobAndGetExternalizedCheckpoint(ResumeCheckpointManuallyITCase.java:351)
	at org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedCheckpoints(ResumeCheckpointManuallyITCase.java:314)
	at org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsZookeeper(ResumeCheckpointManuallyITCase.java:228)
{code}
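
The top frames show the thread sleeping inside TestUtils.waitUntilJobCanceled, which is consistent with a polling loop that waits for a job state that is never reached. The source of that method is not shown in this thread, so the following is only a sketch of the general pattern, not Flink's implementation: a polling wait that gives up after a deadline, so that a stuck job fails one test instead of consuming the whole CI time budget. The names BoundedPolling and waitUntil are hypothetical and are not part of any Flink API.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Hypothetical helper (not Flink API): polls a condition, but only until a deadline. */
final class BoundedPolling {

    private BoundedPolling() {}

    /**
     * Polls {@code condition} roughly every {@code interval} until it holds
     * or {@code timeout} has elapsed.
     *
     * @return true if the condition held before the deadline; false on timeout
     *     or if the waiting thread was interrupted
     */
    static boolean waitUntil(BooleanSupplier condition, Duration timeout, Duration interval) {
        final long deadline = System.nanoTime() + timeout.toNanos();
        while (!condition.getAsBoolean()) {
            final long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                // Deadline reached: report the timeout instead of sleeping forever.
                return false;
            }
            // Sleep for the polling interval, but never past the deadline.
            final long remainingMillis = Math.max(1L, remainingNanos / 1_000_000L);
            try {
                Thread.sleep(Math.min(interval.toMillis(), remainingMillis));
            } catch (InterruptedException e) {
                // Restore the interrupt flag so callers can still observe it.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

A test could call something like BoundedPolling.waitUntil(() -> jobIsCanceled(), Duration.ofMinutes(2), Duration.ofMillis(50)) and fail with a clear message when it returns false. Here jobIsCanceled() stands in for whatever status check the test really performs and is likewise an assumption.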


was (Author: zentol):
[~martijnvisser] The test execution order isn't _really_ deterministic, because 1 of (2) test threads can get stuck while the other one continues with the remaining tests.

 

From the thread dump we can see that the ResumeCheckpointManuallyITCase is stuck:
{code:java}
   java.lang.Thread.State: TIMED_WAITING (sleeping)
	at java.lang.Thread.sleep(Native Method)
	at org.apache.flink.test.util.TestUtils.waitUntilJobCanceled(TestUtils.java:174)
	at org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.runJobAndGetExternalizedCheckpoint(ResumeCheckpointManuallyITCase.java:351)
	at org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedCheckpoints(ResumeCheckpointManuallyITCase.java:314)
	at org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedFSCheckpointsZookeeper(ResumeCheckpointManuallyITCase.java:228)
{code}

> test_ci tests times out after/during running org.apache.flink.test.streaming.experimental
> -----------------------------------------------------------------------------------------
>
>                 Key: FLINK-28319
>                 URL: https://issues.apache.org/jira/browse/FLINK-28319
>             Project: Flink
>          Issue Type: Bug
>    Affects Versions: 1.15.2
>            Reporter: Martijn Visser
>            Priority: Major
>              Labels: test-stability
>
> {code:java}
> Jun 30 03:16:25 [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.109 s - in org.apache.flink.test.streaming.experimental.CollectITCase
> ==========================================================================================
> === WARNING: This task took already 95% of the available time budget of 237 minutes ===
> ==========================================================================================
> ==============================================================================
> The following Java processes are running (JPS)
> ==============================================================================
> 932 Launcher
> 20281 Jps
> 17930 surefirebooter3147893032508885212.jar
> ==============================================================================
> {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=37384&view=logs&j=5c8e7682-d68f-54d1-16a2-a09310218a49&t=86f654fa-ab48-5c1a-25f4-7e7f6afb9bba&l=6280



--
This message was sent by Atlassian Jira
(v8.20.10#820010)