Posted to dev@flink.apache.org by "Liu (Jira)" <ji...@apache.org> on 2021/09/07 02:31:00 UTC
[jira] [Created] (FLINK-24174) MiniClusterTestEnvironment's triggerTaskManagerFailover may get stuck in CommonTestUtils.waitForJobStatus()
Liu created FLINK-24174:
---------------------------
Summary: MiniClusterTestEnvironment's triggerTaskManagerFailover may get stuck in CommonTestUtils.waitForJobStatus()
Key: FLINK-24174
URL: https://issues.apache.org/jira/browse/FLINK-24174
Project: Flink
Issue Type: Improvement
Components: Test Infrastructure
Reporter: Liu
When writing TaskManager failover tests with the [unified testing framework for connectors|https://issues.apache.org/jira/browse/FLINK-19554], I found that a test may get stuck in CommonTestUtils.waitForJobStatus() as follows:
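# triggerTaskManagerFailover() is called and invokes terminateTaskManager(), which blocks until the TaskManager has been terminated.
# The job status switches from RUNNING to RESTARTING.
# The job status switches from RESTARTING back to RUNNING.
# terminateTaskManager() completes; only now is CommonTestUtils.waitForJobStatus() called.
# Because the job is already RUNNING again, CommonTestUtils.waitForJobStatus() never observes FAILING, FAILED, or RESTARTING, so the wait never succeeds.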
One solution is to call terminateTaskManager() asynchronously and call CommonTestUtils.waitForJobStatus() while the termination is still in progress. The pseudo code could look as follows:
{code:java}
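import java.time.Duration;
import java.util.Arrays;
import java.util.concurrent.CompletableFuture;
import org.apache.flink.api.common.JobStatus;
import org.apache.flink.api.common.time.Deadline;
import org.apache.flink.core.execution.JobClient;
import org.apache.flink.runtime.testutils.CommonTestUtils;

public void triggerTaskManagerFailover(JobClient jobClient, Runnable afterFailAction)
        throws Exception {
    // Assumes terminateTaskManager() returns its future instead of blocking on it.
    CompletableFuture<Void> terminationFuture = terminateTaskManager();
    // Wait for the failover while it is in progress, before the job can be RUNNING again.
    CommonTestUtils.waitForJobStatus(
            jobClient,
            Arrays.asList(JobStatus.FAILING, JobStatus.FAILED, JobStatus.RESTARTING),
            Deadline.fromNow(Duration.ofMinutes(5)));
    terminationFuture.get(); // only now wait for the termination itself to finish
    afterFailAction.run();
    startTaskManager();
}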
{code}
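To make the timing problem concrete, here is a minimal, self-contained sketch that does not use Flink at all. The JobStatus enum, terminateTaskManager() and waitForJobStatus() below are hypothetical stand-ins (an AtomicReference flipped by a background task and a simple polling loop), not Flink's API; the 200 ms restart, 5 ms poll interval and 500 ms timeout are arbitrary values chosen for the demo. It compares the blocking order from the steps above with the proposed asynchronous order.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

class FailoverRaceSketch {

    // Hypothetical, minimal stand-in for Flink's JobStatus (only the values needed here).
    enum JobStatus { RUNNING, RESTARTING }

    // Simulated job status; the job starts RUNNING like the job under test.
    private final AtomicReference<JobStatus> status = new AtomicReference<>(JobStatus.RUNNING);

    // Hypothetical stand-in for terminateTaskManager(): a background task drives the job
    // RUNNING -> RESTARTING -> RUNNING, and the future completes only once it is RUNNING again.
    CompletableFuture<Void> terminateTaskManager() {
        return CompletableFuture.runAsync(() -> {
            status.set(JobStatus.RESTARTING);
            sleep(200); // the simulated restart takes 200 ms
            status.set(JobStatus.RUNNING);
        });
    }

    // Hypothetical stand-in for CommonTestUtils.waitForJobStatus(): polls every 5 ms and
    // returns true on the first expected status, or false once the timeout has passed.
    boolean waitForJobStatus(Set<JobStatus> expected, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (System.nanoTime() < deadline) {
            if (expected.contains(status.get())) {
                return true;
            }
            sleep(5);
        }
        return false;
    }

    // Blocking order described in the issue: wait for the termination, then for the status.
    static boolean blockingThenWait() {
        FailoverRaceSketch job = new FailoverRaceSketch();
        job.terminateTaskManager().join(); // the job is already RUNNING again here
        return job.waitForJobStatus(EnumSet.of(JobStatus.RESTARTING), 500);
    }

    // Proposed order: start the termination, wait for the status meanwhile, then join.
    static boolean asyncThenWait() {
        FailoverRaceSketch job = new FailoverRaceSketch();
        CompletableFuture<Void> terminationFuture = job.terminateTaskManager();
        boolean observed = job.waitForJobStatus(EnumSet.of(JobStatus.RESTARTING), 500);
        terminationFuture.join();
        return observed;
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("blocking, then wait: saw RESTARTING = " + blockingThenWait());
        System.out.println("async, then wait:    saw RESTARTING = " + asyncThenWait());
    }
}
```

Barring unusual scheduling delays, the blocking order reports false and the asynchronous order reports true. Note that any polling wait can only catch a transient status that lasts longer than its polling interval, so the asynchronous order narrows the window for missing RESTARTING but does not remove that limitation entirely.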
--
This message was sent by Atlassian Jira
(v8.3.4#803005)