Posted to issues@flink.apache.org by "Matthias Pohl (Jira)" <ji...@apache.org> on 2022/03/22 08:56:00 UTC

[jira] [Comment Edited] (FLINK-26798) JobMaster.testJobFailureWhenTaskExecutorHeartbeatTimeout failed due to missing Execution

    [ https://issues.apache.org/jira/browse/FLINK-26798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17510333#comment-17510333 ] 

Matthias Pohl edited comment on FLINK-26798 at 3/22/22, 8:55 AM:
-----------------------------------------------------------------

This looks like it's caused by [DefaultExecutionDeploymentReconciler:53|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/DefaultExecutionDeploymentReconciler.java#L53]: the {{JobMaster}} expects an {{Execution}} to be deployed on the {{TaskManager}}, but that {{Execution}} wasn't reported in the {{TaskManager}}'s heartbeat payload. That makes sense, considering that the {{TestingHeartbeatService}}'s payload contains an empty list of executions (see [JobMasterTest:1774|https://github.com/apache/flink/blob/1e7d45d53b7ea7b9cfadf2e293ba790f3a9e90c3/flink-runtime/src/test/java/org/apache/flink/runtime/jobmaster/JobMasterTest.java#L1774]).
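
To illustrate the suspected mechanism, here is a minimal sketch of the comparison, not Flink's actual implementation. The class {{ReconcileSketch}} and the method {{missingExecutions}} are hypothetical names, and plain strings stand in for Flink's ID types. It assumes the reconciler treats every execution that the {{JobMaster}} expects to be deployed, but that is absent from the heartbeat payload, as missing:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified stand-in for the reconciliation step.
// Plain strings replace Flink's execution ID types.
final class ReconcileSketch {

    /**
     * Returns the executions the JobMaster expects on the TaskManager
     * that the heartbeat payload did not report.
     */
    static Set<String> missingExecutions(
            Set<String> expectedDeployed, Set<String> reportedInHeartbeat) {
        Set<String> missing = new HashSet<>(expectedDeployed);
        missing.removeAll(reportedInHeartbeat);
        return missing;
    }

    public static void main(String[] args) {
        // The JobMaster expects one deployed execution ...
        Set<String> expected = Set.of("48dbc880c8225256b8bc112ea36e9082");
        // ... but the heartbeat payload carries an empty list of executions.
        Set<String> reported = Set.of();

        // A non-empty result corresponds to the "missing deployment" case.
        System.out.println(missingExecutions(expected, reported));
    }
}
```

With an empty payload, every expected execution ends up in the missing set, which would match the failure reported in the issue once the expected {{Execution}} counts as deployed.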


was (Author: mapohl):
Looks like it's caused by [DefaultExecutionDeploymentReconciler:53|https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/DefaultExecutionDeploymentReconciler.java#L53]: An {{Execution}} is expected to be deployed to the {{TaskManager}} from the {{JobMaster}}'s side but wasn't reported in the {{TaskManager}}'s heartbeat payload.

> JobMaster.testJobFailureWhenTaskExecutorHeartbeatTimeout failed due to missing Execution
> ----------------------------------------------------------------------------------------
>
>                 Key: FLINK-26798
>                 URL: https://issues.apache.org/jira/browse/FLINK-26798
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.15.0, 1.16.0
>            Reporter: Matthias Pohl
>            Assignee: Matthias Pohl
>            Priority: Major
>              Labels: test-stability
>         Attachments: logs-ci_build-test_ci_build_finegrained_resource_management-1647897104.zip, test-failure.log, test-success.log
>
>
> [This build|https://dev.azure.com/mapohl/flink/_build/results?buildId=897&view=logs&j=cc649950-03e9-5fae-8326-2f1ad744b536&t=a9a20597-291c-5240-9913-a731d46d6dd1&l=8399] failed with an {{ExecutionGraphException}} indicating that an expected {{Execution}} was no longer running on its task executor:
> {code}
> [...]
> Caused by: org.apache.flink.util.FlinkException: Execution 48dbc880c8225256b8bc112ea36e9082 is unexpectedly no longer running on task executor bbad15fcb93d4b2b4f80fe2c35e03e6d.
>         at org.apache.flink.runtime.jobmaster.JobMaster$1.onMissingDeploymentsOf(JobMaster.java:250) ~[classes/:?]
>         ... 35 more
> {code}


