Posted to issues@flink.apache.org by "Till Rohrmann (Jira)" <ji...@apache.org> on 2021/01/19 17:02:00 UTC

[jira] [Comment Edited] (FLINK-21029) Failure of shutdown lead to restart of (connected) pipeline

    [ https://issues.apache.org/jira/browse/FLINK-21029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17268034#comment-17268034 ] 

Till Rohrmann edited comment on FLINK-21029 at 1/19/21, 5:01 PM:
-----------------------------------------------------------------

Yes, the problem with simply stopping might be that you would have stopped your job without a savepoint, meaning that you might have lost state, because the checkpoints are supposed to be cleaned up at this point. Hence, the current idea is to fail the operation, inform the user, and try to recover. That last step might succeed, or it might fail and leave the job {{FAILED}}.

For the case where the user provided a wrong savepoint path with the stop command, I think we should fail the operation but not the job. The wrong savepoint path could simply be a typo by the user.
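The outcomes discussed above can be summarized in a small model. This is a hypothetical sketch: the class, enum, and method names are illustrative only and are not part of Flink's API. It encodes the behaviour proposed in this comment (a wrong savepoint path fails the operation but not the job), which is not necessarily what Flink 1.11.2 does.

```java
import java.util.Objects;

/**
 * Hypothetical model (NOT Flink API) of what happens to a job when
 * a stop-with-savepoint operation fails, following the proposal above.
 */
public class StopWithSavepointModel {

    /** Why the stop-with-savepoint operation did not complete (hypothetical). */
    enum StopFailure {
        INVALID_SAVEPOINT_PATH, // e.g. a typo in the target directory
        SAVEPOINT_FAILED        // the savepoint itself failed while stopping
    }

    /** What happens to the job afterwards (hypothetical). */
    enum JobOutcome {
        KEEPS_RUNNING, // only the operation fails; the job is untouched
        RESTARTED,     // the job fails over and recovery succeeds
        FAILED         // recovery does not succeed; the job ends FAILED
    }

    /**
     * @param failure          reason the stop operation failed
     * @param recoverySucceeds whether the subsequent recovery attempt succeeds
     */
    static JobOutcome outcome(StopFailure failure, boolean recoverySucceeds) {
        Objects.requireNonNull(failure, "failure");
        switch (failure) {
            case INVALID_SAVEPOINT_PATH:
                // Proposed: fail the operation but not the job.
                return JobOutcome.KEEPS_RUNNING;
            case SAVEPOINT_FAILED:
                // Simply stopping without a savepoint could lose state, so the
                // operation fails, the user is told, and the job tries to recover.
                return recoverySucceeds ? JobOutcome.RESTARTED : JobOutcome.FAILED;
            default:
                throw new IllegalStateException("unknown failure: " + failure);
        }
    }

    public static void main(String[] args) {
        // Print each failure reason with the outcome for a successful
        // and an unsuccessful recovery attempt.
        for (StopFailure f : StopFailure.values()) {
            System.out.println(f + " -> " + outcome(f, true) + " / " + outcome(f, false));
        }
    }
}
```

The model makes the asymmetry explicit: only a failed savepoint triggers recovery, while a simple typo in the path leaves the running job alone.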

I think you are right that the restart behaviour is not documented [here|https://ci.apache.org/projects/flink/flink-docs-stable/deployment/cli.html]. Documenting it is the least we should do in order to resolve this ticket.



> Failure of shutdown lead to restart of (connected) pipeline
> -----------------------------------------------------------
>
>                 Key: FLINK-21029
>                 URL: https://issues.apache.org/jira/browse/FLINK-21029
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.11.2
>            Reporter: Theo Diefenthal
>            Priority: Major
>             Fix For: 1.13.0, 1.11.4, 1.12.2
>
>
> This bug happened in combination with https://issues.apache.org/jira/browse/FLINK-21028 .
> When I wanted to stop a job with a disjoint job graph (independent pipelines in the graph) via the CLI ("flink stop..."), one task wasn't able to stop properly (reported in the mentioned bug). This led to the job being restarted. I think this is wrong behavior in general and a separate bug:
> If a crash occurs while (trying) to stop a job, Flink shouldn't try to restart it but should continue stopping the job.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)