Posted to issues@flink.apache.org by "Till Rohrmann (JIRA)" <ji...@apache.org> on 2019/05/07 15:28:00 UTC
[jira] [Resolved] (FLINK-12219) Yarn application can't stop when
flink job failed in per-job yarn cluster mode
[ https://issues.apache.org/jira/browse/FLINK-12219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Till Rohrmann resolved FLINK-12219.
-----------------------------------
Resolution: Fixed
Fix Version/s: 1.8.1, 1.9.0, 1.7.3
Fixed via
1.9.0: 417d6d2070e7ff82eb73a605f12f50ca13acce15
1.8.1: a956a49876bb1733bb9354372c25c05d0d96d7be
1.7.3: cfbac8f024e818e9b9f816d93a684c4f8b721c2a
> Yarn application can't stop when flink job failed in per-job yarn cluster mode
> ------------------------------------------------------------------------------
>
> Key: FLINK-12219
> URL: https://issues.apache.org/jira/browse/FLINK-12219
> Project: Flink
> Issue Type: Bug
> Components: Deployment / YARN, Runtime / REST
> Affects Versions: 1.6.3, 1.8.0
> Reporter: lamber-ken
> Assignee: lamber-ken
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.7.3, 1.9.0, 1.8.1
>
> Attachments: fix-bug.patch, image-2019-04-17-15-00-40-687.png, image-2019-04-17-15-02-49-513.png, image-2019-04-23-17-37-00-081.png
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> h3. *Issue detail info*
> In our Flink (1.6.3) production environment, I often encounter a situation where the YARN application does not stop after a Flink job fails in per-job YARN cluster mode, so I analyzed in depth why this happens.
> When a Flink job fails, the system writes an archive file to a FileSystem through the +MiniDispatcher#archiveExecutionGraph+ method and then notifies YarnJobClusterEntrypoint to shut down. However, if +MiniDispatcher#archiveExecutionGraph+ throws an exception during execution, the subsequent calls are skipped, so the shutdown notification is never sent and the YARN application keeps running.
> So I opened [FLINK-12247|https://issues.apache.org/jira/projects/FLINK/issues/FLINK-12247] to fix the NPE (NullPointerException) bug that occurs when the system writes the archive to the FileSystem. However, we still need to handle other exceptions, so we should catch Exception / Throwable, not just IOException.
> h3. *Flink YARN job failure flow*
> !image-2019-04-23-17-37-00-081.png!
> h3. *Flink job failure on YARN*
> !image-2019-04-17-15-00-40-687.png!
>
> h3. *Flink yarn application can't stop*
> !image-2019-04-17-15-02-49-513.png!
>
>
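A minimal Java sketch of the failure mode described in the issue (an unchecked exception thrown while archiving prevents the shutdown notification) and of the proposed broader catch. All names here (`ArchiveShutdownSketch`, `Archiver`, `archiveNarrow`, `archiveBroad`, `shutdownReached`) are hypothetical and do not come from Flink's `MiniDispatcher`; the sketch only assumes, as the description states, that archiving runs before the shutdown notification in the same call sequence. It catches `Exception` rather than `Throwable` so that fatal JVM errors still propagate; the issue itself leaves that choice open.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: none of these names exist in Flink.
class ArchiveShutdownSketch {

    // Stand-in for the step that writes the archived execution graph to a FileSystem.
    interface Archiver {
        void archive(String jobId) throws IOException;
    }

    // Simulates an unchecked failure like the NPE tracked in FLINK-12247.
    static final Archiver NPE_ARCHIVER = jobId -> {
        throw new NullPointerException("archive path is null");
    };

    // Simulates an ordinary I/O failure (a checked exception).
    static final Archiver IO_ARCHIVER = jobId -> {
        throw new IOException("file system unavailable");
    };

    // Buggy pattern: only IOException is caught, so an unchecked exception
    // escapes before shutdown.run() is reached.
    static void archiveNarrow(Archiver archiver, String jobId, Runnable shutdown) {
        try {
            archiver.archive(jobId);
        } catch (IOException e) {
            System.err.println("Could not archive " + jobId + ": " + e);
        }
        shutdown.run();
    }

    // Proposed pattern: catch any Exception so a failed archive never blocks shutdown.
    static void archiveBroad(Archiver archiver, String jobId, Runnable shutdown) {
        try {
            archiver.archive(jobId);
        } catch (Exception e) {
            System.err.println("Could not archive " + jobId + ": " + e);
        }
        shutdown.run();
    }

    // Returns true if the shutdown callback ran for the given variant and archiver.
    static boolean shutdownReached(boolean broadCatch, Archiver archiver) {
        AtomicBoolean shutDown = new AtomicBoolean(false);
        Runnable shutdown = () -> shutDown.set(true);
        try {
            if (broadCatch) {
                archiveBroad(archiver, "job-1", shutdown);
            } else {
                archiveNarrow(archiver, "job-1", shutdown);
            }
        } catch (RuntimeException e) {
            // In the scenario from the issue nothing recovers here, so the application keeps running.
            System.err.println("Exception escaped, shutdown skipped: " + e);
        }
        return shutDown.get();
    }

    public static void main(String[] args) {
        System.out.println("narrow catch, NPE: " + shutdownReached(false, NPE_ARCHIVER));        // false
        System.out.println("broad catch, NPE: " + shutdownReached(true, NPE_ARCHIVER));          // true
        System.out.println("narrow catch, IOException: " + shutdownReached(false, IO_ARCHIVER)); // true
    }
}
```

With the narrow catch, the simulated NPE leaves the shutdown flag unset, which mirrors the stuck YARN application; the broad catch logs the archiving failure and still reaches shutdown.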
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)