Posted to issues@flink.apache.org by "Till Rohrmann (Jira)" <ji...@apache.org> on 2021/01/15 10:40:00 UTC

[jira] [Commented] (FLINK-20138) Flink Job can not recover due to timeout of requiring slots when flink jobmanager restarted

    [ https://issues.apache.org/jira/browse/FLINK-20138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17265905#comment-17265905 ] 

Till Rohrmann commented on FLINK-20138:
---------------------------------------

Any updates [~1026688210]? Could you reproduce the problem and capture the debug logs?

> Flink Job can not recover due to  timeout of requiring slots when flink jobmanager restarted
> --------------------------------------------------------------------------------------------
>
>                 Key: FLINK-20138
>                 URL: https://issues.apache.org/jira/browse/FLINK-20138
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / YARN, Table SQL / Runtime
>         Environment: flink: 1.9.2
> hadoop: 2.7.2
> jdk: 1.8
>            Reporter: wgcn
>            Priority: Major
>         Attachments: 2820F7EE-85F9-441D-95D5-8163FB6267DF.png, jobmanager.log, zk_resource_address_info.png
>
>
> Our Flink jobs run on YARN in per-job mode. We stopped some NodeManager machines, and the AMs on those machines were restarted on other NodeManagers. We found that some jobs could not recover because their slot requests timed out.
> *SlotPoolImpl never connected to the ResourceManager*
> ```
> 2020-11-09 16:31:31,794                           INFO flink-akka.actor.default-dispatcher-16 (org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl.stashRequestWaitingForResourceManager:369) - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{456c9daa6670a4490810f8e51f495174}]
> ```
> *1. We did not find any log entry of YarnResourceManager requesting containers in the attached jobmanager log.
> 2. The ZooKeeper node is also shown in the attachments.*



--
This message was sent by Atlassian Jira
(v8.3.4#803005)