Posted to issues@flink.apache.org by "Yang Wang (Jira)" <ji...@apache.org> on 2020/04/09 07:16:00 UTC

[jira] [Commented] (FLINK-15642) Support to set JobManager readiness and liveness check

    [ https://issues.apache.org/jira/browse/FLINK-15642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17078996#comment-17078996 ] 

Yang Wang commented on FLINK-15642:
-----------------------------------

[~felixzheng] The reason I want to add this feature is to avoid the situation where the JobManager hangs and gives no response for a long time.

 

For YARN deployments, the YARN ResourceManager is responsible for the liveness of the JobManager. When the JobManager does not heartbeat for a while (600s by default), it is killed and a new JobManager is started. So I am thinking of adding this feature to K8s by using a liveness check.
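As a rough sketch (the timing values are illustrative assumptions, not Flink defaults), the YARN behavior could be approximated with a probe whose periodSeconds x failureThreshold comes to about 600s. After that many consecutive failures the kubelet restarts the JobManager container. Port 6123 is the default jobmanager.rpc.port, as in the snippet from the issue description.

{code:yaml}
livenessProbe:
  tcpSocket:
    port: 6123            # JobManager RPC port (default jobmanager.rpc.port)
  initialDelaySeconds: 30 # give the JobManager time to start before probing
  periodSeconds: 60       # probe once per minute
  failureThreshold: 10    # assumed value: 10 failures x 60s ~= 600s, like the YARN default
{code}

Note that a tcpSocket probe only checks that the port accepts TCP connections. It catches a crashed process, but a JobManager that hangs while its socket is still listening may keep passing. Also, on Kubernetes a failed liveness probe restarts the container inside the same pod rather than creating a new pod.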

 

Also, a readiness check could help us verify whether the session cluster is ready to accept Flink jobs.
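A readiness probe could look like the fragment below. It is a sketch that assumes the default rest.port (8081) and treats a successful HTTP response from the Flink REST /overview endpoint as "ready". The timings are arbitrary choices.

{code:yaml}
readinessProbe:
  httpGet:
    path: /overview       # Flink REST cluster overview endpoint
    port: 8081            # assumes the default rest.port
  initialDelaySeconds: 10
  periodSeconds: 10
  failureThreshold: 3     # mark not-ready after 3 consecutive failures
{code}

Unlike a liveness probe, a failing readiness probe does not restart anything. It only removes the pod from the Service endpoints until the probe succeeds again, which fits the "can this session cluster accept jobs yet" question.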

> Support to set JobManager readiness and liveness check
> ------------------------------------------------------
>
>                 Key: FLINK-15642
>                 URL: https://issues.apache.org/jira/browse/FLINK-15642
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Deployment / Kubernetes
>            Reporter: Yang Wang
>            Priority: Major
>
> The liveness of the TaskManager is controlled by the Flink Master: when a TaskManager fails or times out, a new pod is started to replace it. We need to add a similar liveness check for the JobManager.
>
> This is just like what we could do in the YAML:
> {code:yaml}
> ...
>         livenessProbe:
>           tcpSocket:
>             port: 6123
>           initialDelaySeconds: 30
>           periodSeconds: 60
> ...
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)