You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@flink.apache.org by GitBox <gi...@apache.org> on 2022/03/18 08:39:13 UTC

[GitHub] [flink-kubernetes-operator] bgeng777 edited a comment on pull request #62: [FLINK-26605] Check if JM can serve rest api calls every time before reconcile

bgeng777 edited a comment on pull request #62:
URL: https://github.com/apache/flink-kubernetes-operator/pull/62#issuecomment-1072102050


   Hi @wangyang0918 @gyfora, thanks a lot for your reply.
   It's my bad to not pay enough attention to @tweise's recent [PR](https://github.com/apache/flink-kubernetes-operator/pull/56) for [FLINK-26473](https://issues.apache.org/jira/browse/FLINK-26473) as these 2 PR were developed at the same time. As Yang said, the goal of [FLINK-26605](https://issues.apache.org/jira/browse/FLINK-26605) is to solve 2 problems:
   1. Check if JobManager is really ready for serving when the state is READY.
   2. Solve the case that JobManager crashed backoff.
   I agree that #56 has already solved above problems.
   
   This PR is trying to solve above 2 problems in a somehow different way and leads to confusion. As Gyula's said, this PR should swicth the target to improve the session case. In fact, the session observer will keep switching between READY and DEPLOYED_NOT_READY due to logic in main branch which should be a bug I believe.
   
   So I would like to clarify the goal of this PR will include after above discussion:
   1. Refine the control flow in the `observeJmDeployment` based on current main branch to improve logic of moving from DEPLOYED_NOT_READY to READY to fix the bug of session observer.
   2. Introduce a new config `operator.observer.flink.client.timeout.sec` to control the waiting time of `listJobs`. 
   3. Modify the Dockerfile to output more meaningful message in the CI.
   
   What I would do to rollback extra changes:
   1. Remove `isJobManagerServing` method. Instead, I will keep using `isJobManagerPortReady` as well to detect if JM is ready to work in Session case. The reason is that the timeout solution for a rest call is expensive.
   2. Keep `operator.observer.rest-ready.delay.sec` as it is to follow our current logic of waiting some seconds and set the JM status to READY.
   
   I will finish above changes today. Please let me know if there are unwanted changes or I ignore some cases.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org