You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@flink.apache.org by GitBox <gi...@apache.org> on 2022/03/23 08:08:00 UTC

[GitHub] [flink-kubernetes-operator] wangyang0918 opened a new pull request #101: [FLINK-26768] Clear errors when observed running job successfully

wangyang0918 opened a new pull request #101:
URL: https://github.com/apache/flink-kubernetes-operator/pull/101


   We change the `JobReconciler#reconcile()` a bit in this PR. Even though, the spec does not change, we still have a chance to update the reconciliationStatus when observed running job successfully.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink-kubernetes-operator] wangyang0918 commented on pull request #101: [FLINK-26768] Clear errors when observed running job successfully

Posted by GitBox <gi...@apache.org>.
wangyang0918 commented on pull request #101:
URL: https://github.com/apache/flink-kubernetes-operator/pull/101#issuecomment-1076129049


   > At the moment we use the reconciliationStatus/error to store both errors related to the reconciliation and when we dedect errors with the deployment.
   > 
   > So it is a bit overused field actually which causes some complications now :)
   
   I agree with you about this. So do you think we need to separate the two different errors? One is related to the reconciliation and another is about detecting the JobManager deployment.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink-kubernetes-operator] gyfora commented on pull request #101: [FLINK-26768] Clear errors when observed running job successfully

Posted by GitBox <gi...@apache.org>.
gyfora commented on pull request #101:
URL: https://github.com/apache/flink-kubernetes-operator/pull/101#issuecomment-1076110926


   At the moment we use the reconciliationStatus/error to store both errors related to the reconciliation and when we dedect errors with the deployment.
   
   So it is a bit overused field actually which causes some complications now :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink-kubernetes-operator] wangyang0918 edited a comment on pull request #101: [FLINK-26768] Clear errors when observed running job successfully

Posted by GitBox <gi...@apache.org>.
wangyang0918 edited a comment on pull request #101:
URL: https://github.com/apache/flink-kubernetes-operator/pull/101#issuecomment-1076103980


   Of cause, the observer could also clear the error. The reason why I put this logic in the reconciler is that observer should not update the `reconciliationStatus`. It is out of the scope of observer's responsibility.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink-kubernetes-operator] wangyang0918 commented on pull request #101: [FLINK-26768] Clear errors when observed running job successfully

Posted by GitBox <gi...@apache.org>.
wangyang0918 commented on pull request #101:
URL: https://github.com/apache/flink-kubernetes-operator/pull/101#issuecomment-1076074575


   So you are suggesting that we could simply clear the error no matter we have observed running job or not. Right?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink-kubernetes-operator] wangyang0918 commented on pull request #101: [FLINK-26768] Clear errors when observed running job successfully

Posted by GitBox <gi...@apache.org>.
wangyang0918 commented on pull request #101:
URL: https://github.com/apache/flink-kubernetes-operator/pull/101#issuecomment-1076164488


   > I would not rush with separating this because it's not always clear what kind of error it is. In most cases the deployment error is actually related to the reconciliation (we submitted an invalid image or service account config or something).
   > 
   > I think the best thing we could do is make sure errors are cleared in a consistent way and later we can introduce new fields if we see the need.
   > 
   > What do you think about this flow:
   > 
   > 1. Observer observes deployment
   > 2. If we are not in JMDeployment.ERROR state clear the error information
   > 3. The reconciler will set the error again if necessery (that is how it already works)
   > 
   > This might cause a back and forth setting of error states in cases where the job is running but there is an invalid reconcile request but this happens within the controller loop so the user will not see this.
   
   Nice suggestion and comments have been addressed accordingly.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink-kubernetes-operator] gyfora commented on pull request #101: [FLINK-26768] Clear errors when observed running job successfully

Posted by GitBox <gi...@apache.org>.
gyfora commented on pull request #101:
URL: https://github.com/apache/flink-kubernetes-operator/pull/101#issuecomment-1076076179


   You are right we should actually not clear errors if the JobDeployment is in ERROR state. So maybe the Observer already could clear errors


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink-kubernetes-operator] gyfora merged pull request #101: [FLINK-26768] Clear errors when observed running job successfully

Posted by GitBox <gi...@apache.org>.
gyfora merged pull request #101:
URL: https://github.com/apache/flink-kubernetes-operator/pull/101


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink-kubernetes-operator] gyfora commented on pull request #101: [FLINK-26768] Clear errors when observed running job successfully

Posted by GitBox <gi...@apache.org>.
gyfora commented on pull request #101:
URL: https://github.com/apache/flink-kubernetes-operator/pull/101#issuecomment-1076134910


   I would not rush with separating this because it's not always clear what kind of error it is. In most cases the deployment error is actually related to the reconciliation (we submitted an invalid image or service account config or something).
   
   I think the best thing we could do is make sure errors are cleared in a consistent way and later we can introduce new fields if we see the need.
   
   What do you think about this flow:
   1. Observer observes deployment
   2. If we are not in JMDeployment.ERROR state clear the error information
   3. The reconciler will reset error if necessery
   
   This might cause a back and forth setting of error states in cases where the job is running but there is an invalid reconcile request but this happens within the controller loop so the user will not see this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink-kubernetes-operator] wangyang0918 commented on pull request #101: [FLINK-26768] Clear errors when observed running job successfully

Posted by GitBox <gi...@apache.org>.
wangyang0918 commented on pull request #101:
URL: https://github.com/apache/flink-kubernetes-operator/pull/101#issuecomment-1076103980


   Of cause, the observer could also clear the error. The reason why I put this logic in the reconciler is that observer should not update the `reconciliationStatus`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink-kubernetes-operator] gyfora edited a comment on pull request #101: [FLINK-26768] Clear errors when observed running job successfully

Posted by GitBox <gi...@apache.org>.
gyfora edited a comment on pull request #101:
URL: https://github.com/apache/flink-kubernetes-operator/pull/101#issuecomment-1076134910






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink-kubernetes-operator] gyfora commented on pull request #101: [FLINK-26768] Clear errors when observed running job successfully

Posted by GitBox <gi...@apache.org>.
gyfora commented on pull request #101:
URL: https://github.com/apache/flink-kubernetes-operator/pull/101#issuecomment-1076064836


   @wangyang0918 I wonder if the whole logic could be simplified as follows:
   
   if after the reconcile step, we did not throw an exception and there is still an error, clear the error. This way this could be simple 2 lines in the controller logic after reconcile.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink-kubernetes-operator] gyfora commented on pull request #101: [FLINK-26768] Clear errors when observed running job successfully

Posted by GitBox <gi...@apache.org>.
gyfora commented on pull request #101:
URL: https://github.com/apache/flink-kubernetes-operator/pull/101#issuecomment-1076079523


   I am not completely sure what I am suggesting but it feels strange to have this logic inside the reconciler. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink-kubernetes-operator] gyfora commented on pull request #101: [FLINK-26768] Clear errors when observed running job successfully

Posted by GitBox <gi...@apache.org>.
gyfora commented on pull request #101:
URL: https://github.com/apache/flink-kubernetes-operator/pull/101#issuecomment-1076234150


   Looks good, I will test this a bit later :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org