You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2022/12/20 15:36:31 UTC

[GitHub] [flink-kubernetes-operator] mxm opened a new pull request, #492: Start stabilization period only after job goes into RUNNING

mxm opened a new pull request, #492:
URL: https://github.com/apache/flink-kubernetes-operator/pull/492

   We were tracking the stabilization period from the job start time which can greatly vary from the time the job goes into RUNNING state. This can result in no stabilization period at all leading to poor scaling decisions.
   
   Thus, we change the logic to use the update time in combination with a RUNNING job status.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-kubernetes-operator] mxm commented on pull request #492: [FLINK-30463] Start stabilization period only after job goes into RUNNING

Posted by GitBox <gi...@apache.org>.
mxm commented on PR #492:
URL: https://github.com/apache/flink-kubernetes-operator/pull/492#issuecomment-1361331765

   Do you think there are any blockers merging this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-kubernetes-operator] gyfora commented on a diff in pull request #492: [FLINK-30463] Start stabilization period only after job goes into RUNNING

Posted by GitBox <gi...@apache.org>.
gyfora commented on code in PR #492:
URL: https://github.com/apache/flink-kubernetes-operator/pull/492#discussion_r1053501498


##########
flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/autoscaler/ScalingExecutor.java:
##########
@@ -121,13 +122,22 @@ public boolean scaleResource(
 
     private boolean stabilizationPeriodPassed(
             AbstractFlinkResource<?, ?> resource, Configuration conf) {
-        var now = clock.instant();
+        var jobStatus = resource.getStatus().getJobStatus();
+
+        if (!JobStatus.RUNNING.name().equals(jobStatus.getState())) {

Review Comment:
   This might be a redundant check because the `JobAutoscaler` class already runs this check before calling any collection, evaluation etc.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-kubernetes-operator] mxm commented on a diff in pull request #492: [FLINK-30463] Start stabilization period only after job goes into RUNNING

Posted by GitBox <gi...@apache.org>.
mxm commented on code in PR #492:
URL: https://github.com/apache/flink-kubernetes-operator/pull/492#discussion_r1054407083


##########
flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/autoscaler/ScalingExecutor.java:
##########
@@ -121,13 +122,22 @@ public boolean scaleResource(
 
     private boolean stabilizationPeriodPassed(
             AbstractFlinkResource<?, ?> resource, Configuration conf) {
-        var now = clock.instant();
+        var jobStatus = resource.getStatus().getJobStatus();
+
+        if (!JobStatus.RUNNING.name().equals(jobStatus.getState())) {

Review Comment:
   I noticed while implementing that there are various invariants enforced in different layers. This per se is not a problem if they weren't leaky, i.e. we wouldn't pass the same data structures with different invariants through the layers. This becomes apparent in unit tests where implied invariants have to be modeled. One way to solve this would be to pass a different data structure, e.g. not pass JobStatus to lower levels but only the update time / running time.
   
   I think the check here doesn't hurt as it asserts an important invariant.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-kubernetes-operator] mxm merged pull request #492: [FLINK-30463] Start stabilization period only after job goes into RUNNING

Posted by GitBox <gi...@apache.org>.
mxm merged PR #492:
URL: https://github.com/apache/flink-kubernetes-operator/pull/492


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-kubernetes-operator] gyfora commented on pull request #492: [FLINK-30463] Start stabilization period only after job goes into RUNNING

Posted by GitBox <gi...@apache.org>.
gyfora commented on PR #492:
URL: https://github.com/apache/flink-kubernetes-operator/pull/492#issuecomment-1361343304

   Looks good +1


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-kubernetes-operator] mxm commented on pull request #492: [FLINK-30463] Start stabilization period only after job goes into RUNNING

Posted by GitBox <gi...@apache.org>.
mxm commented on PR #492:
URL: https://github.com/apache/flink-kubernetes-operator/pull/492#issuecomment-1361921179

   Thanks for the review!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-kubernetes-operator] mxm commented on pull request #492: [FLINK-30463] Start stabilization period only after job goes into RUNNING

Posted by GitBox <gi...@apache.org>.
mxm commented on PR #492:
URL: https://github.com/apache/flink-kubernetes-operator/pull/492#issuecomment-1359586067

   Note: I need to tag the commit with the correct JIRA on merge.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org