Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2022/05/19 06:08:18 UTC

[GitHub] [flink-kubernetes-operator] gyfora opened a new pull request, #227: [FLINK-27685] Scale subresource prototype

gyfora opened a new pull request, #227:
URL: https://github.com/apache/flink-kubernetes-operator/pull/227

   This is a proof of concept of introducing Kubernetes scaling capabilities and support for using the Horizontal Pod Autoscaler for application deployments.
   
   **Background:** 
   https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/#scale-subresource
   https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/
   
   **Scaling for Flink applications:**
   Our initial idea was to use the parallelism as the scaling unit/parameter. However, this is not a good match for the Kubernetes scaling behaviour, which is built around a replica count and watches pods/containers to decide whether to scale up or down. Using parallelism would lead to confusing terminology and half-used task managers (where not all the slots are taken).
   
   We decided it might be better to use the number of TaskManager pods as the replica count for scaling purposes, as it more closely matches what is actually happening under the hood.
   
   **In this PR:**
   We introduce `taskManager.replicas` in the spec, which defaults to `null`. When defined, it takes precedence over the job parallelism setting, and the parallelism is computed as `replicas * numTaskSlots`.
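   A minimal sketch of this precedence rule (the method and parameter names here are illustrative, not the operator's actual API):
   
   ```java
   public class EffectiveParallelism {
       // When replicas is defined it takes precedence; otherwise use the spec parallelism.
       static int effectiveParallelism(Integer replicas, int numTaskSlots, int specParallelism) {
           return replicas != null ? replicas * numTaskSlots : specParallelism;
       }
   
       public static void main(String[] args) {
           System.out.println(effectiveParallelism(3, 2, 4));    // replicas set: 3 * 2 = 6
           System.out.println(effectiveParallelism(null, 2, 4)); // replicas null: spec parallelism 4
       }
   }
   ```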
   
   We also expose the TaskManager replicas, together with the pod label selector, in the status (this is necessary for the scale subresource).
   
   After setting the correct annotations, the generated scale subresource looks like this:
   
   ```
   subresources:
         scale:
           labelSelectorPath: .status.taskManager.labelSelector
           specReplicasPath: .spec.taskManager.replicas
           statusReplicasPath: .status.taskManager.replicas
   ```
   
   With this we can now create a job and an HPA:
   <img width="1035" alt="image" src="https://user-images.githubusercontent.com/5880972/169222154-eac7c296-1b05-4a59-b897-36688d080fe8.png">
   
   The HorizontalPodAutoscaler policy will continuously monitor the TM pods and scale the number of replicas up or down.
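   For illustration only, an HPA targeting the deployment could look roughly like this (API versions, names, and thresholds are assumptions, not taken from this PR):
   
   ```
   apiVersion: autoscaling/v2
   kind: HorizontalPodAutoscaler
   metadata:
     name: basic-example-hpa
   spec:
     scaleTargetRef:
       apiVersion: flink.apache.org/v1beta1
       kind: FlinkDeployment
       name: basic-example
     minReplicas: 1
     maxReplicas: 4
     metrics:
       - type: Resource
         resource:
           name: cpu
           target:
             type: Utilization
             averageUtilization: 80
   ```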
   
    


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-kubernetes-operator] rr-sushant-choudhary commented on pull request #227: [FLINK-27685] Scale subresource prototype

rr-sushant-choudhary commented on PR #227:
URL: https://github.com/apache/flink-kubernetes-operator/pull/227#issuecomment-1149076793

   @gyfora Deleting the FlinkDeployment object itself is stuck; the resource is not being deleted. The Flink operator pod is running and shows no error logs in either the webhook or the operator container.




[GitHub] [flink-kubernetes-operator] gyfora commented on pull request #227: [FLINK-27685] Scale subresource prototype

gyfora commented on PR #227:
URL: https://github.com/apache/flink-kubernetes-operator/pull/227#issuecomment-1149065932

   @rr-sushant-choudhary you need a running operator in order to delete the deployment resources. What you could do is start the operator from the main branch (keeping the current CRD), delete all deployments, then replace the CRD.




[GitHub] [flink-kubernetes-operator] jeesmon commented on pull request #227: [FLINK-27685] Scale subresource prototype

jeesmon commented on PR #227:
URL: https://github.com/apache/flink-kubernetes-operator/pull/227#issuecomment-1132092222

   Yes, I can build locally and try. We just started exploring this operator, and one of the hurdles to adopting it was the lack of autoscaling support in the native integration. This will definitely help with that concern. I will keep you updated. Thanks.




[GitHub] [flink-kubernetes-operator] gyfora commented on pull request #227: [FLINK-27685] Scale subresource prototype

gyfora commented on PR #227:
URL: https://github.com/apache/flink-kubernetes-operator/pull/227#issuecomment-1132961228

   @jeesmon That is completely expected. When you scale up, the operator performs a stateful upgrade to the new parallelism.
   The job is resubmitted and the native integration takes care of creating the new TaskManagers. If you don't have enough resources, your job won't be able to start.




[GitHub] [flink-kubernetes-operator] gyfora commented on pull request #227: [FLINK-27685] Scale subresource prototype

gyfora commented on PR #227:
URL: https://github.com/apache/flink-kubernetes-operator/pull/227#issuecomment-1149070867

   @wangyang0918 @mbalassi @Aitozi @morhidi do you think we should merge this as a first iteration or discuss further on the mailing list? I feel like the general sentiment is that this approach is reasonable :)




[GitHub] [flink-kubernetes-operator] morhidi commented on pull request #227: [FLINK-27685] Scale subresource prototype

morhidi commented on PR #227:
URL: https://github.com/apache/flink-kubernetes-operator/pull/227#issuecomment-1152066959

   I agree with @gyfora here. Let's add this feature without overthinking it too much and see what we can bring out of it.




[GitHub] [flink-kubernetes-operator] Aitozi commented on pull request #227: [FLINK-27685] Scale subresource prototype

Aitozi commented on PR #227:
URL: https://github.com/apache/flink-kubernetes-operator/pull/227#issuecomment-1150014958

   @gyfora Thanks for driving this topic, I will take a closer look at it tomorrow.




[GitHub] [flink-kubernetes-operator] jeesmon commented on pull request #227: [FLINK-27685] Scale subresource prototype

jeesmon commented on PR #227:
URL: https://github.com/apache/flink-kubernetes-operator/pull/227#issuecomment-1132958028

   @gyfora I was able to run the `basic-scaling` example with your changes locally and it seems to be working fine. One thing I noticed is that the parallelism gets updated even when new pods are in the Pending state. For example, I manually scaled up the `FlinkDeployment` using `kubectl scale`, and I didn't have enough resources to run all TM replicas in my minikube cluster. But the operator seems to have adjusted the parallelism based on `taskManager.replicas` without checking the status of the TM pods. Not sure if it's expected behavior, but I just wanted to point it out.




[GitHub] [flink-kubernetes-operator] Aitozi commented on pull request #227: [FLINK-27685] Scale subresource prototype

Aitozi commented on PR #227:
URL: https://github.com/apache/flink-kubernetes-operator/pull/227#issuecomment-1152232542

   >  The general assumption for HPA is homogeneous resources.
   
   I totally agree with your point, and we should let users have the same expectation of it.




[GitHub] [flink-kubernetes-operator] jeesmon commented on pull request #227: [FLINK-27685] Scale subresource prototype

jeesmon commented on PR #227:
URL: https://github.com/apache/flink-kubernetes-operator/pull/227#issuecomment-1133323923

   @gyfora Considering that a scale event can potentially block resuming a job (due to new TaskManagers stuck in the Pending state because of worker node provisioning delays or other scheduling issues), do you agree that standalone mode + reactive scaling is a safer approach for autoscaling? Just trying to compare pros and cons. Thanks.




[GitHub] [flink-kubernetes-operator] gyfora merged pull request #227: [FLINK-27685] Scale subresource prototype

gyfora merged PR #227:
URL: https://github.com/apache/flink-kubernetes-operator/pull/227




[GitHub] [flink-kubernetes-operator] rr-sushant-choudhary commented on pull request #227: [FLINK-27685] Scale subresource prototype

rr-sushant-choudhary commented on PR #227:
URL: https://github.com/apache/flink-kubernetes-operator/pull/227#issuecomment-1149061792

   @gyfora I tried building these changes locally and was able to test the scaling feature. However, now I am trying to go back to the main branch CRD, but this branch's FlinkDeployment CRD is not getting deleted; it is stuck in the 'Terminating' state. Can you please help me get out of this situation? I would like to clean up everything associated with it.
   <img width="686" alt="Screenshot 2022-06-08 at 12 37 01 AM" src="https://user-images.githubusercontent.com/62716350/172462876-a2e8056f-255a-4837-bf43-e240ee62e344.png">
   




[GitHub] [flink-kubernetes-operator] gyfora commented on a diff in pull request #227: [FLINK-27685] Scale subresource prototype

gyfora commented on code in PR #227:
URL: https://github.com/apache/flink-kubernetes-operator/pull/227#discussion_r894195379


##########
flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/utils/FlinkUtils.java:
##########
@@ -278,4 +280,10 @@ public static boolean isKubernetesHAActivated(Configuration configuration) {
     public static boolean clusterShutdownDisabled(FlinkDeploymentSpec spec) {
         return spec.getFlinkVersion().isNewerVersionThan(FlinkVersion.v1_14);
     }
+
+    public static int getNumTaskManagers(Configuration conf) {
+        int parallelism = conf.get(CoreOptions.DEFAULT_PARALLELISM);
+        int taskSlots = conf.get(TaskManagerOptions.NUM_TASK_SLOTS);
+        return (parallelism + taskSlots - 1) / taskSlots;

Review Comment:
   In Java, integer division always rounds down: 5/2 results in 2 instead of the 3 we need to compute here. `(parallelism + taskSlots - 1) / taskSlots` is a standard trick to divide parallelism by taskSlots while rounding up.
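   The rounding behaviour is easy to check with a standalone snippet (a sketch mirroring the helper above, with the `Configuration` lookups replaced by plain parameters):
   
   ```java
   public class CeilDiv {
       // Integer ceiling division: rounds parallelism / taskSlots up without floating point.
       static int getNumTaskManagers(int parallelism, int taskSlots) {
           return (parallelism + taskSlots - 1) / taskSlots;
       }
   
       public static void main(String[] args) {
           System.out.println(5 / 2);                    // plain integer division rounds down: 2
           System.out.println(getNumTaskManagers(5, 2)); // rounds up: 3 TMs cover parallelism 5
           System.out.println(getNumTaskManagers(6, 2)); // exact fit: 3
       }
   }
   ```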





[GitHub] [flink-kubernetes-operator] gyfora commented on pull request #227: [FLINK-27685] Scale subresource prototype

gyfora commented on PR #227:
URL: https://github.com/apache/flink-kubernetes-operator/pull/227#issuecomment-1132071756

   @jeesmon as you can see in the example I provided, this works with the `FlinkDeployment` resource, which the operator manages using the Flink native integration.
   
   You don't need to set anything else; just use the `taskManager.replicas` config instead of setting the parallelism for your `FlinkDeployment` resource. This will work in a similar way to reactive mode, the only difference being that scaling is done through the Kubernetes API or an HPA.
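   As a rough sketch, a spec using replicas instead of parallelism could look like this (the field values and jar path are illustrative; `taskmanager.numberOfTaskSlots` is the standard Flink option):
   
   ```
   spec:
     taskManager:
       replicas: 2            # parallelism becomes replicas * numTaskSlots = 4
     flinkConfiguration:
       taskmanager.numberOfTaskSlots: "2"
     job:
       jarURI: local:///opt/flink/examples/streaming/StateMachineExample.jar
       upgradeMode: savepoint
   ```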
   
   In this version the operator does not create the HPA policy; that is definitely out of scope for this initial version, and even later we will have to discuss how far we want to go :)
   




[GitHub] [flink-kubernetes-operator] gyfora commented on pull request #227: [FLINK-27685] Scale subresource prototype

gyfora commented on PR #227:
URL: https://github.com/apache/flink-kubernetes-operator/pull/227#issuecomment-1133530577

   But in general you are right: reactive mode should be quicker and more reliable in shaky conditions. We can probably support that in the future once standalone mode support is added.




[GitHub] [flink-kubernetes-operator] gyfora commented on pull request #227: [FLINK-27685] Scale subresource prototype

gyfora commented on PR #227:
URL: https://github.com/apache/flink-kubernetes-operator/pull/227#issuecomment-1132085456

   I think we have to wait until after the release to merge this, to give others some time to chime in on the design.
   
   On the other hand, as you can see, this is a minimal (solid) change that won't interfere with anything else. It would be great if you could build and try this in your environment and provide some feedback on how well it covers your use case. That would definitely speed up our design discussions :) You could cherry-pick this on top of the release branch and have a 1.0.0+ version :D
   
   You are right that this won't work for SessionJobs. We could scale those based only on the parallelism, but we would have a hard time even selecting the TaskManager pods to monitor. So I would say that's out of scope.




[GitHub] [flink-kubernetes-operator] gyfora commented on pull request #227: [FLINK-27685] Scale subresource prototype

gyfora commented on PR #227:
URL: https://github.com/apache/flink-kubernetes-operator/pull/227#issuecomment-1149080490

   @rr-sushant-choudhary try restarting the operator; I believe this is not related to this particular change. Feel free to ping me via email/Slack if you get stuck. There are also ways of force-deleting resources if you search for them.
   
   Let's try to keep this out of this PR discussion if you don't mind :)




[GitHub] [flink-kubernetes-operator] Aitozi commented on a diff in pull request #227: [FLINK-27685] Scale subresource prototype

Aitozi commented on code in PR #227:
URL: https://github.com/apache/flink-kubernetes-operator/pull/227#discussion_r894100904


##########
flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/utils/FlinkUtils.java:
##########
@@ -278,4 +280,10 @@ public static boolean isKubernetesHAActivated(Configuration configuration) {
     public static boolean clusterShutdownDisabled(FlinkDeploymentSpec spec) {
         return spec.getFlinkVersion().isNewerVersionThan(FlinkVersion.v1_14);
     }
+
+    public static int getNumTaskManagers(Configuration conf) {
+        int parallelism = conf.get(CoreOptions.DEFAULT_PARALLELISM);
+        int taskSlots = conf.get(TaskManagerOptions.NUM_TASK_SLOTS);
+        return (parallelism + taskSlots - 1) / taskSlots;

Review Comment:
   what's the meaning of `(parallelism + taskSlots - 1) / taskSlots` ? 





[GitHub] [flink-kubernetes-operator] gyfora commented on pull request #227: [FLINK-27685] Scale subresource prototype

gyfora commented on PR #227:
URL: https://github.com/apache/flink-kubernetes-operator/pull/227#issuecomment-1152018399

   @Aitozi thank you for raising these concerns. I think the main question is: why would anyone use autoscaling/HPA (and expect it to work) if they hardcode the parallelism or have very fine-grained resource requirements? If they have these problems, the autoscaling logic is really not for them, and I think it's very hard to solve nicely. The general assumption for HPA is homogeneous resources.
   
   Hardcoding or overwriting configuration from within your program is a concern in general when using the operator, not just for scaling but for any manual spec change; we should advise users to configure these settings only through the spec.
   
   Other than this, since the current behaviour is not impacted by this change, people can decide whether to use HPA or not. :)




[GitHub] [flink-kubernetes-operator] gyfora commented on pull request #227: [FLINK-27685] Scale subresource prototype

gyfora commented on PR #227:
URL: https://github.com/apache/flink-kubernetes-operator/pull/227#issuecomment-1149066565

   Also, you cannot delete the CRD if you still have FlinkDeployment objects; those need to be deleted first.




[GitHub] [flink-kubernetes-operator] Aitozi commented on a diff in pull request #227: [FLINK-27685] Scale subresource prototype

Aitozi commented on code in PR #227:
URL: https://github.com/apache/flink-kubernetes-operator/pull/227#discussion_r894101067


##########
flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/utils/FlinkUtils.java:
##########
@@ -278,4 +280,10 @@ public static boolean isKubernetesHAActivated(Configuration configuration) {
     public static boolean clusterShutdownDisabled(FlinkDeploymentSpec spec) {
         return spec.getFlinkVersion().isNewerVersionThan(FlinkVersion.v1_14);
     }
+
+    public static int getNumTaskManagers(Configuration conf) {
+        int parallelism = conf.get(CoreOptions.DEFAULT_PARALLELISM);
+        int taskSlots = conf.get(TaskManagerOptions.NUM_TASK_SLOTS);
+        return (parallelism + taskSlots - 1) / taskSlots;

Review Comment:
   IMO, it should be `parallelism / taskSlots`





[GitHub] [flink-kubernetes-operator] jeesmon commented on pull request #227: [FLINK-27685] Scale subresource prototype

jeesmon commented on PR #227:
URL: https://github.com/apache/flink-kubernetes-operator/pull/227#issuecomment-1132062727

   @gyfora Thanks for the prototype. We are very much interested in this feature for Kubernetes-native deployment. Does this work like reactive mode? Do we need any other special configuration for this to work, like `scheduler-mode=reactive`?




[GitHub] [flink-kubernetes-operator] gyfora commented on pull request #227: [FLINK-27685] Scale subresource prototype

gyfora commented on PR #227:
URL: https://github.com/apache/flink-kubernetes-operator/pull/227#issuecomment-1133530034

   @jeesmon in some sense I agree.
   
   But TaskManager pods can fail for various reasons, and you need to make sure you can provision them reliably if you are running real-time applications.
   
   So if you have capacity problems in the cluster, maybe an autoscaler is not a good idea in general.




[GitHub] [flink-kubernetes-operator] jeesmon commented on pull request #227: [FLINK-27685] Scale subresource prototype

jeesmon commented on PR #227:
URL: https://github.com/apache/flink-kubernetes-operator/pull/227#issuecomment-1132078223

   This is perfect, hope it will get approved and merged soon. As with reactive mode, this will work only for `FlinkDeployment` and not for `FlinkSessionJob`, right? Thanks.

