Posted to dev@cloudstack.apache.org by "saffronjam (via GitHub)" <gi...@apache.org> on 2023/08/07 15:10:53 UTC

[GitHub] [cloudstack-kubernetes-provider] saffronjam opened a new issue, #52: Unable to auto-scale Kubernetes cluster

saffronjam opened a new issue, #52:
URL: https://github.com/apache/cloudstack-kubernetes-provider/issues/52

   Hi!
   
   I am unable to auto-scale Kubernetes clusters. As I understand it, it creates a "cluster-autoscaler" deployment that decides whether to scale or not. However, it does not seem to work: the pod logs multiple errors and warnings, even though it is a completely clean cluster.
   
   Normal scaling seems to work just fine.
   
   # Setup
   A "default" CloudStack setup running KVMs.
   
   ## Settings (relevant)
   - Cloud kubernetes service enabled **true**
   - Cloud kubernetes cluster experimental features enabled **true**
   - Cloud kubernetes cluster max size **50**
   
   The nodes use the following service offering:
   - 2 CPU x 2.05 GHz
   - 2048 MB memory
   - 8 GB root disk
   
   # Replicate
   1.  Create a new cluster using the Kubernetes 1.24 ISO found here:
   http://download.cloudstack.org/cks/
   
   2. Enable forced auto-scaling
   Since the cluster starts with only one worker node, auto-scaling with 3-5 nodes should trigger an upscale (I assume) 
   ![Screenshot from 2023-08-07 16-55-00](https://github.com/apache/cloudstack-kubernetes-provider/assets/26722370/9b92cc88-b107-42cb-8268-4ae3af25c1f6)
   
   3. Check the logs for cluster-autoscaler in the Kubernetes cluster
   Some notable entries:
   ```
   E0807 14:41:30.317148       1 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.CSIDriver: failed to list *v1.CSIDriver: csidrivers.storage.k8s.io is forbidden: User "system:serviceaccount:kube-system:cluster-autoscaler" cannot list resource "csidrivers" in API group "storage.k8s.io" at the cluster scope
   
   E0807 14:41:32.388828       1 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1beta1.CSIStorageCapacity: failed to list *v1beta1.CSIStorageCapacity: csistoragecapacities.storage.k8s.io is forbidden: User "system:serviceaccount:kube-system:cluster-autoscaler" cannot list resource "csistoragecapacities" in API group "storage.k8s.io" at the cluster scope
   ```
   These errors appear even though I have not edited anything myself; it is a clean CKS cluster.
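
   To narrow down the RBAC errors above, the denied resources can be pulled out of the log. This is only a minimal sketch: `forbidden_resources` is a hypothetical helper name, and it assumes the log lines keep the `cannot list resource "..." in API group "..."` wording shown above.

   ```shell
   # Hypothetical helper: read autoscaler log lines on stdin and print each
   # denied resource as <resource>.<api-group>, one per line, de-duplicated.
   forbidden_resources() {
     grep -o 'cannot list resource "[^"]*" in API group "[^"]*"' \
       | sed -E 's/cannot list resource "([^"]*)" in API group "([^"]*)"/\1.\2/' \
       | sort -u
   }
   ```

   Each printed name can then be checked against the service account from the log, for example with `kubectl auth can-i list csidrivers.storage.k8s.io --as=system:serviceaccount:kube-system:cluster-autoscaler`.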
   
   ```
   W0807 14:41:43.251280       1 clusterstate.go:590] Failed to get nodegroup for 6a4c91a3-9694-4596-9ddd-dc86e60136ff: Unable to find node 6a4c91a3-9694-4596-9ddd-dc86e60136ff in cluster
   
   W0807 14:41:43.251361       1 clusterstate.go:590] Failed to get nodegroup for bd0b855f-6dc6-4678-9bea-b52329333024: Unable to find node bd0b855f-6dc6-4678-9bea-b52329333024 in cluster
   
   I0807 14:57:06.667061       1 static_autoscaler.go:341] 2 unregistered nodes present
   ```
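
   The IDs in these warnings can be extracted so they can be compared by hand with the instances CloudStack lists for the cluster. A minimal sketch, assuming the `Failed to get nodegroup for <id>:` wording shown above; `unregistered_ids` is a hypothetical helper name.

   ```shell
   # Hypothetical helper: read autoscaler log lines on stdin and print the
   # unique IDs that the autoscaler could not map to a node group.
   unregistered_ids() {
     sed -nE 's/.*Failed to get nodegroup for ([0-9a-f-]+):.*/\1/p' | sort -u
   }
   ```

   For example, `kubectl -n kube-system logs deployment/cluster-autoscaler | unregistered_ids` would print one ID per line.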
   
   The entire log:
   [logs-from-cluster-autoscaler-in-cluster-autoscaler-5bf887ddd8-hxg2g.log](https://github.com/apache/cloudstack-kubernetes-provider/files/12281530/logs-from-cluster-autoscaler-in-cluster-autoscaler-5bf887ddd8-hxg2g.log)
   
   Please tell me if you need more logs to look at, or if I should try some other configuration.
   
   Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@cloudstack.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [cloudstack-kubernetes-provider] rohityadavcloud commented on issue #52: Unable to auto-scale Kubernetes cluster

Posted by "rohityadavcloud (via GitHub)" <gi...@apache.org>.
rohityadavcloud commented on issue #52:
URL: https://github.com/apache/cloudstack-kubernetes-provider/issues/52#issuecomment-1668091677

   cc @Pearl1594 @weizhouapache @DaanHoogland please help triage when you have time
   
   @saffronjam this looks like an issue with k8s autoscaler (https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/cloudstack/README.md) or with CKS (upstream https://github.com/apache/cloudstack)




Re: [I] Unable to auto-scale Kubernetes cluster [cloudstack-kubernetes-provider]

Posted by "kiranchavala (via GitHub)" <gi...@apache.org>.
kiranchavala commented on issue #52:
URL: https://github.com/apache/cloudstack-kubernetes-provider/issues/52#issuecomment-1968230660

   Hi @saffronjam 
   
   The autoscaling feature works fine on a k8s cluster deployed by CKS.
   
   Here are the steps that I followed.
   
   After you enable autoscaling on the cluster
   
   ![Screenshot 2024-02-28 at 10 10 20 AM](https://github.com/apache/cloudstack-kubernetes-provider/assets/1401014/d356d4a2-1015-492a-baa3-51ea496b6348)
   
   Make sure the autoscaling pod is deployed in the cluster
   
   
   ```
   kubectl get pods -A
   NAMESPACE              NAME                                             READY   STATUS    RESTARTS      AGE
   kube-system            cluster-autoscaler-8d8894d6c-q8r4h               1/1     Running   0             19m
   ```
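
   The same check can be scripted. This is a rough sketch that assumes the column layout of `kubectl get pods -A` shown above (NAMESPACE, NAME, READY, STATUS); `autoscaler_ready` is a hypothetical helper name.

   ```shell
   # Hypothetical helper: read `kubectl get pods -A` output on stdin and succeed
   # only if a cluster-autoscaler pod is Running with all containers ready.
   autoscaler_ready() {
     awk '$2 ~ /^cluster-autoscaler-/ && $4 == "Running" {
            split($3, r, "/")          # READY column looks like "1/1"
            if (r[1] == r[2]) ok = 1
          }
          END { exit (ok ? 0 : 1) }'
   }
   ```

   Usage: `kubectl get pods -A | autoscaler_ready && echo "autoscaler is up"`.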
   
   Before scaling
   
   ```
   ➜  ~ k get nodes -A
   NAME                     STATUS   ROLES           AGE   VERSION
   gh-control-18debd77e18   Ready    control-plane   10h   v1.28.4
   gh-node-18debd8440c      Ready    <none>          10h   v1.28.4
   ```
   
   
   
   
   Deploy an application
   
   `kubectl create deployment hello-node --image=registry.k8s.io/e2e-test-images/agnhost:2.39 -- /agnhost netexec --http-port=80`
   ```
   
   ➜  ~ k get pods -A
   NAMESPACE              NAME                                             READY   STATUS    RESTARTS      AGE
   default                hello-node-7c6c5fb9d8-bgd69                      1/1     Running   0             10h
   
   ```
   
   
   Scale the application 
   
   `kubectl scale --replicas=150 deployment/hello-node`
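
   The cluster autoscaler reacts to pods that cannot be scheduled, so after scaling the deployment it can help to confirm that some pods are actually Pending. A minimal sketch, assuming the `kubectl get pods -A` column layout shown earlier; `count_pending` is a hypothetical helper name.

   ```shell
   # Hypothetical helper: read `kubectl get pods -A` output on stdin and print
   # how many pods have STATUS Pending (the header line is skipped).
   count_pending() {
     awk 'NR > 1 && $4 == "Pending" { n++ } END { print n + 0 }'
   }
   ```

   Usage: `kubectl get pods -A | count_pending`. If this prints 0, the existing nodes absorbed every replica and no scale-up should be expected.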
   
   
   
   Logs from the autoscaler pod
   
   ```
   
   I0228 04:51:46.798087       1 reflector.go:536] /home/djumani/lab/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:356: Watch close - *v1.StatefulSet total 9 items received
   I0228 04:51:51.244004       1 static_autoscaler.go:235] Starting main loop
   I0228 04:51:51.244382       1 client.go:169] NewAPIRequest API request URL:http://10.0.34.2:8080/client/api?apiKey=***&command=listKubernetesClusters&id=14b42c5d-e7e6-4c41-b638-5facb98b0a93&response=json&signature=***
   I0228 04:51:51.279721       1 client.go:175] NewAPIRequest response status code:200
   I0228 04:51:51.280798       1 cloudstack_manager.go:88] Got cluster : &{14b42c5d-e7e6-4c41-b638-5facb98b0a93 gh 2 3 1 1 [0xc0013bfad0 0xc0013bfb00] map[gh-control-18debd77e18:0xc0013bfad0 gh-node-18debd8440c:0xc0013bfb00]}
   W0228 04:51:51.292009       1 clusterstate.go:590] Failed to get nodegroup for dc95f481-15a3-4629-bb78-055fbe4a7139: Unable to find node dc95f481-15a3-4629-bb78-055fbe4a7139 in cluster
   W0228 04:51:51.292052       1 clusterstate.go:590] Failed to get nodegroup for facdd040-53fe-4984-8654-c186a7cdde9b: Unable to find node facdd040-53fe-4984-8654-c186a7cdde9b in cluster
   I0228 04:51:51.292095       1 static_autoscaler.go:341] 2 unregistered nodes present
   I0228 04:51:51.292105       1 static_autoscaler.go:624] Removing unregistered node dc95f481-15a3-4629-bb78-055fbe4a7139
   W0228 04:51:51.292126       1 static_autoscaler.go:627] Failed to get node group for dc95f481-15a3-4629-bb78-055fbe4a7139: Unable to find node dc95f481-15a3-4629-bb78-055fbe4a7139 in cluster
   W0228 04:51:51.292137       1 static_autoscaler.go:346] Failed to remove unregistered nodes: Unable to find node dc95f481-15a3-4629-bb78-055fbe4a7139 in cluster
   I0228 04:51:51.292569       1 filter_out_schedulable.go:65] Filtering out schedulables
   I0228 04:51:51.292590       1 filter_out_schedulable.go:137] Filtered out 0 pods using hints
   I0228 04:51:51.624523       1 filter_out_schedulable.go:175] 44 pods were kept as unschedulable based on caching
   I0228 04:51:51.624568       1 filter_out_schedulable.go:176] 0 pods marked as unschedulable can be scheduled.
   I0228 04:51:51.624667       1 filter_out_schedulable.go:87] No schedulable pods
   I0228 04:51:51.870314       1 static_autoscaler.go:480] Calculating unneeded nodes
   I0228 04:51:51.870353       1 pre_filtering_processor.go:66] Skipping gh-control-18debd77e18 - node group min size reached
   I0228 04:51:51.870361       1 pre_filtering_processor.go:66] Skipping gh-node-18debd8440c - node group min size reached
   I0228 04:51:51.870413       1 static_autoscaler.go:534] Scale down status: unneededOnly=false lastScaleUpTime=2024-02-27 17:39:33.032517071 +0000 UTC m=-3594.093061754 lastScaleDownDeleteTime=2024-02-27 17:39:33.032517071 +0000 UTC m=-3594.093061754 lastScaleDownFailTime=2024-02-27 17:39:33.032517071 +0000 UTC m=-3594.093061754 scaleDownForbidden=false isDeleteInProgress=false scaleDownInCooldown=false
   
   
   I0228 04:52:22.258545       1 scale_up.go:468] Best option to resize: 14b42c5d-e7e6-4c41-b638-5facb98b0a93
   I0228 04:52:22.258602       1 scale_up.go:472] Estimated 1 nodes needed in 14b42c5d-e7e6-4c41-b638-5facb98b0a93
   I0228 04:52:22.266675       1 scale_up.go:595] Final scale-up plan: [{14b42c5d-e7e6-4c41-b638-5facb98b0a93 1->2 (max: 3)}]
   I0228 04:52:22.266915       1 scale_up.go:691] Scale-up: setting group 14b42c5d-e7e6-4c41-b638-5facb98b0a93 size to 2
   I0228 04:52:22.267040       1 cloudstack_node_group.go:57] Increase Cluster : 14b42c5d-e7e6-4c41-b638-5facb98b0a93 by 1
   I0228 04:52:22.267238       1 event_sink_logging_wrapper.go:48] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"kube-system", Name:"cluster-autoscaler-status", UID:"3e317689-2939-4a66-b764-b2bb938c433c", APIVersion:"v1", ResourceVersion:"75712", FieldPath:""}): type: 'Normal' reason: 'ScaledUpGroup' Scale-up: setting group 14b42c5d-e7e6-4c41-b638-5facb98b0a93 size to 2 instead of 1 (max: 3)
   I0228 04:52:22.267350       1 client.go:169] NewAPIRequest API request URL:http://10.0.34.2:8080/client/api?apiKey=***&command=scaleKubernetesCluster&id=14b42c5d-e7e6-4c41-b638-5facb98b0a93&response=json&size=2&signature=***
   I0228 04:52:22.297307       1 client.go:175] NewAPIRequest response status code:200
   I0228 04:52:28.385682       1 reflector.go:536] k8s.io/client-go/informers/factory.go:134: Watch close - *v1.Node total 10 items received
   I0228 04:52:32.324971       1 client.go:169] NewAPIRequest API request URL:http://10.0.34.2:8080/client/api?apiKey=***&command=queryAsyncJobResult&jobid=4e62a5a3-825c-435e-a6df-c22e756ee5e4&response=json&signature=***
   I0228 04:52:32.346120       1 client.go:175] NewAPIRequest response status code:200
   I0228 04:52:32.360171       1 client.go:110] Still waiting for job 4e62a5a3-825c-435e-a6df-c22e756ee5e4 to complete
   I0228 04:52:33.993372       1 reflector.go:536] /home/djumani/lab/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:188: Watch close - *v1.Pod total 306 items received
   I0228 04:52:42.328416       1 client.go:169] NewAPIRequest API request URL:http://10.0.34.2:8080/client/api?apiKey=***&command=queryAsyncJobResult&jobid=4e62a5a3-825c-435e-a6df-c22e756ee5e4&response=json&signature=***
   I0228 04:52:42.357795       1 client.go:175] NewAPIRequest response status code:200
   I0228 04:52:52.356394       1 client.go:110] Still waiting for job 4e62a5a3-825c-435e-a6df-c22e756ee5e4 to complete
   
   ```
   
   
   ```
   
   ➜  ~ k get nodes -A
   NAME                     STATUS   ROLES           AGE   VERSION
   gh-control-18debd77e18   Ready    control-plane   10h   v1.28.4
   gh-node-18debd8440c      Ready    <none>          10h   v1.28.4
   gh-node-18dee0e78ba      Ready    <none>          42s   v1.28.4
   
   ```
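
   To spot the scale-up decision in a long autoscaler log, the `Final scale-up plan` line can be reduced to its key fields. This is only a sketch that assumes the single-group line format shown in the log above; `scale_up_plan` is a hypothetical helper name.

   ```shell
   # Hypothetical helper: read autoscaler log lines on stdin and print the node
   # group, current size, target size and max size of each scale-up plan.
   scale_up_plan() {
     sed -nE 's/.*Final scale-up plan: \[\{([^ ]+) ([0-9]+)->([0-9]+) \(max: ([0-9]+)\)\}\].*/group=\1 from=\2 to=\3 max=\4/p'
   }
   ```

   Usage: `kubectl -n kube-system logs deployment/cluster-autoscaler | scale_up_plan`. No output means the autoscaler never produced a plan in that format, which would match the behaviour reported in the original issue.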
   
   
   

