Posted to dev@cloudstack.apache.org by "saffronjam (via GitHub)" <gi...@apache.org> on 2023/08/07 15:10:53 UTC
[GitHub] [cloudstack-kubernetes-provider] saffronjam opened a new issue, #52: Unable to auto-scale Kubernetes cluster
URL: https://github.com/apache/cloudstack-kubernetes-provider/issues/52
Hi!
I am unable to auto-scale Kubernetes clusters. As I understand it, enabling auto-scaling creates a "cluster-autoscaler" deployment that decides whether to scale. However, it does not seem to work: the pod logs multiple errors and warnings, even though this is a completely clean cluster.
Normal scaling seems to work just fine.
# Setup
A "default" CloudStack setup running KVM.
## Settings (relevant)
- Cloud kubernetes service enabled **true**
- Cloud kubernetes cluster experimental features enabled **true**
- Cloud kubernetes cluster max size **50**
The nodes use the following service offering:
- 2 CPU × 2.05 GHz
- 2048 MB memory
- 8 GB root disk
# Replicate
1. Create a new cluster using the Kubernetes 1.24 ISO found here:
http://download.cloudstack.org/cks/
2. Enable forced auto-scaling
Since the cluster starts with only one worker node, auto-scaling with 3-5 nodes should trigger an upscale (I assume)
![Screenshot from 2023-08-07 16-55-00](https://github.com/apache/cloudstack-kubernetes-provider/assets/26722370/9b92cc88-b107-42cb-8268-4ae3af25c1f6)
3. Check the logs for cluster-autoscaler in the Kubernetes cluster
Some notable entries:
```
E0807 14:41:30.317148 1 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.CSIDriver: failed to list *v1.CSIDriver: csidrivers.storage.k8s.io is forbidden: User "system:serviceaccount:kube-system:cluster-autoscaler" cannot list resource "csidrivers" in API group "storage.k8s.io" at the cluster scope
E0807 14:41:32.388828 1 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1beta1.CSIStorageCapacity: failed to list *v1beta1.CSIStorageCapacity: csistoragecapacities.storage.k8s.io is forbidden: User "system:serviceaccount:kube-system:cluster-autoscaler" cannot list resource "csistoragecapacities" in API group "storage.k8s.io" at the cluster scope
```
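These two errors are plain RBAC denials: the `cluster-autoscaler` service account is missing `list`/`watch` permissions on two `storage.k8s.io` resources. If they turn out to matter, the missing rules would look roughly like the fragment below. This is an illustration only, not the manifest CKS actually ships; the `ClusterRole` name is an assumption.

```yaml
# Illustrative only: extra rules for the ClusterRole bound to
# system:serviceaccount:kube-system:cluster-autoscaler.
# The metadata.name below is assumed, not taken from the CKS manifest.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cluster-autoscaler
rules:
  - apiGroups: ["storage.k8s.io"]
    resources: ["csidrivers", "csistoragecapacities"]
    verbs: ["get", "list", "watch"]
```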
I have not edited anything myself; it is a clean CKS cluster. The log also shows:
```
W0807 14:41:43.251280 1 clusterstate.go:590] Failed to get nodegroup for 6a4c91a3-9694-4596-9ddd-dc86e60136ff: Unable to find node 6a4c91a3-9694-4596-9ddd-dc86e60136ff in cluster
W0807 14:41:43.251361 1 clusterstate.go:590] Failed to get nodegroup for bd0b855f-6dc6-4678-9bea-b52329333024: Unable to find node bd0b855f-6dc6-4678-9bea-b52329333024 in cluster
I0807 14:57:06.667061 1 static_autoscaler.go:341] 2 unregistered nodes present
```
The entire log:
[logs-from-cluster-autoscaler-in-cluster-autoscaler-5bf887ddd8-hxg2g.log](https://github.com/apache/cloudstack-kubernetes-provider/files/12281530/logs-from-cluster-autoscaler-in-cluster-autoscaler-5bf887ddd8-hxg2g.log)
Please tell me if you need more logs to look at, or if I should try some other configuration.
Thanks!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: dev-unsubscribe@cloudstack.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [cloudstack-kubernetes-provider] rohityadavcloud commented on issue #52: Unable to auto-scale Kubernetes cluster
Posted by "rohityadavcloud (via GitHub)" <gi...@apache.org>.
rohityadavcloud commented on issue #52:
URL: https://github.com/apache/cloudstack-kubernetes-provider/issues/52#issuecomment-1668091677
cc @Pearl1594 @weizhouapache @DaanHoogland, please help triage when you have time
@saffronjam this looks like an issue with k8s autoscaler (https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/cloudstack/README.md) or with CKS (upstream https://github.com/apache/cloudstack)
Re: [I] Unable to auto-scale Kubernetes cluster [cloudstack-kubernetes-provider]
Posted by "kiranchavala (via GitHub)" <gi...@apache.org>.
kiranchavala commented on issue #52:
URL: https://github.com/apache/cloudstack-kubernetes-provider/issues/52#issuecomment-1968230660
Hi @saffronjam
The autoscaling feature works fine on a Kubernetes cluster deployed by CKS.
Please find the steps that I followed.
After you enable autoscaling on the cluster
![Screenshot 2024-02-28 at 10 10 20 AM](https://github.com/apache/cloudstack-kubernetes-provider/assets/1401014/d356d4a2-1015-492a-baa3-51ea496b6348)
Make sure the cluster-autoscaler pod is deployed in the cluster:
```
kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system cluster-autoscaler-8d8894d6c-q8r4h 1/1 Running 0 19m
```
Before scaling:
```
➜ ~ k get nodes -A
NAME STATUS ROLES AGE VERSION
gh-control-18debd77e18 Ready control-plane 10h v1.28.4
gh-node-18debd8440c Ready <none> 10h v1.28.4
```
Deploy an application:
`kubectl create deployment hello-node --image=registry.k8s.io/e2e-test-images/agnhost:2.39 -- /agnhost netexec --http-port=80`
```
➜ ~ k get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
default hello-node-7c6c5fb9d8-bgd69 1/1 Running 0 10h
```
Scale the application:
`kubectl scale --replicas=150 deployment/hello-node`
Logs from the autoscaler pod:
```
I0228 04:51:46.798087 1 reflector.go:536] /home/djumani/lab/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:356: Watch close - *v1.StatefulSet total 9 items received
I0228 04:51:51.244004 1 static_autoscaler.go:235] Starting main loop
I0228 04:51:51.244382 1 client.go:169] NewAPIRequest API request URL:http://10.0.34.2:8080/client/api?apiKey=***&command=listKubernetesClusters&id=14b42c5d-e7e6-4c41-b638-5facb98b0a93&response=json&signature=***
I0228 04:51:51.279721 1 client.go:175] NewAPIRequest response status code:200
I0228 04:51:51.280798 1 cloudstack_manager.go:88] Got cluster : &{14b42c5d-e7e6-4c41-b638-5facb98b0a93 gh 2 3 1 1 [0xc0013bfad0 0xc0013bfb00] map[gh-control-18debd77e18:0xc0013bfad0 gh-node-18debd8440c:0xc0013bfb00]}
W0228 04:51:51.292009 1 clusterstate.go:590] Failed to get nodegroup for dc95f481-15a3-4629-bb78-055fbe4a7139: Unable to find node dc95f481-15a3-4629-bb78-055fbe4a7139 in cluster
W0228 04:51:51.292052 1 clusterstate.go:590] Failed to get nodegroup for facdd040-53fe-4984-8654-c186a7cdde9b: Unable to find node facdd040-53fe-4984-8654-c186a7cdde9b in cluster
I0228 04:51:51.292095 1 static_autoscaler.go:341] 2 unregistered nodes present
I0228 04:51:51.292105 1 static_autoscaler.go:624] Removing unregistered node dc95f481-15a3-4629-bb78-055fbe4a7139
W0228 04:51:51.292126 1 static_autoscaler.go:627] Failed to get node group for dc95f481-15a3-4629-bb78-055fbe4a7139: Unable to find node dc95f481-15a3-4629-bb78-055fbe4a7139 in cluster
W0228 04:51:51.292137 1 static_autoscaler.go:346] Failed to remove unregistered nodes: Unable to find node dc95f481-15a3-4629-bb78-055fbe4a7139 in cluster
I0228 04:51:51.292569 1 filter_out_schedulable.go:65] Filtering out schedulables
I0228 04:51:51.292590 1 filter_out_schedulable.go:137] Filtered out 0 pods using hints
I0228 04:51:51.624523 1 filter_out_schedulable.go:175] 44 pods were kept as unschedulable based on caching
I0228 04:51:51.624568 1 filter_out_schedulable.go:176] 0 pods marked as unschedulable can be scheduled.
I0228 04:51:51.624667 1 filter_out_schedulable.go:87] No schedulable pods
I0228 04:51:51.870314 1 static_autoscaler.go:480] Calculating unneeded nodes
I0228 04:51:51.870353 1 pre_filtering_processor.go:66] Skipping gh-control-18debd77e18 - node group min size reached
I0228 04:51:51.870361 1 pre_filtering_processor.go:66] Skipping gh-node-18debd8440c - node group min size reached
I0228 04:51:51.870413 1 static_autoscaler.go:534] Scale down status: unneededOnly=false lastScaleUpTime=2024-02-27 17:39:33.032517071 +0000 UTC m=-3594.093061754 lastScaleDownDeleteTime=2024-02-27 17:39:33.032517071 +0000 UTC m=-3594.093061754 lastScaleDownFailTime=2024-02-27 17:39:33.032517071 +0000 UTC m=-3594.093061754 scaleDownForbidden=false isDeleteInProgress=false scaleDownInCooldown=false
I0228 04:52:22.258545 1 scale_up.go:468] Best option to resize: 14b42c5d-e7e6-4c41-b638-5facb98b0a93
I0228 04:52:22.258602 1 scale_up.go:472] Estimated 1 nodes needed in 14b42c5d-e7e6-4c41-b638-5facb98b0a93
I0228 04:52:22.266675 1 scale_up.go:595] Final scale-up plan: [{14b42c5d-e7e6-4c41-b638-5facb98b0a93 1->2 (max: 3)}]
I0228 04:52:22.266915 1 scale_up.go:691] Scale-up: setting group 14b42c5d-e7e6-4c41-b638-5facb98b0a93 size to 2
I0228 04:52:22.267040 1 cloudstack_node_group.go:57] Increase Cluster : 14b42c5d-e7e6-4c41-b638-5facb98b0a93 by 1
I0228 04:52:22.267238 1 event_sink_logging_wrapper.go:48] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"kube-system", Name:"cluster-autoscaler-status", UID:"3e317689-2939-4a66-b764-b2bb938c433c", APIVersion:"v1", ResourceVersion:"75712", FieldPath:""}): type: 'Normal' reason: 'ScaledUpGroup' Scale-up: setting group 14b42c5d-e7e6-4c41-b638-5facb98b0a93 size to 2 instead of 1 (max: 3)
I0228 04:52:22.267350 1 client.go:169] NewAPIRequest API request URL:http://10.0.34.2:8080/client/api?apiKey=***&command=scaleKubernetesCluster&id=14b42c5d-e7e6-4c41-b638-5facb98b0a93&response=json&size=2&signature=***
I0228 04:52:22.297307 1 client.go:175] NewAPIRequest response status code:200
I0228 04:52:28.385682 1 reflector.go:536] k8s.io/client-go/informers/factory.go:134: Watch close - *v1.Node total 10 items received
I0228 04:52:32.324971 1 client.go:169] NewAPIRequest API request URL:http://10.0.34.2:8080/client/api?apiKey=***&command=queryAsyncJobResult&jobid=4e62a5a3-825c-435e-a6df-c22e756ee5e4&response=json&signature=***
I0228 04:52:32.346120 1 client.go:175] NewAPIRequest response status code:200
I0228 04:52:32.360171 1 client.go:110] Still waiting for job 4e62a5a3-825c-435e-a6df-c22e756ee5e4 to complete
I0228 04:52:33.993372 1 reflector.go:536] /home/djumani/lab/autoscaler/cluster-autoscaler/utils/kubernetes/listers.go:188: Watch close - *v1.Pod total 306 items received
I0228 04:52:42.328416 1 client.go:169] NewAPIRequest API request URL:http://10.0.34.2:8080/client/api?apiKey=***&command=queryAsyncJobResult&jobid=4e62a5a3-825c-435e-a6df-c22e756ee5e4&response=json&signature=***
I0228 04:52:42.357795 1 client.go:175] NewAPIRequest response status code:200
I0228 04:52:52.356394 1 client.go:110] Still waiting for job 4e62a5a3-825c-435e-a6df-c22e756ee5e4 to complete
```
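As an aside, the `NewAPIRequest` lines above show the provider calling the CloudStack API with an `apiKey` and a masked `signature`. The signing scheme is the standard one from the CloudStack API docs: sort the query parameters, lowercase the query string, HMAC-SHA1 it with the secret key, and Base64-encode the digest. A minimal Python sketch, with made-up key material, might look like this:

```python
import base64
import hashlib
import hmac
import urllib.parse

def sign_request(params: dict, secret_key: str) -> str:
    """Sign a CloudStack API request: sort params, lowercase, HMAC-SHA1, Base64."""
    # Build the canonical query string from alphabetically sorted parameters.
    query = "&".join(
        f"{key}={urllib.parse.quote(str(value), safe='*')}"
        for key, value in sorted(params.items())
    )
    # HMAC-SHA1 over the lowercased query string, then Base64-encode the digest.
    digest = hmac.new(secret_key.encode(), query.lower().encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()

# Hypothetical key material and parameters, for illustration only.
signature = sign_request(
    {"command": "scaleKubernetesCluster", "apikey": "KEY", "response": "json", "size": 2},
    "SECRET",
)
print(signature)
```

The signature (URL-encoded) is then appended as the `signature` parameter of the request, which is what appears masked as `signature=***` in the log.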
After scaling:
```
➜ ~ k get nodes -A
NAME STATUS ROLES AGE VERSION
gh-control-18debd77e18 Ready control-plane 10h v1.28.4
gh-node-18debd8440c Ready <none> 10h v1.28.4
gh-node-18dee0e78ba Ready <none> 42s v1.28.4
```