Posted to dev@yunikorn.apache.org by "Chaoran Yu (Jira)" <ji...@apache.org> on 2022/02/21 00:20:00 UTC
[jira] [Created] (YUNIKORN-1085) DaemonSet pods may fail to be scheduled on new nodes added during autoscaling
Chaoran Yu created YUNIKORN-1085:
------------------------------------
Summary: DaemonSet pods may fail to be scheduled on new nodes added during autoscaling
Key: YUNIKORN-1085
URL: https://issues.apache.org/jira/browse/YUNIKORN-1085
Project: Apache YuniKorn
Issue Type: Bug
Components: shim - kubernetes
Affects Versions: 0.12.2
Environment: Amazon EKS, K8s 1.20, Cluster Autoscaler
Reporter: Chaoran Yu
After YUNIKORN-704 was done, YuniKorn should use the same mechanism as the default scheduler when scheduling DaemonSet pods. That is the case most of the time in our deployments. Recently, however, we found that DaemonSet scheduling has become problematic again. When the K8s Cluster Autoscaler adds new nodes in response to pending pods in the cluster, EKS automatically creates CNI DaemonSet pods (Amazon's container networking module), one on each newly created node, but YuniKorn could not schedule these pods. There are no informative error messages, and the default queue that these pods belong to has available resources. Because the pods could not be scheduled, EKS refuses to mark the new nodes as ready, and they get stuck in the NotReady state. This issue is not always reproducible, but it has happened a few times. The root cause needs further investigation.
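For anyone trying to reproduce or triage this, the following is a minimal diagnostic sketch, not part of YuniKorn and not taken from the affected cluster. It assumes the pod list comes from `kubectl get pods --all-namespaces -o json`, and that DaemonSet pods are pinned to their node through a required node affinity with a `matchFields` entry on `metadata.name`, which is how the DaemonSet controller targets nodes in recent Kubernetes versions. The function names are hypothetical.

```python
def daemonset_target_node(pod):
    """Return the node name a DaemonSet pod is pinned to via node affinity, or None."""
    spec = pod.get("spec") or {}
    node_affinity = (spec.get("affinity") or {}).get("nodeAffinity") or {}
    required = node_affinity.get("requiredDuringSchedulingIgnoredDuringExecution") or {}
    for term in required.get("nodeSelectorTerms") or []:
        for field in term.get("matchFields") or []:
            # The DaemonSet controller pins each pod with metadata.name In [node].
            if field.get("key") == "metadata.name" and field.get("operator") == "In":
                values = field.get("values") or []
                if values:
                    return values[0]
    return None


def unscheduled_daemonset_pods(pod_list):
    """List (namespace, pod, scheduler, target node) for pending, unbound DaemonSet pods."""
    results = []
    for pod in pod_list.get("items") or []:
        meta = pod.get("metadata") or {}
        spec = pod.get("spec") or {}
        owners = meta.get("ownerReferences") or []
        if not any(owner.get("kind") == "DaemonSet" for owner in owners):
            continue  # not owned by a DaemonSet
        if (pod.get("status") or {}).get("phase") != "Pending":
            continue  # only pending pods are of interest
        if spec.get("nodeName"):
            continue  # already bound to a node
        results.append(
            (
                meta.get("namespace"),
                meta.get("name"),
                spec.get("schedulerName"),
                daemonset_target_node(pod),
            )
        )
    return results
```

Passing the parsed JSON pod list to `unscheduled_daemonset_pods` gives the stuck DaemonSet pods, the scheduler each one is assigned to, and the node each one is waiting for. Those target nodes can then be compared against the nodes reported as NotReady.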
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@yunikorn.apache.org
For additional commands, e-mail: dev-help@yunikorn.apache.org