Posted to issues@yunikorn.apache.org by "Chaoran Yu (Jira)" <ji...@apache.org> on 2021/03/19 23:57:00 UTC

[jira] [Commented] (YUNIKORN-584) App recovery is skipped when applicationID is not set in pods' label

    [ https://issues.apache.org/jira/browse/YUNIKORN-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17305246#comment-17305246 ] 

Chaoran Yu commented on YUNIKORN-584:
-------------------------------------

Thanks to Weiwei's help, we were able to locate the root cause. I previously thought, mistakenly, that the issue only appears after YK has been running for a few days, but it can actually happen every time YK restarts. When YK restarts, it scans the pods in the cluster to identify applications to recover. During this scan, an application can be skipped because YK [only looks at pods with the "applicationId" label|https://github.com/apache/incubator-yunikorn-k8shim/blob/f0f9cb7bd4a2419c1ebb1156a4085e1d3f51da15/pkg/appmgmt/general/general.go#L298]. Many pods don't carry that label, so YK misses their apps during recovery. Later, when YK sees an allocation belonging to an app that isn't in the partition context, it removes the node that the allocation is on.
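
To make the skip concrete, here is a minimal sketch of the label-based filter (not the actual shim code; only the "applicationId" label key comes from the source linked above, the helper and pod names are illustrative):
{code:go}
// Minimal sketch of the recovery-time label filter described above.
// Not the real shim code: only the "applicationId" label key is taken
// from the linked source; helper and variable names are illustrative.
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

const labelApplicationID = "applicationId"

// appIDFromPod extracts the application ID from the pod's labels;
// ok is false when the label is missing.
func appIDFromPod(pod *v1.Pod) (appID string, ok bool) {
	appID, ok = pod.Labels[labelApplicationID]
	return appID, ok
}

func main() {
	pods := []*v1.Pod{
		{ObjectMeta: metav1.ObjectMeta{
			Name:   "labeled-pod",
			Labels: map[string]string{labelApplicationID: "app-0001"},
		}},
		{ObjectMeta: metav1.ObjectMeta{Name: "unlabeled-pod"}},
	}

	appsToRecover := map[string]bool{}
	for _, pod := range pods {
		appID, ok := appIDFromPod(pod)
		if !ok {
			// This is the gap: the pod is silently ignored, so its app
			// never becomes a recovery candidate.
			fmt.Printf("skipping pod %s: no %q label\n", pod.Name, labelApplicationID)
			continue
		}
		appsToRecover[appID] = true
	}
	fmt.Println("apps to recover:", appsToRecover)
}
{code}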

Another potential cause: if, during recovery, YK sees an app that belongs to a queue that isn't configured (e.g. when placement rules are disabled), YK will likewise not add the app as a recovery candidate, causing the allocations for that app, and later the nodes holding those allocations, to go missing. This one can be mitigated on the YK user's side by making sure the queues are properly configured, as sketched below.
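
For example, a queues.yaml that declares the target queue explicitly would look roughly like the following (a sketch only; the queue name "sandbox" is a placeholder for whatever queue your pods actually reference):
{code:yaml}
partitions:
  - name: default
    queues:
      - name: root
        submitacl: "*"
        queues:
          # every queue that running pods reference must exist here,
          # otherwise their apps are dropped during recovery
          - name: sandbox
{code}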

> App recovery is skipped when applicationID is not set in pods' label
> --------------------------------------------------------------------
>
>                 Key: YUNIKORN-584
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-584
>             Project: Apache YuniKorn
>          Issue Type: Bug
>          Components: shim - kubernetes
>            Reporter: Chaoran Yu
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 0.10
>
>
> There are cases where YK may think that the cluster doesn't have enough resources even though that's not actually the case. This has happened to me twice: after YK has been running in a cluster for a few days, the [nodes endpoint|https://yunikorn.apache.org/docs/next/api/scheduler#nodes] shows that the cluster has only one node (i.e. the node that YK itself is running on), even though the K8s cluster has 10 nodes in total. If I then try to schedule a workload that requires more resources than are available on that node, YK will leave the pods pending with an event like the one below:
> {quote}Normal  PodUnschedulable  41s   yunikorn  Task <namespace>/<pod> is pending for the requested resources become available{quote}
> because it's not aware that other nodes in the cluster have available resources.
> All of this can be fixed just by restarting YK (scaling the replica down to 0 and then back up to 1; see the command sketch below). So it seems that a caching issue is the cause, although the exact conditions that trigger this bug aren't yet clear to me.
> My environment is on AWS EKS with K8s 1.17, if that matters.
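>
> A concrete restart sequence (the deployment name and namespace below assume a standard Helm install and may differ in your setup):
> {code:bash}
> kubectl -n yunikorn scale deployment yunikorn-scheduler --replicas=0
> kubectl -n yunikorn scale deployment yunikorn-scheduler --replicas=1
> {code}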



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@yunikorn.apache.org
For additional commands, e-mail: issues-help@yunikorn.apache.org