Posted to issues@yunikorn.apache.org by "Manikandan R (Jira)" <ji...@apache.org> on 2023/08/17 11:16:00 UTC

[jira] [Resolved] (YUNIKORN-1919) runningApps is not correct when app state from starting to completing

     [ https://issues.apache.org/jira/browse/YUNIKORN-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Manikandan R resolved YUNIKORN-1919.
------------------------------------
    Resolution: Fixed

> runningApps is not correct when app state from starting to completing
> ---------------------------------------------------------------------
>
>                 Key: YUNIKORN-1919
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-1919
>             Project: Apache YuniKorn
>          Issue Type: Bug
>          Components: core - scheduler
>            Reporter: PoAn Yang
>            Assignee: PoAn Yang
>            Priority: Major
>              Labels: pull-request-available
>
> runningApps is incremented when an app enters the Starting state [1] and decremented when an app leaves the Running state [2]. However, in some cases an app never enters the Running state (for example, when it moves from Starting directly to Completing), so runningApps is never decremented and stays too high. Eventually, the queue cannot allocate another app [3].
>  
> Steps to reproduce:
> 1. Set queue config.
> {noformat}
> data:
>   queues.yaml: |
>     partitions:
>     - name: default
>       nodesortpolicy:
>         type: fair
>       queues:
>       - name: root
>         parent: true
>         queues:
>         - name: default # default queue for applications that don't specify a queue
>           submitacl: '*'
>         - name: sandbox1
>           submitacl: '*'
>           maxapplications: 1
> {noformat}
> 2. Apply a deployment.
> {noformat}
> apiVersion: apps/v1
> kind: Deployment
> metadata:
>   name: sleep-deployment
>   labels:
>     app: sleep-deployment
>     applicationId: "sleep-deployment"
>     queue: "root.sandbox1"
> spec:
>   replicas: 1
>   selector:
>     matchLabels:
>       app: sleep-deployment
>       applicationId: "sleep-deployment"
>       queue: "root.sandbox1"
>   template:
>     metadata:
>       labels:
>         app: sleep-deployment
>         applicationId: "sleep-deployment"
>         queue: "root.sandbox1"
>     spec:
>       containers:
>       - name: sleep-30s
>         image: alpine:latest
>         command: ["sleep", "30"]
> {noformat}
> 3. Apply a job.
> {noformat}
> apiVersion: batch/v1
> kind: Job
> metadata:
>   name: sleep-job
> spec:
>   parallelism: 1
>   template:
>     metadata:
>       labels:
>         app: sleep-job
>         applicationId: "sleep-job"
>         queue: "root.sandbox1"
>     spec:
>       containers:
>       - name: sleep-job
>         image: alpine:latest
>         command: ["sleep", "30"]
>       restartPolicy: Never
> {noformat}
> 4. Delete the deployment.
> 5. The job's pod never starts.
>  
> [1] [https://github.com/apache/yunikorn-core/blob/9abd5bff0b0340935f1a4467f433a941ad5f476f/pkg/scheduler/objects/application_state.go#L152]
> [2] [https://github.com/apache/yunikorn-core/blob/9abd5bff0b0340935f1a4467f433a941ad5f476f/pkg/scheduler/objects/application_state.go#L188]
> [3] [https://github.com/apache/yunikorn-core/blob/9abd5bff0b0340935f1a4467f433a941ad5f476f/pkg/scheduler/objects/queue.go#L1300-L1302]
>  
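The accounting problem described above can be illustrated with a minimal Go sketch. This is not the yunikorn-core code and not necessarily the fix that was merged: the Queue type, the State constants, and the functions transitionBuggy, transitionBalanced and canRun are hypothetical names used only for illustration. The sketch assumes a queue with maxapplications set to 1, as in the reproduce steps, and shows how a counter that is incremented on entering Starting but decremented only on leaving Running gets stuck when an app goes from Starting straight to Completing.

```go
package main

import "fmt"

// State is a simplified application state. The names mirror the states
// mentioned in the issue; they are not the real yunikorn-core types.
type State int

const (
	Accepted State = iota
	Starting
	Running
	Completing
)

// Queue is a hypothetical stand-in for a scheduler queue with a
// maxapplications limit.
type Queue struct {
	maxApps     int
	runningApps int
}

// canRun reports whether the queue may start another application.
func (q *Queue) canRun() bool {
	return q.runningApps < q.maxApps
}

// transitionBuggy mimics the accounting described in the issue:
// increment on entering Starting, decrement only on leaving Running.
func (q *Queue) transitionBuggy(from, to State) {
	if to == Starting {
		q.runningApps++
	}
	if from == Running {
		q.runningApps--
	}
}

// counted reports whether an app in state s should be included in runningApps.
func counted(s State) bool {
	return s == Starting || s == Running
}

// transitionBalanced is one possible way (an assumption, not the actual fix)
// to keep the counter balanced: adjust it only when the app crosses the
// boundary of the counted states, whichever counted state it was in.
func (q *Queue) transitionBalanced(from, to State) {
	if !counted(from) && counted(to) {
		q.runningApps++
	}
	if counted(from) && !counted(to) {
		q.runningApps--
	}
}

func main() {
	buggy := &Queue{maxApps: 1}
	buggy.transitionBuggy(Accepted, Starting)
	buggy.transitionBuggy(Starting, Completing) // app never enters Running
	fmt.Println(buggy.runningApps, buggy.canRun()) // prints "1 false"

	fixed := &Queue{maxApps: 1}
	fixed.transitionBalanced(Accepted, Starting)
	fixed.transitionBalanced(Starting, Completing)
	fmt.Println(fixed.runningApps, fixed.canRun()) // prints "0 true"
}
```

With the buggy accounting the queue still reports one running app after the first app has completed, so the second app (the job in step 3) is never admitted. The balanced variant returns the counter to zero on either path.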