Posted to issues@yunikorn.apache.org by "Manikandan R (Jira)" <ji...@apache.org> on 2023/08/17 11:16:00 UTC
[jira] [Resolved] (YUNIKORN-1919) runningApps is not correct when app state from starting to completing
[ https://issues.apache.org/jira/browse/YUNIKORN-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Manikandan R resolved YUNIKORN-1919.
------------------------------------
Resolution: Fixed
> runningApps is not correct when app state from starting to completing
> ---------------------------------------------------------------------
>
> Key: YUNIKORN-1919
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1919
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: core - scheduler
> Reporter: PoAn Yang
> Assignee: PoAn Yang
> Priority: Major
> Labels: pull-request-available
>
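> We increment runningApps when an app enters the Starting state[1], and we decrement it when the app leaves the Running state[2]. However, in some cases an app never enters the Running state (for example, when it moves directly from Starting to Completing), so runningApps is never decremented and the count stays too high. Once the stale count reaches the queue's maxapplications limit, no other app can be allocated in that queue[3].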
>
> Steps to reproduce:
> 1. Set queue config.
> {noformat}
> data:
>   queues.yaml: |
>     partitions:
>       - name: default
>         nodesortpolicy:
>           type: fair
>         queues:
>           - name: root
>             parent: true
>             queues:
>               - name: default # default queue for applications that don't specify a queue
>                 submitacl: '*'
>               - name: sandbox1
>                 submitacl: '*'
>                 maxapplications: 1
> {noformat}
> 2. Apply a deployment.
> {noformat}
> apiVersion: apps/v1
> kind: Deployment
> metadata:
>   name: sleep-deployment
>   labels:
>     app: sleep-deployment
>     applicationId: "sleep-deployment"
>     queue: "root.sandbox1"
> spec:
>   replicas: 1
>   selector:
>     matchLabels:
>       app: sleep-deployment
>       applicationId: "sleep-deployment"
>       queue: "root.sandbox1"
>   template:
>     metadata:
>       labels:
>         app: sleep-deployment
>         applicationId: "sleep-deployment"
>         queue: "root.sandbox1"
>     spec:
>       containers:
>         - name: sleep-30s
>           image: alpine:latest
>           command: ["sleep", "30"]
> {noformat}
> 3. Apply a job.
> {noformat}
> apiVersion: batch/v1
> kind: Job
> metadata:
>   name: sleep-job
> spec:
>   parallelism: 1
>   template:
>     metadata:
>       labels:
>         app: sleep-job
>         applicationId: "sleep-job"
>         queue: "root.sandbox1"
>     spec:
>       containers:
>         - name: sleep-job
>           image: alpine:latest
>           command: ["sleep", "30"]
>       restartPolicy: Never
> {noformat}
> 4. Delete the deployment.
> 5. The job's pod cannot start.
>
> [1] [https://github.com/apache/yunikorn-core/blob/9abd5bff0b0340935f1a4467f433a941ad5f476f/pkg/scheduler/objects/application_state.go#L152]
> [2] [https://github.com/apache/yunikorn-core/blob/9abd5bff0b0340935f1a4467f433a941ad5f476f/pkg/scheduler/objects/application_state.go#L188]
> [3] [https://github.com/apache/yunikorn-core/blob/9abd5bff0b0340935f1a4467f433a941ad5f476f/pkg/scheduler/objects/queue.go#L1300-L1302]
>
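The accounting mismatch described in the issue can be illustrated with a small model. This is only a sketch under stated assumptions: the `Queue` class, the state constants (including `ACCEPTED` as a stand-in for any state before Starting), and both transition functions are hypothetical and are not YuniKorn code. `transition_balanced` shows one possible way to keep the counter symmetric; it is not claimed to be the fix that was merged.

```python
from dataclasses import dataclass

# Hypothetical state names modeled on the issue text; not the real YuniKorn types.
ACCEPTED, STARTING, RUNNING, COMPLETING = "Accepted", "Starting", "Running", "Completing"


@dataclass
class Queue:
    """Toy queue that tracks only the running-application counter."""
    max_applications: int
    running_apps: int = 0

    def can_run_app(self) -> bool:
        # Models the idea of the limit check referenced as [3].
        return self.running_apps < self.max_applications


def transition_buggy(queue: Queue, old: str, new: str) -> None:
    """Accounting as described in the report."""
    if new == STARTING:
        queue.running_apps += 1  # incremented on entering Starting
    if old == RUNNING:
        queue.running_apps -= 1  # decremented only on leaving Running


def transition_balanced(queue: Queue, old: str, new: str) -> None:
    """One possible symmetric variant (an assumption, not the actual fix)."""
    counted = {STARTING, RUNNING}
    if old not in counted and new in counted:
        queue.running_apps += 1  # entering the counted set
    elif old in counted and new not in counted:
        queue.running_apps -= 1  # leaving the counted set from any member


if __name__ == "__main__":
    for step in (transition_buggy, transition_balanced):
        q = Queue(max_applications=1)
        step(q, ACCEPTED, STARTING)
        step(q, STARTING, COMPLETING)  # the app never reaches Running
        print(step.__name__, q.running_apps, q.can_run_app())
```

With the buggy accounting the counter stays at 1 after the app skips Running, so a queue with maxapplications: 1 rejects the next app. The symmetric variant returns the counter to 0.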