Posted to dev@yunikorn.apache.org by "Wei Huang (Jira)" <ji...@apache.org> on 2023/04/21 23:52:00 UTC

[jira] [Created] (YUNIKORN-1706) weird symptom when scheduling pod without specifying 'queue' label

Wei Huang created YUNIKORN-1706:
-----------------------------------

             Summary: weird symptom when scheduling pod without specifying 'queue' label
                 Key: YUNIKORN-1706
                 URL: https://issues.apache.org/jira/browse/YUNIKORN-1706
             Project: Apache YuniKorn
          Issue Type: Bug
          Components: shim - kubernetes
            Reporter: Wei Huang


I'm running a local dev env via *make run_plugin*, based on 1.2.0; no admission controller is configured. Additionally, I configured the following configmap in the default namespace:


{code:yaml}
apiVersion: v1
data:
  queues.yaml: |
    partitions:
    - name: default
      nodesortpolicy:
        type: binpacking
      queues:
      - name: root
        submitacl: '*'
        queues:
        - name: app1
          submitacl: '*'
          properties:
            application.sort.policy: fifo
          resources:
            max:
              {memory: 200G, vcore: 1}
kind: ConfigMap
metadata:
  name: yunikorn-configs
{code}
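
For reproduction, I applied it the usual way (a sketch; *yunikorn-configs.yaml* is just the file I saved the manifest above to):

{code:bash}
# push the queue config into the default namespace,
# where the scheduler picks it up
kubectl apply -f yunikorn-configs.yaml -n default
{code}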

Then I created a Pod with the following config:

{code:yaml}
kind: Pod
apiVersion: v1
metadata:
  name: pod-1
  labels:
    applicationId: "app1"
spec:
  schedulerName: yunikorn
  containers:
  - name: pause
    image: registry.k8s.io/pause:3.6
    resources:
      requests:
        cpu: 1
{code}
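
I created the pod and checked on it with (a sketch, assuming kubectl and a *pod-1.yaml* file holding the manifest above):

{code:bash}
kubectl apply -f pod-1.yaml
# .status.reason carries the scheduler's failure reason once the pod fails
kubectl get pod pod-1 -o jsonpath='{.status.reason}'
{code}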


The pod cannot be scheduled and ends up with status *ApplicationRejected*; I observed the following logs in the shim:

{code}
2023-04-21T16:34:42.354-0700	INFO	cache/context.go:741	app added	{"appID": "app1"}
2023-04-21T16:34:42.354-0700	INFO	cache/context.go:831	task added	{"appID": "app1", "taskID": "d643a5ad-c93b-4d99-8eac-9418fbac18b0", "taskState": "New"}
2023-04-21T16:34:42.355-0700	INFO	cache/context.go:841	app request originating pod added	{"appID": "app1", "original task": "d643a5ad-c93b-4d99-8eac-9418fbac18b0"}
I0421 16:34:42.355111   46423 factory.go:344] "Unable to schedule pod; no fit; waiting" pod="default/pod-1" err="0/1 nodes are available: 1 Pod is not ready for scheduling."
2023-04-21T16:34:42.689-0700	INFO	cache/application.go:413	handle app submission	{"app": "applicationID: app1, queue: root.sandbox, partition: default, totalNumOfTasks: 1, currentState: Submitted", "clusterID": "mycluster"}
2023-04-21T16:34:42.692-0700	INFO	objects/application_state.go:132	Application state transition	{"appID": "app1", "source": "New", "destination": "Rejected", "event": "rejectApplication"}
2023-04-21T16:34:42.692-0700	ERROR	scheduler/context.go:540	Failed to add application to partition (placement rejected)	{"applicationID": "app1", "partitionName": "[mycluster]default", "error": "application 'app1' rejected, cannot create queue 'root.sandbox' without placement rules"}
github.com/apache/yunikorn-core/pkg/scheduler.(*ClusterContext).handleRMUpdateApplicationEvent
	/Users/weih/go/src/github.pie.apple.com/apache/yunikorn-k8shim/vendor/github.com/apache/yunikorn-core/pkg/scheduler/context.go:540
github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).handleRMEvent
	/Users/weih/go/src/github.pie.apple.com/apache/yunikorn-k8shim/vendor/github.com/apache/yunikorn-core/pkg/scheduler/scheduler.go:113
2023-04-21T16:34:42.693-0700	INFO	cache/application.go:565	app is rejected by scheduler	{"appID": "app1"}
2023-04-21T16:34:42.693-0700	INFO	cache/application.go:598	failApplication reason	{"applicationID": "app1", "errMsg": "ApplicationRejected: application 'app1' rejected, cannot create queue 'root.sandbox' without placement rules"}
2023-04-21T16:34:42.694-0700	INFO	cache/application.go:585	setting pod to failed	{"podName": "pod-1"}
2023-04-21T16:34:42.712-0700	INFO	general/general.go:179	task completes	{"appType": "general", "namespace": "default", "podName": "pod-1", "podUID": "d643a5ad-c93b-4d99-8eac-9418fbac18b0", "podStatus": "Failed"}
2023-04-21T16:34:42.714-0700	INFO	client/kubeclient.go:246	Successfully updated pod status	{"namespace": "default", "podName": "pod-1", "newStatus": "&PodStatus{Phase:Failed,Conditions:[]PodCondition{},Message: application 'app1' rejected, cannot create queue 'root.sandbox' without placement rules,Reason:ApplicationRejected,HostIP:,PodIP:,StartTime:<nil>,ContainerStatuses:[]ContainerStatus{},QOSClass:,InitContainerStatuses:[]ContainerStatus{},NominatedNodeName:,PodIPs:[]PodIP{},EphemeralContainerStatuses:[]ContainerStatus{},}"}
2023-04-21T16:34:42.714-0700	INFO	cache/application.go:590	new pod status	{"status": "Failed"}
2023-04-21T16:34:42.714-0700	INFO	cache/task.go:543	releasing allocations	{"numOfAsksToRelease": 1, "numOfAllocationsToRelease": 0}
2023-04-21T16:34:42.714-0700	INFO	cache/placeholder_manager.go:115	start to clean up app placeholders	{"appID": "app1"}
2023-04-21T16:34:42.714-0700	INFO	cache/placeholder_manager.go:128	finished cleaning up app placeholders	{"appID": "app1"}
2023-04-21T16:34:42.714-0700	INFO	scheduler/partition.go:1343	Invalid ask release requested by shim	{"appID": "app1", "ask": "d643a5ad-c93b-4d99-8eac-9418fbac18b0", "terminationType": "UNKNOWN_TERMINATION_TYPE"}
2023-04-21T16:34:42.714-0700	INFO	cache/task_state.go:372	object transition	{"object": {}, "source": "New", "destination": "Completed", "event": "CompleteTask"} 
{code}
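
The key line is the rejection from the core: "application 'app1' rejected, cannot create queue 'root.sandbox' without placement rules". My understanding is that creating a queue dynamically requires a placement rule with create enabled; a minimal sketch of such a section in queues.yaml (not part of my config above, shown only to illustrate what the error message refers to) would look something like:

{code:yaml}
partitions:
- name: default
  placementrules:
  # 'provided' places the app in the queue named on the pod;
  # create: true lets the scheduler create that queue on the fly
  - name: provided
    create: true
  queues:
  - name: root
    submitacl: '*'
{code}

Note that 'root.sandbox' does not appear in the pod spec or in the configmap above.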

Then I deleted the pod and noticed the following log:

{code}
2023-04-21T16:35:09.598-0700	INFO	general/general.go:213	delete pod	{"appType": "general", "namespace": "default", "podName": "pod-1", "podUID": "d643a5ad-c93b-4d99-8eac-9418fbac18b0"}
2023-04-21T16:35:09.598-0700	WARN	cache/task.go:528	task allocation UUID is empty, sending this release request to yunikorn-core could cause all allocations of this app get released. skip this request, this may cause some resource leak. check the logs for more info!	{"applicationID": "app1", "taskID": "d643a5ad-c93b-4d99-8eac-9418fbac18b0", "taskAlias": "default/pod-1", "allocationUUID": "", "task": "Completed"}
{code}
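
For completeness, the deletion was a plain pod delete (assuming kubectl):

{code:bash}
kubectl delete pod pod-1
{code}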

Then I recreated the same pod, this time adding the queue label:

{code:yaml}
queue: root.app1
{code}
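
so that the pod metadata now looks like:

{code:yaml}
metadata:
  name: pod-1
  labels:
    applicationId: "app1"
    # route the app explicitly to the pre-defined leaf queue
    queue: root.app1
{code}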

The pod is still unschedulable and remains in that status forever. The only way to make it schedulable again is to restart the shim.

Is it a bug?


