Posted to dev@yunikorn.apache.org by "Wilfred Spiegelenburg (Jira)" <ji...@apache.org> on 2021/01/29 14:28:00 UTC
[jira] [Resolved] (YUNIKORN-518) Placeholder manager failed to init during scheduler recovery
[ https://issues.apache.org/jira/browse/YUNIKORN-518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wilfred Spiegelenburg resolved YUNIKORN-518.
--------------------------------------------
Fix Version/s: 0.10
Resolution: Fixed
Changes committed: placeholder manager is now always initialised
> Placeholder manager failed to init during scheduler recovery
> ------------------------------------------------------------
>
> Key: YUNIKORN-518
> URL: https://issues.apache.org/jira/browse/YUNIKORN-518
> Project: Apache YuniKorn
> Issue Type: Sub-task
> Components: shim - kubernetes
> Affects Versions: 0.10
> Reporter: Ayub Pathan
> Assignee: Weiwei Yang
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.10
>
> Attachments: yk-sc.log
>
>
> {noformat}
> Name: yunikorn-scheduler-6577f789d8-vc5cc
> Namespace: yunikorn
> Priority: 0
> Node: ip-10-192-153-109.ca-central-1.compute.internal/10.192.153.109
> Start Time: Tue, 26 Jan 2021 19:17:12 -0800
> Labels: app=yunikorn
> component=yunikorn-scheduler
> pod-template-hash=6577f789d8
> release=yunikorn
> Annotations: cni.projectcalico.org/podIP: 100.100.166.78/32
> cni.projectcalico.org/podIPs: 100.100.166.78/32
> kubernetes.io/psp: eks.privileged
> Status: Running
> IP: 100.100.166.78
> IPs:
> IP: 100.100.166.78
> Controlled By: ReplicaSet/yunikorn-scheduler-6577f789d8
> Containers:
> yunikorn-scheduler-k8s:
> Container ID: docker://759f2b2f14ba37f46a42cdc59a5c51ed19d442ed717b81ee98d30177b7a184e6
> Image: container-dev.repo.cloudera.com/cloudera/yunikorn-scheduler:0.10.0-b9
> Image ID: docker-pullable://container-dev.repo.cloudera.com/cloudera/yunikorn-scheduler@sha256:878300a91cfd3b9d6dc515948afbfab23572a475b0df7006f06480ee06d1aceb
> Port: 9080/TCP
> Host Port: 0/TCP
> State: Running
> Started: Tue, 26 Jan 2021 19:18:01 -0800
> Last State: Terminated
> Reason: Error
> Exit Code: 1
> Started: Tue, 26 Jan 2021 19:17:33 -0800
> Finished: Tue, 26 Jan 2021 19:17:33 -0800
> Ready: True
> Restart Count: 3
> Limits:
> cpu: 4
> memory: 2Gi
> Requests:
> cpu: 200m
> memory: 1Gi
> Environment:
> NAMESPACE: yunikorn (v1:metadata.namespace)
> ADMISSION_CONTROLLER_IMAGE_REGISTRY: container-dev.repo.cloudera.com/cloudera/yunikorn-admission
> ADMISSION_CONTROLLER_IMAGE_TAG: 0.10.0-b9
> ADMISSION_CONTROLLER_IMAGE_PULL_POLICY: Always
> ADMISSION_CONTROLLER_IMAGE_PULL_SECRETS: [dockercreds]
> Mounts:
> /etc/yunikorn/ from config-volume (rw)
> /var/run/secrets/kubernetes.io/serviceaccount from yunikorn-admin-token-dnq4h (ro)
> yunikorn-scheduler-web:
> Container ID: docker://0b8205bb8292f193765bbc563ea10010106fd316257e523c3446c5685ee0d5bf
> Image: container-dev.repo.cloudera.com/cloudera/yunikorn-web:0.10.0-b9
> Image ID: docker-pullable://container-dev.repo.cloudera.com/cloudera/yunikorn-web@sha256:a64b986df2dc737958701838f41f9fae7f2e4a353a497949ba6b9e75b4b44b66
> Port: 9889/TCP
> Host Port: 0/TCP
> State: Running
> Started: Tue, 26 Jan 2021 19:17:17 -0800
> Ready: True
> Restart Count: 0
> Limits:
> cpu: 200m
> memory: 500Mi
> Requests:
> cpu: 100m
> memory: 100Mi
> Environment: <none>
> Mounts:
> /var/run/secrets/kubernetes.io/serviceaccount from yunikorn-admin-token-dnq4h (ro)
> Conditions:
> Type Status
> Initialized True
> Ready True
> ContainersReady True
> PodScheduled True
> Volumes:
> config-volume:
> Type: ConfigMap (a volume populated by a ConfigMap)
> Name: yunikorn-configs
> Optional: false
> yunikorn-admin-token-dnq4h:
> Type: Secret (a volume populated by a Secret)
> SecretName: yunikorn-admin-token-dnq4h
> Optional: false
> QoS Class: Burstable
> Node-Selectors: role.node.kubernetes.io/liftie-infra=true
> Tolerations: CriticalAddonsOnly op=Exists
> node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
> node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
> role.node.kubernetes.io/liftie-infra=true:NoSchedule
> Events:
> Type Reason Age From Message
> ---- ------ ---- ---- -------
> Normal Scheduled 61s default-scheduler Successfully assigned yunikorn/yunikorn-scheduler-6577f789d8-vc5cc to ip-10-192-153-109.ca-central-1.compute.internal
> Normal Pulling 57s kubelet Pulling image "container-dev.repo.cloudera.com/cloudera/yunikorn-web:0.10.0-b9"
> Normal Started 56s kubelet Started container yunikorn-scheduler-web
> Normal Created 56s kubelet Created container yunikorn-scheduler-web
> Normal Pulled 56s kubelet Successfully pulled image "container-dev.repo.cloudera.com/cloudera/yunikorn-web:0.10.0-b9"
> Warning FailedPreStopHook 55s (x2 over 58s) kubelet Exec lifecycle hook ([/bin/sh /admission_util.sh delete]) for Container "yunikorn-scheduler-k8s" in Pod "yunikorn-scheduler-6577f789d8-vc5cc_yunikorn(082e1cc7-8765-4aa3-baac-48e3b048cfc6)" failed - error: command '/bin/sh /admission_util.sh delete' exited with 126: , message: "cannot exec in a stopped state: unknown\r\n"
> Normal Killing 55s (x2 over 58s) kubelet FailedPostStartHook
> Warning BackOff 53s (x2 over 54s) kubelet Back-off restarting failed container
> Normal Pulling 41s (x3 over 60s) kubelet Pulling image "container-dev.repo.cloudera.com/cloudera/yunikorn-scheduler:0.10.0-b9"
> Warning FailedPostStartHook 40s (x3 over 58s) kubelet Exec lifecycle hook ([/bin/sh /admission_util.sh create]) for Container "yunikorn-scheduler-k8s" in Pod "yunikorn-scheduler-6577f789d8-vc5cc_yunikorn(082e1cc7-8765-4aa3-baac-48e3b048cfc6)" failed - error: command '/bin/sh /admission_util.sh create' exited with 137: , message: ""
> Normal Started 40s (x3 over 58s) kubelet Started container yunikorn-scheduler-k8s
> Normal Created 40s (x3 over 58s) kubelet Created container yunikorn-scheduler-k8s
> Normal Pulled 40s (x3 over 58s) kubelet Successfully pulled image "container-dev.repo.cloudera.com/cloudera/yunikorn-scheduler:0.10.0-b9"
> {noformat}
> This is not a blocker, but the scheduler was restarted multiple (3) times, hence this report. It could be due to an issue in the admission controller start script.
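
For readers triaging the events above, the exit codes reported for the lifecycle hooks follow the usual POSIX shell convention: 126 generally means the command was found but could not be executed, and values above 128 indicate termination by a signal (137 = 128 + 9, i.e. SIGKILL). The following is a minimal Go sketch of that decoding. The helper name describeExitCode and its signal table are hypothetical and for illustration only; they are not part of YuniKorn or kubectl.

```go
package main

import "fmt"

// signalNames lists a few common signals; this table is an illustrative
// assumption and is intentionally incomplete.
var signalNames = map[int]string{
	2:  "SIGINT",
	9:  "SIGKILL",
	15: "SIGTERM",
}

// describeExitCode interprets an exit status using the common shell
// convention. Hypothetical helper, not an API of YuniKorn or Kubernetes.
func describeExitCode(code int) string {
	switch {
	case code == 0:
		return "success"
	case code == 126:
		return "command found but could not be executed"
	case code == 127:
		return "command not found"
	case code > 128 && code < 256:
		sig := code - 128
		name, ok := signalNames[sig]
		if !ok {
			name = "unknown signal"
		}
		return fmt.Sprintf("terminated by signal %d (%s)", sig, name)
	default:
		return fmt.Sprintf("application-defined error %d", code)
	}
}

func main() {
	// Exit codes that appear in the pod description and events above.
	for _, code := range []int{1, 126, 137} {
		fmt.Printf("%d: %s\n", code, describeExitCode(code))
	}
}
```

Under this reading, the PostStartHook failure with exit code 137 suggests the hook process was killed, which would be consistent with the container stopping while the hook was still running. That is an interpretation of the convention, not a confirmed diagnosis.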
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@yunikorn.apache.org
For additional commands, e-mail: dev-help@yunikorn.apache.org