Posted to dev@yunikorn.apache.org by "Ayub Pathan (Jira)" <ji...@apache.org> on 2021/01/27 03:24:00 UTC
[jira] [Created] (YUNIKORN-518) Scheduler restarts observed due to
admission controller "FailedPostStartHook"
Ayub Pathan created YUNIKORN-518:
------------------------------------
Summary: Scheduler restarts observed due to admission controller "FailedPostStartHook"
Key: YUNIKORN-518
URL: https://issues.apache.org/jira/browse/YUNIKORN-518
Project: Apache YuniKorn
Issue Type: Bug
Components: core - scheduler
Affects Versions: 0.10
Reporter: Ayub Pathan
{noformat}
Name: yunikorn-scheduler-6577f789d8-vc5cc
Namespace: yunikorn
Priority: 0
Node: ip-10-192-153-109.ca-central-1.compute.internal/10.192.153.109
Start Time: Tue, 26 Jan 2021 19:17:12 -0800
Labels: app=yunikorn
component=yunikorn-scheduler
pod-template-hash=6577f789d8
release=yunikorn
Annotations: cni.projectcalico.org/podIP: 100.100.166.78/32
cni.projectcalico.org/podIPs: 100.100.166.78/32
kubernetes.io/psp: eks.privileged
Status: Running
IP: 100.100.166.78
IPs:
IP: 100.100.166.78
Controlled By: ReplicaSet/yunikorn-scheduler-6577f789d8
Containers:
yunikorn-scheduler-k8s:
Container ID: docker://759f2b2f14ba37f46a42cdc59a5c51ed19d442ed717b81ee98d30177b7a184e6
Image: container-dev.repo.cloudera.com/cloudera/yunikorn-scheduler:0.10.0-b9
Image ID: docker-pullable://container-dev.repo.cloudera.com/cloudera/yunikorn-scheduler@sha256:878300a91cfd3b9d6dc515948afbfab23572a475b0df7006f06480ee06d1aceb
Port: 9080/TCP
Host Port: 0/TCP
State: Running
Started: Tue, 26 Jan 2021 19:18:01 -0800
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Tue, 26 Jan 2021 19:17:33 -0800
Finished: Tue, 26 Jan 2021 19:17:33 -0800
Ready: True
Restart Count: 3
Limits:
cpu: 4
memory: 2Gi
Requests:
cpu: 200m
memory: 1Gi
Environment:
NAMESPACE: yunikorn (v1:metadata.namespace)
ADMISSION_CONTROLLER_IMAGE_REGISTRY: container-dev.repo.cloudera.com/cloudera/yunikorn-admission
ADMISSION_CONTROLLER_IMAGE_TAG: 0.10.0-b9
ADMISSION_CONTROLLER_IMAGE_PULL_POLICY: Always
ADMISSION_CONTROLLER_IMAGE_PULL_SECRETS: [dockercreds]
Mounts:
/etc/yunikorn/ from config-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from yunikorn-admin-token-dnq4h (ro)
yunikorn-scheduler-web:
Container ID: docker://0b8205bb8292f193765bbc563ea10010106fd316257e523c3446c5685ee0d5bf
Image: container-dev.repo.cloudera.com/cloudera/yunikorn-web:0.10.0-b9
Image ID: docker-pullable://container-dev.repo.cloudera.com/cloudera/yunikorn-web@sha256:a64b986df2dc737958701838f41f9fae7f2e4a353a497949ba6b9e75b4b44b66
Port: 9889/TCP
Host Port: 0/TCP
State: Running
Started: Tue, 26 Jan 2021 19:17:17 -0800
Ready: True
Restart Count: 0
Limits:
cpu: 200m
memory: 500Mi
Requests:
cpu: 100m
memory: 100Mi
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from yunikorn-admin-token-dnq4h (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: yunikorn-configs
Optional: false
yunikorn-admin-token-dnq4h:
Type: Secret (a volume populated by a Secret)
SecretName: yunikorn-admin-token-dnq4h
Optional: false
QoS Class: Burstable
Node-Selectors: role.node.kubernetes.io/liftie-infra=true
Tolerations: CriticalAddonsOnly op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
role.node.kubernetes.io/liftie-infra=true:NoSchedule
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 61s default-scheduler Successfully assigned yunikorn/yunikorn-scheduler-6577f789d8-vc5cc to ip-10-192-153-109.ca-central-1.compute.internal
Normal Pulling 57s kubelet Pulling image "container-dev.repo.cloudera.com/cloudera/yunikorn-web:0.10.0-b9"
Normal Started 56s kubelet Started container yunikorn-scheduler-web
Normal Created 56s kubelet Created container yunikorn-scheduler-web
Normal Pulled 56s kubelet Successfully pulled image "container-dev.repo.cloudera.com/cloudera/yunikorn-web:0.10.0-b9"
Warning FailedPreStopHook 55s (x2 over 58s) kubelet Exec lifecycle hook ([/bin/sh /admission_util.sh delete]) for Container "yunikorn-scheduler-k8s" in Pod "yunikorn-scheduler-6577f789d8-vc5cc_yunikorn(082e1cc7-8765-4aa3-baac-48e3b048cfc6)" failed - error: command '/bin/sh /admission_util.sh delete' exited with 126: , message: "cannot exec in a stopped state: unknown\r\n"
Normal Killing 55s (x2 over 58s) kubelet FailedPostStartHook
Warning BackOff 53s (x2 over 54s) kubelet Back-off restarting failed container
Normal Pulling 41s (x3 over 60s) kubelet Pulling image "container-dev.repo.cloudera.com/cloudera/yunikorn-scheduler:0.10.0-b9"
Warning FailedPostStartHook 40s (x3 over 58s) kubelet Exec lifecycle hook ([/bin/sh /admission_util.sh create]) for Container "yunikorn-scheduler-k8s" in Pod "yunikorn-scheduler-6577f789d8-vc5cc_yunikorn(082e1cc7-8765-4aa3-baac-48e3b048cfc6)" failed - error: command '/bin/sh /admission_util.sh create' exited with 137: , message: ""
Normal Started 40s (x3 over 58s) kubelet Started container yunikorn-scheduler-k8s
Normal Created 40s (x3 over 58s) kubelet Created container yunikorn-scheduler-k8s
Normal Pulled 40s (x3 over 58s) kubelet Successfully pulled image "container-dev.repo.cloudera.com/cloudera/yunikorn-scheduler:0.10.0-b9"
{noformat}
This is not a blocker, but the scheduler container was restarted multiple (3) times, hence this report. The restarts could be due to an issue in the admission controller start script.
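As a reading aid for the two hook failures in the events above (exit codes 126 and 137), here is a minimal sketch that decodes exit statuses using the usual POSIX shell conventions. The function name describe_exit_code is hypothetical and is not part of YuniKorn or Kubernetes; the mapping is the generic shell convention, not something confirmed against the admission_util.sh script itself.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret an exit status using common POSIX shell conventions.

    Hypothetical helper for illustration only.
    """
    if code == 0:
        return "success"
    if code == 126:
        return "command found but could not be executed"
    if code == 127:
        return "command not found"
    if code > 128:
        # By convention, 128 + N means the process was terminated by signal N.
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited with error status {code}"


# Exit codes taken from the FailedPreStopHook and FailedPostStartHook events.
for hook, code in [
    ("/admission_util.sh delete", 126),
    ("/admission_util.sh create", 137),
]:
    print(f"{hook}: {code} -> {describe_exit_code(code)}")
```

Under these conventions, 137 would mean the postStart hook process received SIGKILL (128 + 9), which may simply reflect the container exiting while the hook was still running. The 126 on the preStop hook comes with the message "cannot exec in a stopped state", which suggests the container had already stopped when the hook was attempted. Both readings are assumptions drawn from the event text, not a confirmed root cause.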