Posted to dev@yunikorn.apache.org by "Ayub Pathan (Jira)" <ji...@apache.org> on 2021/01/27 03:24:00 UTC

[jira] [Created] (YUNIKORN-518) Scheduler restarts observed due to admission controller "FailedPostStartHook"

Ayub Pathan created YUNIKORN-518:
------------------------------------

             Summary: Scheduler restarts observed due to admission controller "FailedPostStartHook"
                 Key: YUNIKORN-518
                 URL: https://issues.apache.org/jira/browse/YUNIKORN-518
             Project: Apache YuniKorn
          Issue Type: Bug
          Components: core - scheduler
    Affects Versions: 0.10
            Reporter: Ayub Pathan


{noformat}
Name:         yunikorn-scheduler-6577f789d8-vc5cc
Namespace:    yunikorn
Priority:     0
Node:         ip-10-192-153-109.ca-central-1.compute.internal/10.192.153.109
Start Time:   Tue, 26 Jan 2021 19:17:12 -0800
Labels:       app=yunikorn
              component=yunikorn-scheduler
              pod-template-hash=6577f789d8
              release=yunikorn
Annotations:  cni.projectcalico.org/podIP: 100.100.166.78/32
              cni.projectcalico.org/podIPs: 100.100.166.78/32
              kubernetes.io/psp: eks.privileged
Status:       Running
IP:           100.100.166.78
IPs:
  IP:           100.100.166.78
Controlled By:  ReplicaSet/yunikorn-scheduler-6577f789d8
Containers:
  yunikorn-scheduler-k8s:
    Container ID:   docker://759f2b2f14ba37f46a42cdc59a5c51ed19d442ed717b81ee98d30177b7a184e6
    Image:          container-dev.repo.cloudera.com/cloudera/yunikorn-scheduler:0.10.0-b9
    Image ID:       docker-pullable://container-dev.repo.cloudera.com/cloudera/yunikorn-scheduler@sha256:878300a91cfd3b9d6dc515948afbfab23572a475b0df7006f06480ee06d1aceb
    Port:           9080/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Tue, 26 Jan 2021 19:18:01 -0800
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 26 Jan 2021 19:17:33 -0800
      Finished:     Tue, 26 Jan 2021 19:17:33 -0800
    Ready:          True
    Restart Count:  3
    Limits:
      cpu:     4
      memory:  2Gi
    Requests:
      cpu:     200m
      memory:  1Gi
    Environment:
      NAMESPACE:                                yunikorn (v1:metadata.namespace)
      ADMISSION_CONTROLLER_IMAGE_REGISTRY:      container-dev.repo.cloudera.com/cloudera/yunikorn-admission
      ADMISSION_CONTROLLER_IMAGE_TAG:           0.10.0-b9
      ADMISSION_CONTROLLER_IMAGE_PULL_POLICY:   Always
      ADMISSION_CONTROLLER_IMAGE_PULL_SECRETS:  [dockercreds]
    Mounts:
      /etc/yunikorn/ from config-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from yunikorn-admin-token-dnq4h (ro)
  yunikorn-scheduler-web:
    Container ID:   docker://0b8205bb8292f193765bbc563ea10010106fd316257e523c3446c5685ee0d5bf
    Image:          container-dev.repo.cloudera.com/cloudera/yunikorn-web:0.10.0-b9
    Image ID:       docker-pullable://container-dev.repo.cloudera.com/cloudera/yunikorn-web@sha256:a64b986df2dc737958701838f41f9fae7f2e4a353a497949ba6b9e75b4b44b66
    Port:           9889/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Tue, 26 Jan 2021 19:17:17 -0800
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     200m
      memory:  500Mi
    Requests:
      cpu:        100m
      memory:     100Mi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from yunikorn-admin-token-dnq4h (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      yunikorn-configs
    Optional:  false
  yunikorn-admin-token-dnq4h:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  yunikorn-admin-token-dnq4h
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  role.node.kubernetes.io/liftie-infra=true
Tolerations:     CriticalAddonsOnly op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
                 role.node.kubernetes.io/liftie-infra=true:NoSchedule
Events:
  Type     Reason               Age                From               Message
  ----     ------               ----               ----               -------
  Normal   Scheduled            61s                default-scheduler  Successfully assigned yunikorn/yunikorn-scheduler-6577f789d8-vc5cc to ip-10-192-153-109.ca-central-1.compute.internal
  Normal   Pulling              57s                kubelet            Pulling image "container-dev.repo.cloudera.com/cloudera/yunikorn-web:0.10.0-b9"
  Normal   Started              56s                kubelet            Started container yunikorn-scheduler-web
  Normal   Created              56s                kubelet            Created container yunikorn-scheduler-web
  Normal   Pulled               56s                kubelet            Successfully pulled image "container-dev.repo.cloudera.com/cloudera/yunikorn-web:0.10.0-b9"
  Warning  FailedPreStopHook    55s (x2 over 58s)  kubelet            Exec lifecycle hook ([/bin/sh /admission_util.sh delete]) for Container "yunikorn-scheduler-k8s" in Pod "yunikorn-scheduler-6577f789d8-vc5cc_yunikorn(082e1cc7-8765-4aa3-baac-48e3b048cfc6)" failed - error: command '/bin/sh /admission_util.sh delete' exited with 126: , message: "cannot exec in a stopped state: unknown\r\n"
  Normal   Killing              55s (x2 over 58s)  kubelet            FailedPostStartHook
  Warning  BackOff              53s (x2 over 54s)  kubelet            Back-off restarting failed container
  Normal   Pulling              41s (x3 over 60s)  kubelet            Pulling image "container-dev.repo.cloudera.com/cloudera/yunikorn-scheduler:0.10.0-b9"
  Warning  FailedPostStartHook  40s (x3 over 58s)  kubelet            Exec lifecycle hook ([/bin/sh /admission_util.sh create]) for Container "yunikorn-scheduler-k8s" in Pod "yunikorn-scheduler-6577f789d8-vc5cc_yunikorn(082e1cc7-8765-4aa3-baac-48e3b048cfc6)" failed - error: command '/bin/sh /admission_util.sh create' exited with 137: , message: ""
  Normal   Started              40s (x3 over 58s)  kubelet            Started container yunikorn-scheduler-k8s
  Normal   Created              40s (x3 over 58s)  kubelet            Created container yunikorn-scheduler-k8s
  Normal   Pulled               40s (x3 over 58s)  kubelet            Successfully pulled image "container-dev.repo.cloudera.com/cloudera/yunikorn-scheduler:0.10.0-b9"
{noformat}

This is not a blocker, but the scheduler container was restarted multiple times (Restart Count: 3), hence this report. It could be due to an issue in the admission controller start script.
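
For anyone triaging this, the two hook exit codes in the events above can be read with the usual shell convention: 126 means the command could not be executed, and values above 128 mean the process was ended by signal (code - 128), so 137 corresponds to SIGKILL. The snippet below is only a minimal sketch of that convention. The helper name describe_exit_code and the small signal table are illustrative assumptions and are not part of YuniKorn or admission_util.sh. It does not explain why the hook was killed; it may simply be that the container itself had already exited (Exit Code 1, started and finished within the same second).

```python
# Hypothetical helper, not part of YuniKorn: decodes exit codes using the
# common shell convention (126 = cannot execute, 127 = not found,
# 128 + N = terminated by signal N).

# Small illustrative subset of signal numbers (Linux numbering).
SIGNAL_NAMES = {2: "SIGINT", 9: "SIGKILL", 15: "SIGTERM"}


def describe_exit_code(code: int) -> str:
    """Return a short human-readable reading of a command exit code."""
    if code == 0:
        return "success"
    if code == 126:
        return "command could not be executed"
    if code == 127:
        return "command not found"
    if code > 128:
        signum = code - 128
        name = SIGNAL_NAMES.get(signum, f"signal {signum}")
        return f"terminated by {name}"
    return "command reported an error"


if __name__ == "__main__":
    # Exit codes taken from the FailedPostStartHook / FailedPreStopHook events.
    for hook, code in (("postStart: admission_util.sh create", 137),
                       ("preStop: admission_util.sh delete", 126)):
        print(f"{hook}: exit {code} -> {describe_exit_code(code)}")
```

Read this way, the postStart hook did not fail on its own error path; it was killed, which suggests looking at why the scheduler container exited with code 1 before the hook finished, in addition to the script itself.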


