Posted to reviews@yunikorn.apache.org by GitBox <gi...@apache.org> on 2020/08/03 14:18:13 UTC

[GitHub] [incubator-yunikorn-core] adamantal edited a comment on pull request #190: [YUNIKORN-332] Add events for reserved pods

adamantal edited a comment on pull request #190:
URL: https://github.com/apache/incubator-yunikorn-core/pull/190#issuecomment-668048294


   Plenty of questions here, so let me first summarize the latest status of this.
   
   The current implementation works as follows:
   - If the pod's resource request can be accommodated, it is started without any new event being created for either the pod or the node. The existing events, including the `PodBindSuccessful` event emitted by the shim itself, still appear - I did not touch that part.
   ```
   $ kubectl describe pod pod-with-enough-resources
   ...
   Events:
     Type     Reason             Age                 From                     Message
     ----     ------             ----                ----                     -------
     Normal   Scheduling         26m                 yunikorn                 default/task0 is queued and waiting for allocation
     Normal   Scheduled          26m                 yunikorn                 Successfully assigned default/task0 to node docker-desktop
     Normal   PodBindSuccessful  26m                 yunikorn                 Pod default/task0 is successfully bound to node docker-desktop
     Normal   Started            23m (x4 over 25m)   kubelet, docker-desktop  Started container sleep-30s
     Normal   Pulling            22m (x5 over 25m)   kubelet, docker-desktop  Pulling image "alpine:latest"
     Normal   Pulled             22m (x5 over 25m)   kubelet, docker-desktop  Successfully pulled image "alpine:latest"
     Normal   Created            22m (x5 over 25m)   kubelet, docker-desktop  Created container sleep-30s
   ```
   - If the pod's resource request is too high to be started on any node, it gets "reserved" by the scheduler:
   From the yunikorn-scheduler log:
   ```
   2020-08-03T15:50:13.614+0200	INFO	scheduler/scheduling_partition.go:529	allocation ask is reserved	{"appID": "application-sleep-0003", "queue": "root.default", "allocationKey": "a9446376-981c-4499-8ce3-855eb8571ba9", "node": "docker-desktop"}
   ```
   From k8s commands:
   ```
   AAntal-MBP15:yunikorn-k8shim adamantal$ kubectl get events --field-selector=involvedObject.kind=Node
   LAST SEEN   TYPE     REASON                        OBJECT                MESSAGE
   6m41s       Normal   NodeHasSufficientMemory       node/docker-desktop   Node docker-desktop status is now: NodeHasSufficientMemory
   6m41s       Normal   NodeHasNoDiskPressure         node/docker-desktop   Node docker-desktop status is now: NodeHasNoDiskPressure
   6m41s       Normal   NodeHasSufficientPID          node/docker-desktop   Node docker-desktop status is now: NodeHasSufficientPID
   6m21s       Normal   RegisteredNode                node/docker-desktop   Node docker-desktop event: Registered Node docker-desktop in Controller
   6m20s       Normal   Starting                      node/docker-desktop   Starting kube-proxy.
   2m33s       Normal   NodeAccepted                  node/docker-desktop   node docker-desktop is accepted by the scheduler
   107s        Normal   AllocationAskReservedOnNode   node/docker-desktop   Ask a9446376-981c-4499-8ce3-855eb8571ba9 from application application-sleep-0003 is reserved on this node
   ```
   but I also have to add that there is no trace of the `AllocationAskReservedOnNode` event in the output of `kubectl describe node docker-desktop`:
   ```
   Events:
     Type    Reason                   Age                From                        Message
     ----    ------                   ----               ----                        -------
     Normal  Starting                 41m                kubelet, docker-desktop     Starting kubelet.
     Normal  NodeHasSufficientMemory  41m (x8 over 41m)  kubelet, docker-desktop     Node docker-desktop status is now: NodeHasSufficientMemory
     Normal  NodeHasNoDiskPressure    41m (x8 over 41m)  kubelet, docker-desktop     Node docker-desktop status is now: NodeHasNoDiskPressure
     Normal  NodeHasSufficientPID     41m (x7 over 41m)  kubelet, docker-desktop     Node docker-desktop status is now: NodeHasSufficientPID
     Normal  NodeAllocatableEnforced  41m                kubelet, docker-desktop     Updated Node Allocatable limit across pods
     Normal  Starting                 41m                kube-proxy, docker-desktop  Starting kube-proxy.
   ```
   This might be a bug.
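   
   For reference, here is a minimal sketch of how such a node-scoped event can be emitted with client-go's `record` package - this is not the actual shim code, just an illustration under assumptions (the package name, the helper names and the recorder wiring are mine; only the reason/message strings mirror the output above):
   ```go
   package shimevents // hypothetical package, for illustration only
   
   import (
   	corev1 "k8s.io/api/core/v1"
   	"k8s.io/client-go/kubernetes"
   	"k8s.io/client-go/kubernetes/scheme"
   	typedcorev1 "k8s.io/client-go/kubernetes/typed/core/v1"
   	"k8s.io/client-go/tools/record"
   )
   
   // newRecorder wires an event recorder that publishes events to the API server.
   // The component name matches the "From" column ("yunikorn") in the outputs above.
   func newRecorder(clientset kubernetes.Interface) record.EventRecorder {
   	broadcaster := record.NewBroadcaster()
   	broadcaster.StartRecordingToSink(&typedcorev1.EventSinkImpl{
   		Interface: clientset.CoreV1().Events(""),
   	})
   	return broadcaster.NewRecorder(scheme.Scheme, corev1.EventSource{Component: "yunikorn"})
   }
   
   // emitAskReserved records a node-scoped event like the one seen in the
   // `kubectl get events --field-selector=involvedObject.kind=Node` output above.
   func emitAskReserved(recorder record.EventRecorder, node *corev1.Node, askID, appID string) {
   	// Referencing the full Node object (which carries the node's UID) rather than a
   	// bare ObjectReference with only a name should also let `kubectl describe node`
   	// find the event, since describe filters events by the involved object's UID.
   	recorder.Eventf(node, corev1.EventTypeNormal, "AllocationAskReservedOnNode",
   		"Ask %s from application %s is reserved on this node", askID, appID)
   }
   ```
   My guess (not verified) at the discrepancy above: if the event is emitted with an `ObjectReference` that lacks the node's UID, `kubectl get events` still lists it, but `kubectl describe node` may not, because describe looks events up by the involved object's UID as well.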
   
   So my questions are:
   1. What exactly is the "reservation" event you described [in the description of the jira](https://issues.apache.org/jira/browse/YUNIKORN-332)? Keep in mind that we already have a `PodBindSuccessful` event emitted by the shim - I put a `TODO` bookmark in the code to show where I believe such events should be emitted from the core.
   2. Should such a reservation event be exposed for pods of any kind (both those that can be started and those that cannot)? If so, I suggest renaming these events, because reservation in the YuniKorn scheduler seems to be a very different concept.
   3. If a pod gets deleted, the app is not unreserved because of the lack of an application lifecycle, so emitting the `AllocationAskUnreservedFromNode` event does not work. The other problem is that this reservation is seemingly bound to the application and not to the pod itself. I feel there is some discrepancy here: the pod could not be reserved on a node anyway if the required resources are not available.
   
   I'm eager to hear your thoughts on how to move forward, @yangwwei @wilfred-s @kingamarton.
   
   P.s.: I found a data race that is seemingly unrelated to this patch and filed [YUNIKORN-342](https://issues.apache.org/jira/browse/YUNIKORN-342).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org