Posted to reviews@yunikorn.apache.org by GitBox <gi...@apache.org> on 2020/08/03 14:18:13 UTC
[GitHub] [incubator-yunikorn-core] adamantal edited a comment on pull request #190: [YUNIKORN-332] Add events for reserved pods
adamantal edited a comment on pull request #190:
URL: https://github.com/apache/incubator-yunikorn-core/pull/190#issuecomment-668048294
Plenty of questions here, so let me first summarize the latest status on this.
The current implementation works as follows:
- If the pod's resource request can be accommodated, it will be started without any new event being created for either the pod or the node. The `PodBindSuccessful` event shown below is still emitted by the shim itself - I did not touch this part.
```
$ kubectl describe pod pod-with-enough-resources
...
Events:
Type    Reason             Age                From                     Message
----    ------             ----               ----                     -------
Normal  Scheduling         26m                yunikorn                 default/task0 is queued and waiting for allocation
Normal  Scheduled          26m                yunikorn                 Successfully assigned default/task0 to node docker-desktop
Normal  PodBindSuccessful  26m                yunikorn                 Pod default/task0 is successfully bound to node docker-desktop
Normal  Started            23m (x4 over 25m)  kubelet, docker-desktop  Started container sleep-30s
Normal  Pulling            22m (x5 over 25m)  kubelet, docker-desktop  Pulling image "alpine:latest"
Normal  Pulled             22m (x5 over 25m)  kubelet, docker-desktop  Successfully pulled image "alpine:latest"
Normal  Created            22m (x5 over 25m)  kubelet, docker-desktop  Created container sleep-30s
```
- If the pod's resource request is too large to fit on any node, the ask will be "reserved" by the scheduler:
From yunikorn-scheduler log:
```
2020-08-03T15:50:13.614+0200 INFO scheduler/scheduling_partition.go:529 allocation ask is reserved {"appID": "application-sleep-0003", "queue": "root.default", "allocationKey": "a9446376-981c-4499-8ce3-855eb8571ba9", "node": "docker-desktop"}
```
From k8s commands:
```
AAntal-MBP15:yunikorn-k8shim adamantal$ kubectl get events --field-selector=involvedObject.kind=Node
LAST SEEN  TYPE    REASON                       OBJECT               MESSAGE
6m41s      Normal  NodeHasSufficientMemory      node/docker-desktop  Node docker-desktop status is now: NodeHasSufficientMemory
6m41s      Normal  NodeHasNoDiskPressure        node/docker-desktop  Node docker-desktop status is now: NodeHasNoDiskPressure
6m41s      Normal  NodeHasSufficientPID         node/docker-desktop  Node docker-desktop status is now: NodeHasSufficientPID
6m21s      Normal  RegisteredNode               node/docker-desktop  Node docker-desktop event: Registered Node docker-desktop in Controller
6m20s      Normal  Starting                     node/docker-desktop  Starting kube-proxy.
2m33s      Normal  NodeAccepted                 node/docker-desktop  node docker-desktop is accepted by the scheduler
107s       Normal  AllocationAskReservedOnNode  node/docker-desktop  Ask a9446376-981c-4499-8ce3-855eb8571ba9 from application application-sleep-0003 is reserved on this node
```
but I also have to add that there's no trace of the `AllocationAskReservedOnNode` event in the output of the `kubectl describe node docker-desktop` command:
```
Events:
Type    Reason                   Age                From                        Message
----    ------                   ----               ----                        -------
Normal  Starting                 41m                kubelet, docker-desktop     Starting kubelet.
Normal  NodeHasSufficientMemory  41m (x8 over 41m)  kubelet, docker-desktop     Node docker-desktop status is now: NodeHasSufficientMemory
Normal  NodeHasNoDiskPressure    41m (x8 over 41m)  kubelet, docker-desktop     Node docker-desktop status is now: NodeHasNoDiskPressure
Normal  NodeHasSufficientPID     41m (x7 over 41m)  kubelet, docker-desktop     Node docker-desktop status is now: NodeHasSufficientPID
Normal  NodeAllocatableEnforced  41m                kubelet, docker-desktop     Updated Node Allocatable limit across pods
Normal  Starting                 41m                kube-proxy, docker-desktop  Starting kube-proxy.
```
This might be a bug.
So my questions are:
1. What exactly is the "reservation" event you described [in the description of the jira](https://issues.apache.org/jira/browse/YUNIKORN-332)? Keep in mind that we already have a `PodBindSuccessful` event emitted by the shim - I put a bookmark with a `TODO` in the code to show where I believe such events should be emitted from the core.
2. Should such a reservation event be exposed for pods of any kind (both those that can be started and those that cannot)? If yes, I suggest renaming these events, because "reservation" in the YuniKorn scheduler seems to be a very different concept.
3. If a pod gets deleted, the app will not be unreserved, because the application lifecycle is not tracked. Therefore the code that emits the `AllocationAskUnreservedFromNode` event does not work. The other problem is that this reservation is seemingly bound to the application and not to the pod itself. I see some discrepancy here: the pod could not be reserved on a node anyway if it does not have the required resources.
I'm eager to hear your thoughts @yangwwei @wilfred-s @kingamarton on how to move forward.
P.s.: I found a data race that is seemingly unrelated to this patch; filed [YUNIKORN-342](https://issues.apache.org/jira/browse/YUNIKORN-342).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org