Posted to reviews@yunikorn.apache.org by "zhuqi-lucas (via GitHub)" <gi...@apache.org> on 2023/04/28 14:27:47 UTC
[GitHub] [yunikorn-k8shim] zhuqi-lucas opened a new pull request, #582: [YUNIKORN-1712] Placeholder allocations are being removed twice
zhuqi-lucas opened a new pull request, #582:
URL: https://github.com/apache/yunikorn-k8shim/pull/582
### What is this PR for?
Under some circumstances, it seems that placeholder allocations are being removed multiple times:
```
2023-04-25T06:25:46.279Z INFO scheduler/partition.go:1233 replacing placeholder allocation {"appID": "spark-000000031tn2lgv2gar", "allocationId": "20a4cf77-7095-4635-b9e9-43a7564385c4"}
...
2023-04-25T06:25:46.299Z INFO scheduler/partition.go:1233 replacing placeholder allocation {"appID": "spark-000000031tn2lgv2gar", "allocationId": "20a4cf77-7095-4635-b9e9-43a7564385c4"}
```
This log message appears only once in the codebase, in `PartitionContext.removeAllocation()`, and it is guarded by a check for `release.TerminationType == si.TerminationType_PLACEHOLDER_REPLACED`. This suggests that `removeAllocation()` is somehow being called twice for the same allocation. I believe this would cause the used resources on the node to be subtracted twice for that allocation, which quickly makes the health checks fail:
```
2023-04-25T06:26:10.632Z WARN scheduler/health_checker.go:176 Scheduler is not healthy {"health check values": [..., {"Name":"Consistency of data","Succeeded":false,"Description":"Check if node total resource = allocated resource + occupied resource + available resource","DiagnosisMessage":"Nodes with inconsistent data: [\"ip-10-0-112-148.eu-central-1.compute.internal\"]"}, ...]}
```
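To make the suspected failure mode concrete, here is a minimal Go sketch of a single-resource node model. Everything in it (`node`, `addAllocation`, `removeAllocationUnsafe`, `removeAllocation`, `consistent`) is hypothetical and is not the YuniKorn core code. It assumes the node keeps a running `allocated` counter while `available` is derived from the set of live allocations. Under that assumption, a repeated release that blindly subtracts again breaks the invariant reported by the health check (total = allocated + occupied + available), while a removal keyed on the allocation ID stays idempotent.

```go
package main

import "fmt"

// node is a hypothetical, single-resource model of a scheduler node.
// It is NOT the YuniKorn core implementation.
type node struct {
	total     int64
	occupied  int64            // resources used outside the scheduler
	allocated int64            // running counter, updated on add/remove
	allocs    map[string]int64 // live allocations keyed by allocation ID
}

func newNode(total, occupied int64) *node {
	return &node{total: total, occupied: occupied, allocs: map[string]int64{}}
}

func (n *node) addAllocation(id string, res int64) {
	n.allocs[id] = res
	n.allocated += res
}

// removeAllocationUnsafe trusts the caller's copy of the resource and always
// subtracts it, so a repeated release for the same ID is subtracted twice.
func (n *node) removeAllocationUnsafe(id string, res int64) {
	delete(n.allocs, id)
	n.allocated -= res
}

// removeAllocation is idempotent: only the first release of an ID changes
// the counter; later releases for the same ID return false.
func (n *node) removeAllocation(id string) bool {
	res, ok := n.allocs[id]
	if !ok {
		return false
	}
	delete(n.allocs, id)
	n.allocated -= res
	return true
}

// available is derived from the live allocations, independent of the counter.
func (n *node) available() int64 {
	var used int64
	for _, r := range n.allocs {
		used += r
	}
	return n.total - n.occupied - used
}

// consistent mirrors the formula from the health check message:
// total == allocated + occupied + available.
func (n *node) consistent() bool {
	return n.total == n.allocated+n.occupied+n.available()
}

func main() {
	bad := newNode(10, 1)
	bad.addAllocation("ph-1", 4)
	bad.removeAllocationUnsafe("ph-1", 4)
	bad.removeAllocationUnsafe("ph-1", 4) // duplicate release
	fmt.Println("unsafe:", bad.allocated, bad.consistent())

	good := newNode(10, 1)
	good.addAllocation("ph-1", 4)
	good.removeAllocation("ph-1")
	good.removeAllocation("ph-1") // duplicate release is ignored
	fmt.Println("idempotent:", good.allocated, good.consistent())
}
```

Running it prints `unsafe: -4 false` followed by `idempotent: 0 true`: in this model the duplicate release drives the counter negative and the consistency check fails, while the ID-keyed removal leaves the node consistent.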
### What type of PR is it?
* [ ] - Bug Fix
* [ ] - Improvement
* [ ] - Feature
* [ ] - Documentation
* [ ] - Hot Fix
* [ ] - Refactoring
### Todos
* [ ] - Task
### What is the Jira issue?
* Open an issue on Jira https://issues.apache.org/jira/browse/YUNIKORN/
* Put link here, and add [YUNIKORN-*Jira number*] in PR title, e.g. `[YUNIKORN-2] Gang scheduling interface parameters`
### How should this be tested?
### Screenshots (if appropriate)
### Questions:
* [ ] - The license files need updating.
* [ ] - There are breaking changes for older versions.
* [ ] - It needs documentation.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [yunikorn-k8shim] zhuqi-lucas commented on pull request #582: [YUNIKORN-1712] Placeholder allocations are being removed twice
Posted by "zhuqi-lucas (via GitHub)" <gi...@apache.org>.
zhuqi-lucas commented on PR #582:
URL: https://github.com/apache/yunikorn-k8shim/pull/582#issuecomment-1531333468
Thanks a lot, @wilfred-s, I will check the details in the Jira!
[GitHub] [yunikorn-k8shim] wilfred-s commented on pull request #582: [YUNIKORN-1712] Placeholder allocations are being removed twice
Posted by "wilfred-s (via GitHub)" <gi...@apache.org>.
wilfred-s commented on PR #582:
URL: https://github.com/apache/yunikorn-k8shim/pull/582#issuecomment-1530995094
See the comments in the Jira. The second release should already be filtered out by the replacement code in the application on the core side. I do not think this change will fix the underlying issue.
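The filtering described in this comment could be sketched roughly as follows. This is a hypothetical Go model, not the core's replacement code: `terminationType`, `release`, `application`, and `filterRelease` are illustrative names, and the only assumption is that the application tracks which placeholders are still awaiting replacement. With that bookkeeping, a second placeholder-replaced release for the same allocation ID is dropped before it can reach the partition.

```go
package main

import "fmt"

// terminationType is a hypothetical stand-in for si.TerminationType.
type terminationType int

const (
	stoppedByRM terminationType = iota
	placeholderReplaced
)

// release is a simplified release request for one allocation.
type release struct {
	allocationID string
	termination  terminationType
}

// application tracks placeholders still waiting to be swapped for real allocations.
type application struct {
	placeholders map[string]bool
}

// filterRelease reports whether a release should be passed on to the partition.
// A placeholder replacement passes only while the placeholder is still tracked;
// the first one removes it from tracking, so any repeat is dropped.
func (a *application) filterRelease(r release) bool {
	if r.termination != placeholderReplaced {
		return true // other release types are not filtered in this sketch
	}
	if !a.placeholders[r.allocationID] {
		return false // already replaced: drop the duplicate
	}
	delete(a.placeholders, r.allocationID)
	return true
}

func main() {
	app := &application{placeholders: map[string]bool{"ph-1": true}}
	r := release{allocationID: "ph-1", termination: placeholderReplaced}
	fmt.Println(app.filterRelease(r)) // first replacement passes
	fmt.Println(app.filterRelease(r)) // duplicate is filtered out
	fmt.Println(app.filterRelease(release{allocationID: "a-2", termination: stoppedByRM}))
}
```

This prints `true`, `false`, `true`. If the double removal still happens, one hedged reading is that the duplicate bypasses a check like this, for example by arriving through a path that does not consult the application's placeholder state.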
[GitHub] [yunikorn-k8shim] craigcondit commented on pull request #582: [YUNIKORN-1712] Placeholder allocations are being removed twice
Posted by "craigcondit (via GitHub)" <gi...@apache.org>.
craigcondit commented on PR #582:
URL: https://github.com/apache/yunikorn-k8shim/pull/582#issuecomment-1542423557
I believe the real fix is in https://github.com/apache/yunikorn-core/pull/540.
[GitHub] [yunikorn-k8shim] zhuqi-lucas commented on pull request #582: [YUNIKORN-1712] Placeholder allocations are being removed twice
Posted by "zhuqi-lucas (via GitHub)" <gi...@apache.org>.
zhuqi-lucas commented on PR #582:
URL: https://github.com/apache/yunikorn-k8shim/pull/582#issuecomment-1530830961
@craigcondit @wilfred-s
**Conclusion:**
The pod event handlers `updatePod` and `deletePod` can both be called for the same pod, so the task is completed twice and its allocation is released twice.
Deleting a pod does not directly trigger `updatePod`; instead, it triggers `deletePod` in the scheduler's event handler.
`deletePod` removes the pod from the YuniKorn scheduler's internal cache and data structures. It also updates the status of the application associated with the pod, notifying the scheduler that the resources previously reserved for the pod are now available to other pods.
However, if Kubernetes updates the pod's status to "Succeeded" or "Failed" while the pod is being deleted, that update triggers `updatePod` in the YuniKorn scheduler event handler. `updatePod` updates the scheduler's internal state with the new pod status and cleans up any resources associated with the pod.
So, although deleting a pod does not directly trigger `updatePod`, status updates made during the deletion may.
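A minimal Go sketch of the scenario described above, assuming both handlers end up calling the same task-completion path. The `task` type, `onPodUpdate`, and `onPodDelete` are hypothetical stand-ins, not the k8shim's actual handlers, and pod phases are modeled as plain strings rather than client-go types. Guarding completion with a state flag makes the second signal (whichever handler fires last) a no-op, so only one release would be sent to the core.

```go
package main

import (
	"fmt"
	"sync"
)

// podPhase mirrors the string values of Kubernetes pod phases.
type podPhase string

const (
	podRunning   podPhase = "Running"
	podSucceeded podPhase = "Succeeded"
	podFailed    podPhase = "Failed"
)

// task is a hypothetical shim-side task; releases counts the
// release requests that would be sent to the core.
type task struct {
	mu        sync.Mutex
	completed bool
	releases  int
}

// complete sends a release only on the first call; later calls return false.
func (t *task) complete() bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.completed {
		return false
	}
	t.completed = true
	t.releases++ // stand-in for sending the release to the core
	return true
}

// onPodUpdate completes the task when the pod reaches a terminal phase.
func onPodUpdate(t *task, phase podPhase) {
	if phase == podSucceeded || phase == podFailed {
		t.complete()
	}
}

// onPodDelete completes the task when the pod object is removed.
func onPodDelete(t *task) {
	t.complete()
}

func main() {
	t := &task{}
	onPodUpdate(t, podRunning)   // not terminal: nothing happens
	onPodUpdate(t, podSucceeded) // first terminal signal: release sent
	onPodDelete(t)               // second signal: ignored
	fmt.Println("releases sent:", t.releases)
}
```

This prints `releases sent: 1`. The guard sits on the task state rather than on either handler, since in the sequence described above either notification may be the one that arrives first.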
[GitHub] [yunikorn-k8shim] zhuqi-lucas commented on pull request #582: [YUNIKORN-1712] Placeholder allocations are being removed twice
Posted by "zhuqi-lucas (via GitHub)" <gi...@apache.org>.
zhuqi-lucas commented on PR #582:
URL: https://github.com/apache/yunikorn-k8shim/pull/582#issuecomment-1536412339
Hi @wilfred-s
I submitted a fix in https://github.com/apache/yunikorn-k8shim/pull/582
[GitHub] [yunikorn-k8shim] zhuqi-lucas closed pull request #582: [YUNIKORN-1712] Placeholder allocations are being removed twice
Posted by "zhuqi-lucas (via GitHub)" <gi...@apache.org>.
zhuqi-lucas closed pull request #582: [YUNIKORN-1712] Placeholder allocations are being removed twice
URL: https://github.com/apache/yunikorn-k8shim/pull/582
[GitHub] [yunikorn-k8shim] zhuqi-lucas commented on pull request #582: [YUNIKORN-1712] Placeholder allocations are being removed twice
Posted by "zhuqi-lucas (via GitHub)" <gi...@apache.org>.
zhuqi-lucas commented on PR #582:
URL: https://github.com/apache/yunikorn-k8shim/pull/582#issuecomment-1543259263
Closing this, because we will fix it in https://github.com/apache/yunikorn-k8shim/pull/582