Posted to reviews@yunikorn.apache.org by "zhuqi-lucas (via GitHub)" <gi...@apache.org> on 2023/04/28 14:27:47 UTC

[GitHub] [yunikorn-k8shim] zhuqi-lucas opened a new pull request, #582: [YUNIKORN-1712] Placeholder allocations are being removed twice

zhuqi-lucas opened a new pull request, #582:
URL: https://github.com/apache/yunikorn-k8shim/pull/582

   ### What is this PR for?
   
   Under some circumstances, it seems that placeholder allocations are being removed multiple times:
   
   ```
   2023-04-25T06:25:46.279Z	INFO	scheduler/partition.go:1233	replacing placeholder allocation {"appID": "spark-000000031tn2lgv2gar", "allocationId": "20a4cf77-7095-4635-b9e9-43a7564385c4"}
   ...
   2023-04-25T06:25:46.299Z	INFO	scheduler/partition.go:1233	replacing placeholder allocation {"appID": "spark-000000031tn2lgv2gar", "allocationId": "20a4cf77-7095-4635-b9e9-43a7564385c4"}
   ```
   
   
   This message only appears once in the codebase, in PartitionContext.removeAllocation(). Furthermore, it is guarded by a test for release.TerminationType == si.TerminationType_PLACEHOLDER_REPLACED. This would seem to indicate that removeAllocation() is somehow being called twice. I believe this would cause the used resources on the node to be subtracted twice for the same allocation. This quickly results in health checks failing:
   
   ```
   2023-04-25T06:26:10.632Z        WARN    scheduler/health_checker.go:176 Scheduler is not healthy        {"health check values": [..., {"Name":"Consistency of data","Succeeded":false,"Description":"Check if node total resource = allocated resource + occupied resource + available resource","DiagnosisMessage":"Nodes with inconsistent data: [\"ip-10-0-112-148.eu-central-1.compute.internal\"]"}, ...]}
   ```
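   
   To make the failing check concrete, here is a minimal sketch of the invariant quoted in the health check message, together with a removal that is keyed by allocation ID so that a repeated release becomes a no-op. The `node` type and its methods are hypothetical simplifications, not the yunikorn-core types, and the sketch does not claim to reproduce how the inconsistency actually arises in the scheduler.
   
   ```go
   package main

   // node is a hypothetical, simplified model of per-node accounting.
   // It is NOT the yunikorn-core node type; quantities are plain integers.
   type node struct {
   	total     int64
   	occupied  int64
   	allocated int64
   	available int64
   	allocs    map[string]int64 // allocation ID -> size of the allocation
   }

   // newNode creates a node with no allocations.
   func newNode(total, occupied int64) *node {
   	return &node{
   		total:     total,
   		occupied:  occupied,
   		available: total - occupied,
   		allocs:    map[string]int64{},
   	}
   }

   // addAllocation records an allocation and moves its size from
   // available to allocated.
   func (n *node) addAllocation(id string, size int64) {
   	n.allocs[id] = size
   	n.allocated += size
   	n.available -= size
   }

   // removeAllocation releases an allocation by ID. It reports false and
   // changes nothing when the ID is unknown, so a duplicate release
   // cannot subtract the same resources a second time.
   func (n *node) removeAllocation(id string) bool {
   	size, ok := n.allocs[id]
   	if !ok {
   		return false
   	}
   	delete(n.allocs, id)
   	n.allocated -= size
   	n.available += size
   	return true
   }

   // consistent mirrors the check named in the log message:
   // total = allocated + occupied + available.
   func (n *node) consistent() bool {
   	return n.total == n.allocated+n.occupied+n.available
   }
   ```
   
   In this model, calling `removeAllocation` twice with the same ID returns `true` the first time and `false` the second, and `consistent()` keeps holding afterwards.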
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [yunikorn-k8shim] zhuqi-lucas commented on pull request #582: [YUNIKORN-1712] Placeholder allocations are being removed twice

Posted by "zhuqi-lucas (via GitHub)" <gi...@apache.org>.
zhuqi-lucas commented on PR #582:
URL: https://github.com/apache/yunikorn-k8shim/pull/582#issuecomment-1531333468

   Thanks a lot, @wilfred-s. I will check the details in the Jira!




[GitHub] [yunikorn-k8shim] wilfred-s commented on pull request #582: [YUNIKORN-1712] Placeholder allocations are being removed twice

Posted by "wilfred-s (via GitHub)" <gi...@apache.org>.
wilfred-s commented on PR #582:
URL: https://github.com/apache/yunikorn-k8shim/pull/582#issuecomment-1530995094

   See the comments in the Jira. The second release should already be filtered out by the replacement code in the application on the core side. I do not think this change will fix the underlying issue.




[GitHub] [yunikorn-k8shim] craigcondit commented on pull request #582: [YUNIKORN-1712] Placeholder allocations are being removed twice

Posted by "craigcondit (via GitHub)" <gi...@apache.org>.
craigcondit commented on PR #582:
URL: https://github.com/apache/yunikorn-k8shim/pull/582#issuecomment-1542423557

   I believe the real fix is in https://github.com/apache/yunikorn-core/pull/540.




[GitHub] [yunikorn-k8shim] zhuqi-lucas commented on pull request #582: [YUNIKORN-1712] Placeholder allocations are being removed twice

Posted by "zhuqi-lucas (via GitHub)" <gi...@apache.org>.
zhuqi-lucas commented on PR #582:
URL: https://github.com/apache/yunikorn-k8shim/pull/582#issuecomment-1530830961

   @craigcondit @wilfred-s 
   
   **Conclusion:**
   The pod event handler's updatePod and deletePod functions can both be called for the same pod, so the task is completed twice and the allocation release is sent twice.
   
   Deleting a pod will not directly trigger the updatePod function in the event handler. Instead, deleting a pod will trigger the deletePod function in the scheduler event handler.
   
   The deletePod function is responsible for removing the pod from the YuniKorn scheduler's internal cache and data structures. This function also updates the status of the application associated with the pod, notifying the YuniKorn scheduler that the resources previously reserved for the pod are now available for allocation to other pods.
   
   However, if the pod is in the process of being deleted, and its status is updated to "Succeeded" or "Failed" by Kubernetes, then this update will trigger the updatePod function in the YuniKorn scheduler event handler. The updatePod function is responsible for updating the internal state of the YuniKorn scheduler with the updated pod status, and for cleaning up any resources associated with the pod.
   
   So, while deleting a pod will not directly trigger the updatePod function, updates to the pod status during the deletion process may trigger it.
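   
   The sequence described above can be sketched as two handlers that both try to complete the same task. This is only an illustration of the double-call scenario, guarded with `sync.Once`; the `task` type, the handler names, and the `releases` counter are hypothetical and do not reproduce the k8shim task state machine or the change proposed in this PR.
   
   ```go
   package main

   import "sync"

   // task is a hypothetical stand-in for the shim's per-pod task.
   type task struct {
   	once     sync.Once
   	releases int // how many allocation releases were sent (illustration only)
   }

   // complete sends the release at most once, however many handlers call it.
   func (tk *task) complete() {
   	tk.once.Do(func() { tk.releases++ })
   }

   // onUpdatePod models the update handler: a terminal pod phase
   // ("Succeeded" or "Failed") completes the task.
   func (tk *task) onUpdatePod(phase string) {
   	if phase == "Succeeded" || phase == "Failed" {
   		tk.complete()
   	}
   }

   // onDeletePod models the delete handler: deletion always completes the task.
   func (tk *task) onDeletePod() {
   	tk.complete()
   }
   ```
   
   With this guard, a status update to "Succeeded" followed by a delete event for the same pod leaves `releases` at 1 instead of 2.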




[GitHub] [yunikorn-k8shim] zhuqi-lucas commented on pull request #582: [YUNIKORN-1712] Placeholder allocations are being removed twice

Posted by "zhuqi-lucas (via GitHub)" <gi...@apache.org>.
zhuqi-lucas commented on PR #582:
URL: https://github.com/apache/yunikorn-k8shim/pull/582#issuecomment-1536412339

   Hi @wilfred-s 
   I submitted a fix in https://github.com/apache/yunikorn-k8shim/pull/582




[GitHub] [yunikorn-k8shim] zhuqi-lucas closed pull request #582: [YUNIKORN-1712] Placeholder allocations are being removed twice

Posted by "zhuqi-lucas (via GitHub)" <gi...@apache.org>.
zhuqi-lucas closed pull request #582: [YUNIKORN-1712] Placeholder allocations are being removed twice
URL: https://github.com/apache/yunikorn-k8shim/pull/582




[GitHub] [yunikorn-k8shim] zhuqi-lucas commented on pull request #582: [YUNIKORN-1712] Placeholder allocations are being removed twice

Posted by "zhuqi-lucas (via GitHub)" <gi...@apache.org>.
zhuqi-lucas commented on PR #582:
URL: https://github.com/apache/yunikorn-k8shim/pull/582#issuecomment-1543259263

   Closing this, because we will fix it in https://github.com/apache/yunikorn-k8shim/pull/582

