Posted to reviews@yunikorn.apache.org by GitBox <gi...@apache.org> on 2021/05/20 03:58:43 UTC

[GitHub] [incubator-yunikorn-k8shim] yangwwei opened a new pull request #265: [YUNIKORN-677] Potential resource leak when complete and allocate pod happens simultaneously

yangwwei opened a new pull request #265:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/265


   ### What is this PR for?
   When pod deletion and allocation happen simultaneously, a race can lead to resource leaks. As a result, the pods cannot be released properly even after the job has finished. The symptom is visible in the web UI: the job has already finished (no pods are running on K8s anymore), but the YK UI still shows it as Running with a few allocations.
   
   ### What type of PR is it?
   * [x] - Bug Fix
   * [ ] - Improvement
   * [ ] - Feature
   * [ ] - Documentation
   * [ ] - Hot Fix
   * [ ] - Refactoring
   
   ### Todos
   * [ ] - Task
   
   ### What is the Jira issue?
   * https://issues.apache.org/jira/browse/YUNIKORN-677
   
   ### How should this be tested?
   Tested locally
   
   ### Screenshots (if appropriate)
   
   ### Questions:
   * [ ] - The license files need an update.
   * [ ] - There are breaking changes for older versions.
   * [ ] - It needs documentation.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] wilfred-s closed pull request #265: [YUNIKORN-677] Potential resource leak when complete and allocate pod happens simultaneously

Posted by GitBox <gi...@apache.org>.
wilfred-s closed pull request #265:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/265


   





[GitHub] [incubator-yunikorn-k8shim] codecov[bot] commented on pull request #265: [YUNIKORN-677] Potential resource leak when complete and allocate pod happens simultaneously

Posted by GitBox <gi...@apache.org>.
codecov[bot] commented on pull request #265:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/265#issuecomment-844698418


   # [Codecov](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/265?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#265](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/265?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (0dbafac) into [master](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/commit/c47ed51f075c5af5910f71da40e7e68699a9abae?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (c47ed51) will **increase** coverage by `1.62%`.
   > The diff coverage is `68.13%`.
   
   ```diff
   @@            Coverage Diff             @@
   ##           master     #265      +/-   ##
   ==========================================
   + Coverage   59.75%   61.37%   +1.62%     
   ==========================================
     Files          35       37       +2     
     Lines        3133     3355     +222     
   ==========================================
   + Hits         1872     2059     +187     
   - Misses       1180     1213      +33     
   - Partials       81       83       +2     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/265?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [pkg/appmgmt/appmgmt\_recovery.go](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/265/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL2FwcG1nbXQvYXBwbWdtdF9yZWNvdmVyeS5nbw==) | `67.50% <0.00%> (-8.18%)` | :arrow_down: |
   | [pkg/cache/amprotocol\_mock.go](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/265/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL2NhY2hlL2FtcHJvdG9jb2xfbW9jay5nbw==) | `0.00% <0.00%> (ø)` | |
   | [pkg/cache/context\_recovery.go](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/265/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL2NhY2hlL2NvbnRleHRfcmVjb3ZlcnkuZ28=) | `45.78% <0.00%> (-0.56%)` | :arrow_down: |
   | [pkg/cache/nodes.go](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/265/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL2NhY2hlL25vZGVzLmdv) | `79.80% <ø> (ø)` | |
   | [pkg/common/events/recorder\_mock.go](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/265/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL2NvbW1vbi9ldmVudHMvcmVjb3JkZXJfbW9jay5nbw==) | `0.00% <0.00%> (ø)` | |
   | [pkg/common/resource.go](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/265/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL2NvbW1vbi9yZXNvdXJjZS5nbw==) | `90.72% <0.00%> (-9.28%)` | :arrow_down: |
   | [pkg/common/utils/gang\_utils.go](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/265/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL2NvbW1vbi91dGlscy9nYW5nX3V0aWxzLmdv) | `50.94% <0.00%> (-30.60%)` | :arrow_down: |
   | [pkg/controller/application/app\_controller.go](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/265/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL2NvbnRyb2xsZXIvYXBwbGljYXRpb24vYXBwX2NvbnRyb2xsZXIuZ28=) | `71.05% <ø> (-0.26%)` | :arrow_down: |
   | [...missioncontrollers/webhook/admission\_controller.go](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/265/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL3BsdWdpbi9hZG1pc3Npb25jb250cm9sbGVycy93ZWJob29rL2FkbWlzc2lvbl9jb250cm9sbGVyLmdv) | `33.74% <0.00%> (+1.00%)` | :arrow_up: |
   | [pkg/shim/main.go](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/265/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL3NoaW0vbWFpbi5nbw==) | `0.00% <ø> (ø)` | |
   | ... and [24 more](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/265/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/265?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/265?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [db7e963...0dbafac](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/265?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   





[GitHub] [incubator-yunikorn-k8shim] yangwwei commented on pull request #265: [YUNIKORN-677] Potential resource leak when complete and allocate pod happens simultaneously

Posted by GitBox <gi...@apache.org>.
yangwwei commented on pull request #265:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/265#issuecomment-844698325


   > The check we do for the allocation is in the before hook for task allocated. In that hook we check if the state is task completed. That looks good; however, does that also cover a task that has already been deleted, or is that not a possible case?
   
   Yes, it should be covered. Today, deleting a pod also triggers a CompleteTask event, so the task will always be in the Completed state. This reminds me that we need to properly clean up the apps in the shim-side cache. We can do that once the app is in the terminated state on the core side, at which point it is safe to remove it.

