You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@yunikorn.apache.org by GitBox <gi...@apache.org> on 2021/06/14 23:26:38 UTC

[GitHub] [incubator-yunikorn-k8shim] ycr-oss opened a new pull request #276: Abort deleting resources when the scheduler is still running

ycr-oss opened a new pull request #276:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/276


   ### What is this PR for?
   When upgrading YuniKorn with `helm upgrade`, the admission controller is deleted, an example of what happens:
   ```
   $ k get po -w
   NAME                                             READY   STATUS    RESTARTS   AGE
   yunikorn-admission-controller-6f6b9b8dff-4pqv6   1/1     Running   0          2d5h
   yunikorn-scheduler-64f8746476-4mqjl              2/2     Running   0          2d5h
   yunikorn-scheduler-dcdd789b4-rs4km               0/2     Pending   0          0s
   yunikorn-scheduler-dcdd789b4-rs4km               0/2     Pending   0          0s
   yunikorn-scheduler-dcdd789b4-rs4km               0/2     ContainerCreating   0          0s
   yunikorn-scheduler-dcdd789b4-rs4km               2/2     Running             0          6s
   yunikorn-scheduler-64f8746476-4mqjl              2/2     Terminating         0          2d5h
   yunikorn-admission-controller-6f6b9b8dff-4pqv6   1/1     Terminating         0          2d5h
   yunikorn-admission-controller-6f6b9b8dff-4pqv6   0/1     Terminating         0          2d5h
   yunikorn-admission-controller-6f6b9b8dff-4pqv6   0/1     Terminating         0          2d5h
   yunikorn-admission-controller-6f6b9b8dff-4pqv6   0/1     Terminating         0          2d5h
   yunikorn-scheduler-64f8746476-4mqjl              0/2     Terminating         0          2d5h
   yunikorn-scheduler-64f8746476-4mqjl              0/2     Terminating         0          2d5h
   yunikorn-scheduler-64f8746476-4mqjl              0/2     Terminating         0          2d5h
   ```
   As shown above, when the upgrade is complete, only the scheduler pod remains but the admission controller pod is gone. This is due to the postStop hook deleting the admission controller. This PR added a simple check in this deletion step: if the scheduler is still healthy and running, don't delete.
   
   Note that after this change, commands like
   ```
   kubectl scale deployment yunikorn-scheduler --replicas=[0|1]
   ```
   still work exactly as before
   
   ### What type of PR is it?
   * [ ] - Bug Fix
   * [ ] - Improvement
   * [ ] - Feature
   * [ ] - Documentation
   * [ ] - Hot Fix
   * [ ] - Refactoring
   
   ### Todos
   * [ ] - Task
   
   ### What is the Jira issue?
   https://issues.apache.org/jira/browse/YUNIKORN-696
   
   ### How should this be tested?
   Install YuniKorn, then upgrade it with `helm upgrade`, check the right pods are running.
   
   ### Screenshots (if appropriate)
   
   ### Questions:
   * [ ] - The licenses files need update.
   * [ ] - There is breaking changes for older versions.
   * [ ] - It needs documentation.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] ycr-oss commented on pull request #276: Abort deleting resources when the scheduler is still running

Posted by GitBox <gi...@apache.org>.
ycr-oss commented on pull request #276:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/276#issuecomment-864666817


   Closing this PR as I'll make another one at the release repo. https://issues.apache.org/jira/browse/YUNIKORN-710 created to track the doc work


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] yangwwei commented on pull request #276: Abort deleting resources when the scheduler is still running

Posted by GitBox <gi...@apache.org>.
yangwwei commented on pull request #276:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/276#issuecomment-864734927


   That's correct. I also noticed the same. I think after this change, we should document this and let the users use `helm install/delete/upgrade` to manage YK. And command `scale deployment` will only be used if the user wants to restart YK.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] ycr-oss closed pull request #276: Abort deleting resources when the scheduler is still running

Posted by GitBox <gi...@apache.org>.
ycr-oss closed pull request #276:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/276


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] codecov[bot] commented on pull request #276: Abort deleting resources when the scheduler is still running

Posted by GitBox <gi...@apache.org>.
codecov[bot] commented on pull request #276:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/276#issuecomment-861060284


   # [Codecov](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/276?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#276](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/276?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (d34eb4e) into [master](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/commit/013180f3c39182e61f8a2604525096e77d9b5f15?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (013180f) will **increase** coverage by `4.54%`.
   > The diff coverage is `76.93%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/276/graphs/tree.svg?width=650&height=150&src=pr&token=LZImIuvleR&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/276?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master     #276      +/-   ##
   ==========================================
   + Coverage   58.23%   62.78%   +4.54%     
   ==========================================
     Files          33       37       +4     
     Lines        2931     3386     +455     
   ==========================================
   + Hits         1707     2126     +419     
   - Misses       1152     1176      +24     
   - Partials       72       84      +12     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/276?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [pkg/appmgmt/appmgmt\_recovery.go](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/276/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL2FwcG1nbXQvYXBwbWdtdF9yZWNvdmVyeS5nbw==) | `67.50% <0.00%> (-8.18%)` | :arrow_down: |
   | [pkg/cache/amprotocol\_mock.go](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/276/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL2NhY2hlL2FtcHJvdG9jb2xfbW9jay5nbw==) | `0.00% <0.00%> (ø)` | |
   | [pkg/cache/context\_recovery.go](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/276/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL2NhY2hlL2NvbnRleHRfcmVjb3ZlcnkuZ28=) | `45.78% <0.00%> (-0.56%)` | :arrow_down: |
   | [pkg/cache/nodes.go](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/276/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL2NhY2hlL25vZGVzLmdv) | `79.80% <ø> (ø)` | |
   | [pkg/common/events/recorder\_mock.go](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/276/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL2NvbW1vbi9ldmVudHMvcmVjb3JkZXJfbW9jay5nbw==) | `0.00% <0.00%> (ø)` | |
   | [pkg/common/events/states.go](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/276/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL2NvbW1vbi9ldmVudHMvc3RhdGVzLmdv) | `0.00% <0.00%> (ø)` | |
   | [pkg/common/resource.go](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/276/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL2NvbW1vbi9yZXNvdXJjZS5nbw==) | `90.72% <0.00%> (-9.28%)` | :arrow_down: |
   | [pkg/controller/application/app\_controller.go](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/276/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL2NvbnRyb2xsZXIvYXBwbGljYXRpb24vYXBwX2NvbnRyb2xsZXIuZ28=) | `71.05% <ø> (-0.26%)` | :arrow_down: |
   | [...missioncontrollers/webhook/admission\_controller.go](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/276/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL3BsdWdpbi9hZG1pc3Npb25jb250cm9sbGVycy93ZWJob29rL2FkbWlzc2lvbl9jb250cm9sbGVyLmdv) | `33.74% <0.00%> (+1.57%)` | :arrow_up: |
   | [pkg/shim/main.go](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/276/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL3NoaW0vbWFpbi5nbw==) | `0.00% <ø> (ø)` | |
   | ... and [27 more](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/276/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/276?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/276?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [d8774bc...d34eb4e](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/276?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] ycr-oss commented on pull request #276: Abort deleting resources when the scheduler is still running

Posted by GitBox <gi...@apache.org>.
ycr-oss commented on pull request #276:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/276#issuecomment-865019356


   Sounds good. I'll take care of it in the documentation


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] ycr-oss commented on pull request #276: Abort deleting resources when the scheduler is still running

Posted by GitBox <gi...@apache.org>.
ycr-oss commented on pull request #276:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/276#issuecomment-864273145


   @yangwwei Thanks for the review! Your suggestion sounds good. I'll make the changes next week and also create a JIRA to track the work on documentation


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] ycr-oss commented on pull request #276: Abort deleting resources when the scheduler is still running

Posted by GitBox <gi...@apache.org>.
ycr-oss commented on pull request #276:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/276#issuecomment-864672128


   @yangwwei I just realized one potential issue with the suggested approach. If we remove the `preStop` lifecycle hook, then when running the command:
   ```
   kubectl scale deployment yunikorn-scheduler --replicas=0
   ```
   will disable the scheduler pod but leave the admission controller running, causing all incoming pods to be patched with `schedulerName=yunikorn` but they can't be scheduled because the scheduler pod is down


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] ycr-oss commented on pull request #276: Abort deleting resources when the scheduler is still running

Posted by GitBox <gi...@apache.org>.
ycr-oss commented on pull request #276:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/276#issuecomment-865019356


   Sounds good. I'll take care of it in the documentation


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org