You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@yunikorn.apache.org by GitBox <gi...@apache.org> on 2021/08/18 16:00:40 UTC

[GitHub] [incubator-yunikorn-k8shim] chenya-zhang opened a new pull request #294: [YUNIKORN-703]Reduce scheduler recovery timeout and terminate failed container

chenya-zhang opened a new pull request #294:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/294


   ### What is this PR for?
   
   Context:
   YuniKorn scheduler restart failed multiple times affecting our production workload. 
   During recovery no nodes are added to the scheduler cache.
   
   Potential Causes: 
   YuniKorn scheduler restart timeout is set very long (> 6min every time) while K8s API server responds with exponential backoffs. Failed scheduler container does not terminate.
   
   Current Solutions: 
   Reduce scheduler recovery timeout and terminate failed scheduler container
   
   ### What type of PR is it?
   * [X] - Bug Fix
   * [X] - Improvement
   * [ ] - Feature
   * [ ] - Documentation
   * [ ] - Hot Fix
   * [ ] - Refactoring
   
   ### Todos
   * [ ] - Task
   
   ### What is the Jira issue?
   
   https://issues.apache.org/jira/browse/YUNIKORN-703
   
   ### How should this be tested?
   
   ### Screenshots (if appropriate)
   
   ### Questions:
   * [ ] - The licenses files need update.
   * [ ] - There is breaking changes for older versions.
   * [ ] - It needs documentation.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] codecov[bot] commented on pull request #294: [YUNIKORN-703] Reduce scheduler recovery timeout and terminate failed container

Posted by GitBox <gi...@apache.org>.
codecov[bot] commented on pull request #294:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/294#issuecomment-902860598


   # [Codecov](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/294?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#294](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/294?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (c613a11) into [master](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/commit/c47ed51f075c5af5910f71da40e7e68699a9abae?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (c47ed51) will **increase** coverage by `3.17%`.
   > The diff coverage is `77.21%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/294/graphs/tree.svg?width=650&height=150&src=pr&token=LZImIuvleR&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/294?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master     #294      +/-   ##
   ==========================================
   + Coverage   59.75%   62.93%   +3.17%     
   ==========================================
     Files          35       37       +2     
     Lines        3133     3453     +320     
   ==========================================
   + Hits         1872     2173     +301     
   - Misses       1180     1192      +12     
   - Partials       81       88       +7     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/294?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [pkg/appmgmt/appmgmt\_recovery.go](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/294/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL2FwcG1nbXQvYXBwbWdtdF9yZWNvdmVyeS5nbw==) | `67.50% <0.00%> (-8.18%)` | :arrow_down: |
   | [pkg/cache/amprotocol\_mock.go](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/294/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL2NhY2hlL2FtcHJvdG9jb2xfbW9jay5nbw==) | `0.00% <0.00%> (ø)` | |
   | [pkg/cache/context\_recovery.go](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/294/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL2NhY2hlL2NvbnRleHRfcmVjb3ZlcnkuZ28=) | `45.23% <0.00%> (-1.11%)` | :arrow_down: |
   | [pkg/cache/external/scheduler\_cache.go](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/294/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL2NhY2hlL2V4dGVybmFsL3NjaGVkdWxlcl9jYWNoZS5nbw==) | `34.56% <0.00%> (+0.40%)` | :arrow_up: |
   | [pkg/cache/nodes.go](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/294/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL2NhY2hlL25vZGVzLmdv) | `79.80% <ø> (ø)` | |
   | [pkg/common/events/recorder\_mock.go](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/294/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL2NvbW1vbi9ldmVudHMvcmVjb3JkZXJfbW9jay5nbw==) | `0.00% <0.00%> (ø)` | |
   | [pkg/common/events/states.go](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/294/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL2NvbW1vbi9ldmVudHMvc3RhdGVzLmdv) | `0.00% <0.00%> (ø)` | |
   | [pkg/controller/application/app\_controller.go](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/294/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL2NvbnRyb2xsZXIvYXBwbGljYXRpb24vYXBwX2NvbnRyb2xsZXIuZ28=) | `71.05% <ø> (-0.26%)` | :arrow_down: |
   | [...missioncontrollers/webhook/admission\_controller.go](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/294/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL3BsdWdpbi9hZG1pc3Npb25jb250cm9sbGVycy93ZWJob29rL2FkbWlzc2lvbl9jb250cm9sbGVyLmdv) | `33.74% <0.00%> (+1.00%)` | :arrow_up: |
   | [pkg/shim/main.go](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/294/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL3NoaW0vbWFpbi5nbw==) | `0.00% <ø> (ø)` | |
   | ... and [30 more](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/294/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/294?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/294?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [0bd9366...c613a11](https://codecov.io/gh/apache/incubator-yunikorn-k8shim/pull/294?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] yangwwei merged pull request #294: [YUNIKORN-703] Reduce scheduler recovery timeout and terminate failed container

Posted by GitBox <gi...@apache.org>.
yangwwei merged pull request #294:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/294


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-k8shim] yangwwei commented on pull request #294: [YUNIKORN-703] Reduce scheduler recovery timeout and terminate failed container

Posted by GitBox <gi...@apache.org>.
yangwwei commented on pull request #294:
URL: https://github.com/apache/incubator-yunikorn-k8shim/pull/294#issuecomment-902859211


   I helped to rebase this based on #296 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org