You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@yunikorn.apache.org by GitBox <gi...@apache.org> on 2020/06/12 07:26:22 UTC

[GitHub] [incubator-yunikorn-core] wilfred-s opened a new pull request #168: [YUNIKORN-222] Race in state change

wilfred-s opened a new pull request #168:
URL: https://github.com/apache/incubator-yunikorn-core/pull/168


   If the application allocation is not confirmed before the ask is updated
   the state is changed from accepted to running (via waiting) instead of
   accepted to starting. This breaks the stateaware scheduling policy of a
   queue.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-core] codecov-commenter commented on pull request #168: [YUNIKORN-222] Race in state change

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #168:
URL: https://github.com/apache/incubator-yunikorn-core/pull/168#issuecomment-643408114


   # [Codecov](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/168?src=pr&el=h1) Report
   > Merging [#168](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/168?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-yunikorn-core/commit/f4ea4502a52e06e64e5abd0c47fedf2e08fe12e3&el=desc) will **decrease** coverage by `0.00%`.
   > The diff coverage is `50.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/168/graphs/tree.svg?width=650&height=150&src=pr&token=SB9NrIi3Hy)](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/168?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master     #168      +/-   ##
   ==========================================
   - Coverage   58.95%   58.95%   -0.01%     
   ==========================================
     Files          62       62              
     Lines        6208     6200       -8     
   ==========================================
   - Hits         3660     3655       -5     
   + Misses       2406     2404       -2     
   + Partials      142      141       -1     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/168?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [pkg/cache/cachetestutils.go](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/168/diff?src=pr&el=tree#diff-cGtnL2NhY2hlL2NhY2hldGVzdHV0aWxzLmdv) | `61.76% <0.00%> (-3.87%)` | :arrow_down: |
   | [pkg/scheduler/scheduling\_application.go](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/168/diff?src=pr&el=tree#diff-cGtnL3NjaGVkdWxlci9zY2hlZHVsaW5nX2FwcGxpY2F0aW9uLmdv) | `73.86% <75.00%> (+0.61%)` | :arrow_up: |
   | [pkg/common/configs/configwatcher.go](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/168/diff?src=pr&el=tree#diff-cGtnL2NvbW1vbi9jb25maWdzL2NvbmZpZ3dhdGNoZXIuZ28=) | `85.48% <0.00%> (ø)` | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/168?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/168?src=pr&el=footer). Last update [f4ea450...29c5f06](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/168?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-core] yangwwei commented on pull request #168: [YUNIKORN-222] Race in state change

Posted by GitBox <gi...@apache.org>.
yangwwei commented on pull request #168:
URL: https://github.com/apache/incubator-yunikorn-core/pull/168#issuecomment-643475208


   hi @wilfred-s 
   
   The change itself looks good. But I do not fully understand why this causes the problem.
   From `Accepted` to `Waiting`, this is triggered when `removeAllocationAsk ` is called, which is called when there is a container released. But when we run a spark job, when the driver is allocated, the job is at `Accepted` state, then there will be other executors adding to this job. there is no release action could happen here, why it would go to `Waiting` and `Running` then?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-core] wilfred-s closed pull request #168: [YUNIKORN-222] Race in state change

Posted by GitBox <gi...@apache.org>.
wilfred-s closed pull request #168:
URL: https://github.com/apache/incubator-yunikorn-core/pull/168


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-core] TravisBuddy commented on pull request #168: [YUNIKORN-222] Race in state change

Posted by GitBox <gi...@apache.org>.
TravisBuddy commented on pull request #168:
URL: https://github.com/apache/incubator-yunikorn-core/pull/168#issuecomment-643408055


   Hey @wilfred-s,  
   Your changes look good to me!
   
   <a href="https:&#x2F;&#x2F;travis-ci.org&#x2F;apache&#x2F;incubator-yunikorn-core&#x2F;builds&#x2F;697518858">View build log</a>
   
   ###### TravisBuddy Request Identifier: 9c32d510-acd5-11ea-8245-836d9da6cb4c
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-core] yangwwei edited a comment on pull request #168: [YUNIKORN-222] Race in state change

Posted by GitBox <gi...@apache.org>.
yangwwei edited a comment on pull request #168:
URL: https://github.com/apache/incubator-yunikorn-core/pull/168#issuecomment-643475208


   hi @wilfred-s 
   
   The change itself looks good. But I do not fully understand why this causes the problem.
   From `Accepted` to `Waiting`, this is triggered when `removeAllocationAsk ` is called, which is called when there is a container released. But when we run a spark job, when the driver is allocated, the job is at `Accepted` state, then there will be other executors adding to this job. there is no release action could happen here, why it would go to `Waiting` and `Running` then?
   
   Can you please elaborate more on what was happening? An example will be helpful for us to understand what was going on. Thanks!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-core] wilfred-s commented on pull request #168: [YUNIKORN-222] Race in state change

Posted by GitBox <gi...@apache.org>.
wilfred-s commented on pull request #168:
URL: https://github.com/apache/incubator-yunikorn-core/pull/168#issuecomment-644503704


   Spoken off-line with Weiwei, merging the change as he approved


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-core] wilfred-s commented on pull request #168: [YUNIKORN-222] Race in state change

Posted by GitBox <gi...@apache.org>.
wilfred-s commented on pull request #168:
URL: https://github.com/apache/incubator-yunikorn-core/pull/168#issuecomment-643877401


   The race is linked to when we update the repeat on the ask. I removed the transition check from the repeat update. When we create a proposal we lower the repeat, if the cache pushes back we cancel the proposal and increase the ask again. During proposal processing we also change the pending, inflight and allocated values. These changes race with each other because the cache is leading for an allocation while the scheduler is leading for all other.
   The scheduler does not get told that the allocation is confirmed until after the fact.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org