You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@yunikorn.apache.org by GitBox <gi...@apache.org> on 2021/08/11 15:52:28 UTC

[GitHub] [incubator-yunikorn-core] chia7712 opened a new pull request #299: YUNIKORN-793 fix deadlock caused by listing queues with scheduling pe…

chia7712 opened a new pull request #299:
URL: https://github.com/apache/incubator-yunikorn-core/pull/299


   …nding pods
   
   ### What is this PR for?
   `GetPartitionQueues` calls read lock multiple times. If there is a thread which is waiting write lock, it can exclude all new read locks. In short, the following execution order can cause dead lock.
   
   1. hold read lock ---> thread 0
    2. wait write lock ---> thread 1 is locked by thread 0
    3. acquire read lock ---> thread 0 is locked by thread 1
   
   see docs for more details ([https://pkg.go.dev/sync#RWMutex])
   
   The pprof is shown below.
   ```
   1 @ 0x43ada5 0x44ca85 0x44ca6e 0x46de27 0x47ce25 0x47e590 0x47e522 0x9c5753 0x9ad231 0x9fbcfa 0x9e4986 0x9e4851 0x9de655 0x9ff792 0x471c61
   #	0x46de26	sync.runtime_SemacquireMutex+0x46									/Users/chia7712/Library/go/default/src/runtime/sema.go:71
   #	0x47ce24	sync.(*Mutex).lockSlow+0x104										/Users/chia7712/Library/go/default/src/sync/mutex.go:138
   #	0x47e58f	sync.(*Mutex).Lock+0x8f											/Users/chia7712/Library/go/default/src/sync/mutex.go:81
   #	0x47e521	sync.(*RWMutex).Lock+0x21										/Users/chia7712/Library/go/default/src/sync/rwmutex.go:111
   #	0x9c5752	github.com/apache/incubator-yunikorn-core/pkg/scheduler/objects.(*Queue).incPendingResource+0x52	/Users/chia7712/go/pkg/mod/github.com/chia7712/incubator-yunikorn-core@v0.0.0-20210811001640-eaa6afb10b62/pkg/scheduler/objects/queue.go:454
   
   
   1 @ 0x43ada5 0x44ca85 0x44ca6e 0x46de27 0x9c7eae 0x9c7e34 0x9c51f8 0x9c54ab 0xa59e45 0xa59e94 0x711024 0xa5b310 0x711024 0xa011b3 0x7145e3 0x70fb0d 0x471c61
   #	0x46de26	sync.runtime_SemacquireMutex+0x46									/Users/chia7712/Library/go/default/src/runtime/sema.go:71
   #	0x9c7ead	sync.(*RWMutex).RLock+0xad										/Users/chia7712/Library/go/default/src/sync/rwmutex.go:63
   #	0x9c7e33	github.com/apache/incubator-yunikorn-core/pkg/scheduler/objects.(*Queue).IsLeafQueue+0x33		/Users/chia7712/go/pkg/mod/github.com/chia7712/incubator-yunikorn-core@v0.0.0-20210811001640-eaa6afb10b62/pkg/scheduler/objects/queue.go:667
   #	0x9c51f7	github.com/apache/incubator-yunikorn-core/pkg/scheduler/objects.(*Queue).GetPartitionQueues+0x1f7	/Users/chia7712/go/pkg/mod/github.com/chia7712/incubator-yunikorn-core@v0.0.0-20210811001640-eaa6afb10b62/pkg/scheduler/objects/queue.go:426
   #	0x9c54aa	github.com/apache/incubator-yunikorn-core/pkg/scheduler/objects.(*Queue).GetPartitionQueues+0x4aa	/Users/chia7712/go/pkg/mod/github.com/chia7712/incubator-yunikorn-core@v0.0.0-20210811001640-eaa6afb10b62/pkg/scheduler/objects/queue.go:416
   ```
   
   
   ### What type of PR is it?
   * [x] - Bug Fix
   * [ ] - Improvement
   * [ ] - Feature
   * [ ] - Documentation
   * [ ] - Hot Fix
   * [ ] - Refactoring
   
   ### Todos
   * [ ] - Task
   
   ### What is the Jira issue?
   https://issues.apache.org/jira/browse/YUNIKORN-793
   
   ### How should this be tested?
   test it manually
   
   ### Screenshots (if appropriate)
   
   ### Questions:
   * [ ] - The licenses files need update.
   * [ ] - There is breaking changes for older versions.
   * [ ] - It needs documentation.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-core] chia7712 commented on pull request #299: YUNIKORN-793 fix deadlock caused by listing queues with scheduling pe…

Posted by GitBox <gi...@apache.org>.
chia7712 commented on pull request #299:
URL: https://github.com/apache/incubator-yunikorn-core/pull/299#issuecomment-897385573


   > Wrong PR mentioned in the commit message.
   
   Maybe we can revert it and then re-commit it again? That can help us to trace the code in the future.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-core] wilfred-s closed pull request #299: YUNIKORN-793 fix deadlock caused by listing queues with scheduling pe…

Posted by GitBox <gi...@apache.org>.
wilfred-s closed pull request #299:
URL: https://github.com/apache/incubator-yunikorn-core/pull/299


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-core] codecov[bot] commented on pull request #299: YUNIKORN-793 fix deadlock caused by listing queues with scheduling pe…

Posted by GitBox <gi...@apache.org>.
codecov[bot] commented on pull request #299:
URL: https://github.com/apache/incubator-yunikorn-core/pull/299#issuecomment-896945536


   # [Codecov](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/299?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#299](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/299?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (9374bcf) into [master](https://codecov.io/gh/apache/incubator-yunikorn-core/commit/5a1c19e280f66d6cb4fc206a675116260d6f2ff1?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (5a1c19e) will **increase** coverage by `1.96%`.
   > The diff coverage is `65.90%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/299/graphs/tree.svg?width=650&height=150&src=pr&token=SB9NrIi3Hy&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/299?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master     #299      +/-   ##
   ==========================================
   + Coverage   63.46%   65.42%   +1.96%     
   ==========================================
     Files          60       61       +1     
     Lines        5220     5710     +490     
   ==========================================
   + Hits         3313     3736     +423     
   - Misses       1747     1785      +38     
   - Partials      160      189      +29     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/299?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [pkg/common/utils.go](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/299/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL2NvbW1vbi91dGlscy5nbw==) | `77.50% <0.00%> (ø)` | |
   | [pkg/metrics/init.go](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/299/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL21ldHJpY3MvaW5pdC5nbw==) | `62.96% <ø> (ø)` | |
   | [pkg/scheduler/context.go](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/299/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL3NjaGVkdWxlci9jb250ZXh0Lmdv) | `5.68% <0.00%> (-0.58%)` | :arrow_down: |
   | [pkg/scheduler/nodes\_usage\_monitor.go](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/299/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL3NjaGVkdWxlci9ub2Rlc191c2FnZV9tb25pdG9yLmdv) | `0.00% <0.00%> (ø)` | |
   | [pkg/scheduler/objects/allocation.go](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/299/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL3NjaGVkdWxlci9vYmplY3RzL2FsbG9jYXRpb24uZ28=) | `100.00% <ø> (ø)` | |
   | [pkg/webservice/webservice.go](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/299/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL3dlYnNlcnZpY2Uvd2Vic2VydmljZS5nbw==) | `32.25% <ø> (+19.35%)` | :arrow_up: |
   | [pkg/metrics/metrics\_collector.go](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/299/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL21ldHJpY3MvbWV0cmljc19jb2xsZWN0b3IuZ28=) | `64.51% <45.00%> (-9.40%)` | :arrow_down: |
   | [pkg/scheduler/partition\_manager.go](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/299/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL3NjaGVkdWxlci9wYXJ0aXRpb25fbWFuYWdlci5nbw==) | `20.00% <50.00%> (ø)` | |
   | [pkg/metrics/scheduler.go](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/299/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL21ldHJpY3Mvc2NoZWR1bGVyLmdv) | `60.79% <55.00%> (-0.79%)` | :arrow_down: |
   | [pkg/scheduler/objects/queue.go](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/299/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL3NjaGVkdWxlci9vYmplY3RzL3F1ZXVlLmdv) | `69.55% <62.85%> (+0.11%)` | :arrow_up: |
   | ... and [17 more](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/299/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/299?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/299?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [eaa6afb...9374bcf](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/299?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-yunikorn-core] wilfred-s commented on pull request #299: YUNIKORN-793 fix deadlock caused by listing queues with scheduling pe…

Posted by GitBox <gi...@apache.org>.
wilfred-s commented on pull request #299:
URL: https://github.com/apache/incubator-yunikorn-core/pull/299#issuecomment-897383275


   fixed in commit: eaaf8b6f3e171ec9fdf3606ab9249a0889132439
   Wrong PR mentioned in the commit message.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org