You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@yunikorn.apache.org by GitBox <gi...@apache.org> on 2021/08/11 15:52:28 UTC
[GitHub] [incubator-yunikorn-core] chia7712 opened a new pull request #299: YUNIKORN-793 fix deadlock caused by listing queues with scheduling pe…
chia7712 opened a new pull request #299:
URL: https://github.com/apache/incubator-yunikorn-core/pull/299
…nding pods
### What is this PR for?
`GetPartitionQueues` calls read lock multiple times. If there is a thread which is waiting write lock, it can exclude all new read locks. In short, the following execution order can cause dead lock.
1. hold read lock ---> thread 0
2. wait write lock ---> thread 1 is locked by thread 0
3. acquire read lock ---> thread 0 is locked by thread 1
see docs for more details ([https://pkg.go.dev/sync#RWMutex])
The pprof is shown below.
```
1 @ 0x43ada5 0x44ca85 0x44ca6e 0x46de27 0x47ce25 0x47e590 0x47e522 0x9c5753 0x9ad231 0x9fbcfa 0x9e4986 0x9e4851 0x9de655 0x9ff792 0x471c61
# 0x46de26 sync.runtime_SemacquireMutex+0x46 /Users/chia7712/Library/go/default/src/runtime/sema.go:71
# 0x47ce24 sync.(*Mutex).lockSlow+0x104 /Users/chia7712/Library/go/default/src/sync/mutex.go:138
# 0x47e58f sync.(*Mutex).Lock+0x8f /Users/chia7712/Library/go/default/src/sync/mutex.go:81
# 0x47e521 sync.(*RWMutex).Lock+0x21 /Users/chia7712/Library/go/default/src/sync/rwmutex.go:111
# 0x9c5752 github.com/apache/incubator-yunikorn-core/pkg/scheduler/objects.(*Queue).incPendingResource+0x52 /Users/chia7712/go/pkg/mod/github.com/chia7712/incubator-yunikorn-core@v0.0.0-20210811001640-eaa6afb10b62/pkg/scheduler/objects/queue.go:454
1 @ 0x43ada5 0x44ca85 0x44ca6e 0x46de27 0x9c7eae 0x9c7e34 0x9c51f8 0x9c54ab 0xa59e45 0xa59e94 0x711024 0xa5b310 0x711024 0xa011b3 0x7145e3 0x70fb0d 0x471c61
# 0x46de26 sync.runtime_SemacquireMutex+0x46 /Users/chia7712/Library/go/default/src/runtime/sema.go:71
# 0x9c7ead sync.(*RWMutex).RLock+0xad /Users/chia7712/Library/go/default/src/sync/rwmutex.go:63
# 0x9c7e33 github.com/apache/incubator-yunikorn-core/pkg/scheduler/objects.(*Queue).IsLeafQueue+0x33 /Users/chia7712/go/pkg/mod/github.com/chia7712/incubator-yunikorn-core@v0.0.0-20210811001640-eaa6afb10b62/pkg/scheduler/objects/queue.go:667
# 0x9c51f7 github.com/apache/incubator-yunikorn-core/pkg/scheduler/objects.(*Queue).GetPartitionQueues+0x1f7 /Users/chia7712/go/pkg/mod/github.com/chia7712/incubator-yunikorn-core@v0.0.0-20210811001640-eaa6afb10b62/pkg/scheduler/objects/queue.go:426
# 0x9c54aa github.com/apache/incubator-yunikorn-core/pkg/scheduler/objects.(*Queue).GetPartitionQueues+0x4aa /Users/chia7712/go/pkg/mod/github.com/chia7712/incubator-yunikorn-core@v0.0.0-20210811001640-eaa6afb10b62/pkg/scheduler/objects/queue.go:416
```
### What type of PR is it?
* [x] - Bug Fix
* [ ] - Improvement
* [ ] - Feature
* [ ] - Documentation
* [ ] - Hot Fix
* [ ] - Refactoring
### Todos
* [ ] - Task
### What is the Jira issue?
https://issues.apache.org/jira/browse/YUNIKORN-793
### How should this be tested?
test it manually
### Screenshots (if appropriate)
### Questions:
* [ ] - The licenses files need update.
* [ ] - There is breaking changes for older versions.
* [ ] - It needs documentation.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [incubator-yunikorn-core] chia7712 commented on pull request #299: YUNIKORN-793 fix deadlock caused by listing queues with scheduling pe…
Posted by GitBox <gi...@apache.org>.
chia7712 commented on pull request #299:
URL: https://github.com/apache/incubator-yunikorn-core/pull/299#issuecomment-897385573
> Wrong PR mentioned in the commit message.
Maybe we can revert it and then re-commit it again? That can help us to trace the code in the future.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [incubator-yunikorn-core] wilfred-s closed pull request #299: YUNIKORN-793 fix deadlock caused by listing queues with scheduling pe…
Posted by GitBox <gi...@apache.org>.
wilfred-s closed pull request #299:
URL: https://github.com/apache/incubator-yunikorn-core/pull/299
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [incubator-yunikorn-core] codecov[bot] commented on pull request #299: YUNIKORN-793 fix deadlock caused by listing queues with scheduling pe…
Posted by GitBox <gi...@apache.org>.
codecov[bot] commented on pull request #299:
URL: https://github.com/apache/incubator-yunikorn-core/pull/299#issuecomment-896945536
# [Codecov](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/299?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#299](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/299?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (9374bcf) into [master](https://codecov.io/gh/apache/incubator-yunikorn-core/commit/5a1c19e280f66d6cb4fc206a675116260d6f2ff1?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (5a1c19e) will **increase** coverage by `1.96%`.
> The diff coverage is `65.90%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/299/graphs/tree.svg?width=650&height=150&src=pr&token=SB9NrIi3Hy&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/299?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
```diff
@@ Coverage Diff @@
## master #299 +/- ##
==========================================
+ Coverage 63.46% 65.42% +1.96%
==========================================
Files 60 61 +1
Lines 5220 5710 +490
==========================================
+ Hits 3313 3736 +423
- Misses 1747 1785 +38
- Partials 160 189 +29
```
| [Impacted Files](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/299?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [pkg/common/utils.go](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/299/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL2NvbW1vbi91dGlscy5nbw==) | `77.50% <0.00%> (ø)` | |
| [pkg/metrics/init.go](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/299/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL21ldHJpY3MvaW5pdC5nbw==) | `62.96% <ø> (ø)` | |
| [pkg/scheduler/context.go](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/299/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL3NjaGVkdWxlci9jb250ZXh0Lmdv) | `5.68% <0.00%> (-0.58%)` | :arrow_down: |
| [pkg/scheduler/nodes\_usage\_monitor.go](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/299/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL3NjaGVkdWxlci9ub2Rlc191c2FnZV9tb25pdG9yLmdv) | `0.00% <0.00%> (ø)` | |
| [pkg/scheduler/objects/allocation.go](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/299/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL3NjaGVkdWxlci9vYmplY3RzL2FsbG9jYXRpb24uZ28=) | `100.00% <ø> (ø)` | |
| [pkg/webservice/webservice.go](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/299/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL3dlYnNlcnZpY2Uvd2Vic2VydmljZS5nbw==) | `32.25% <ø> (+19.35%)` | :arrow_up: |
| [pkg/metrics/metrics\_collector.go](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/299/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL21ldHJpY3MvbWV0cmljc19jb2xsZWN0b3IuZ28=) | `64.51% <45.00%> (-9.40%)` | :arrow_down: |
| [pkg/scheduler/partition\_manager.go](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/299/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL3NjaGVkdWxlci9wYXJ0aXRpb25fbWFuYWdlci5nbw==) | `20.00% <50.00%> (ø)` | |
| [pkg/metrics/scheduler.go](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/299/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL21ldHJpY3Mvc2NoZWR1bGVyLmdv) | `60.79% <55.00%> (-0.79%)` | :arrow_down: |
| [pkg/scheduler/objects/queue.go](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/299/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGtnL3NjaGVkdWxlci9vYmplY3RzL3F1ZXVlLmdv) | `69.55% <62.85%> (+0.11%)` | :arrow_up: |
| ... and [17 more](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/299/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/299?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/299?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [eaa6afb...9374bcf](https://codecov.io/gh/apache/incubator-yunikorn-core/pull/299?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [incubator-yunikorn-core] wilfred-s commented on pull request #299: YUNIKORN-793 fix deadlock caused by listing queues with scheduling pe…
Posted by GitBox <gi...@apache.org>.
wilfred-s commented on pull request #299:
URL: https://github.com/apache/incubator-yunikorn-core/pull/299#issuecomment-897383275
fixed in commit: eaaf8b6f3e171ec9fdf3606ab9249a0889132439
Wrong PR mentioned in the commit message.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@yunikorn.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org