You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by "poorbarcode (via GitHub)" <gi...@apache.org> on 2023/04/26 16:18:30 UTC
[GitHub] [pulsar] poorbarcode opened a new pull request, #20189: [fix][broker]Fix deadlock of metadata store
poorbarcode opened a new pull request, #20189:
URL: https://github.com/apache/pulsar/pull/20189
### Motivation
In the method `loadOrCreatePersistentTopic,` it does `getBundles` after `LockManagerImpl.lambda$acquireLock`. This all runs on the thread `main-EventThread.` It will cause deadlock
```
"main-EventThread" #21 daemon prio=5 os_prio=0 cpu=1025.43ms elapsed=1113.10s tid=0x00007fb2b6514bf0 nid=0xd6 waiting on condition [0x00007fb2785dc000]
java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@17.0.6/Native Method)
- parking to wait for <0x0000000487fda1a8> (a java.util.concurrent.CompletableFuture$Signaller)
at java.util.concurrent.locks.LockSupport.park(java.base@17.0.6/LockSupport.java:211)
at java.util.concurrent.CompletableFuture$Signaller.block(java.base@17.0.6/CompletableFuture.java:1864)
at java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@17.0.6/ForkJoinPool.java:3463)
at java.util.concurrent.ForkJoinPool.managedBlock(java.base@17.0.6/ForkJoinPool.java:3434)
at java.util.concurrent.CompletableFuture.waitingGet(java.base@17.0.6/CompletableFuture.java:1898)
at java.util.concurrent.CompletableFuture.get(java.base@17.0.6/CompletableFuture.java:2072)
at com.github.benmanes.caffeine.cache.LocalAsyncCache$AbstractCacheView.resolve(LocalAsyncCache.java:515)
at com.github.benmanes.caffeine.cache.LocalAsyncLoadingCache$LoadingCacheView.get(LocalAsyncLoadingCache.java:122)
at org.apache.pulsar.common.naming.NamespaceBundleFactory.getBundles(NamespaceBundleFactory.java:260)
at org.apache.pulsar.broker.namespace.NamespaceService.getBundle(NamespaceService.java:219)
at org.apache.pulsar.broker.namespace.NamespaceService.isServiceUnitActiveAsync(NamespaceService.java:1020)
at org.apache.pulsar.broker.service.BrokerService.checkOwnershipAndCreatePersistentTopic(BrokerService.java:1423)
at org.apache.pulsar.broker.service.BrokerService.lambda$loadOrCreatePersistentTopic$53(BrokerService.java:1398)
at org.apache.pulsar.broker.service.BrokerService$$Lambda$1265/0x000000080141f980.run(Unknown Source)
at java.util.concurrent.CompletableFuture$UniRun.tryFire(java.base@17.0.6/CompletableFuture.java:787)
at java.util.concurrent.CompletableFuture.postComplete(java.base@17.0.6/CompletableFuture.java:510)
at java.util.concurrent.CompletableFuture.complete(java.base@17.0.6/CompletableFuture.java:2147)
at org.apache.pulsar.metadata.coordination.impl.LockManagerImpl.lambda$acquireLock$1(LockManagerImpl.java:105)
at org.apache.pulsar.metadata.coordination.impl.LockManagerImpl$$Lambda$604/0x00000008012086b8.run(Unknown Source)
at java.util.concurrent.CompletableFuture$UniRun.tryFire(java.base@17.0.6/CompletableFuture.java:787)
at java.util.concurrent.CompletableFuture.postComplete(java.base@17.0.6/CompletableFuture.java:510)
at java.util.concurrent.CompletableFuture.complete(java.base@17.0.6/CompletableFuture.java:2147)
at org.apache.pulsar.metadata.coordination.impl.ResourceLockImpl.lambda$acquire$2(ResourceLockImpl.java:128)
at org.apache.pulsar.metadata.coordination.impl.ResourceLockImpl$$Lambda$602/0x0000000801208248.run(Unknown Source)
at java.util.concurrent.CompletableFuture$UniRun.tryFire(java.base@17.0.6/CompletableFuture.java:787)
at java.util.concurrent.CompletableFuture.postComplete(java.base@17.0.6/CompletableFuture.java:510)
at java.util.concurrent.CompletableFuture.complete(java.base@17.0.6/CompletableFuture.java:2147)
at org.apache.pulsar.metadata.coordination.impl.ResourceLockImpl.lambda$acquireWithNoRevalidation$6(ResourceLockImpl.java:167)
at org.apache.pulsar.metadata.coordination.impl.ResourceLockImpl$$Lambda$600/0x0000000801203400.accept(Unknown Source)
at java.util.concurrent.CompletableFuture$UniAccept.tryFire(java.base@17.0.6/CompletableFuture.java:718)
at java.util.concurrent.CompletableFuture.postComplete(java.base@17.0.6/CompletableFuture.java:510)
at java.util.concurrent.CompletableFuture.complete(java.base@17.0.6/CompletableFuture.java:2147)
at org.apache.pulsar.metadata.impl.ZKMetadataStore.handlePutResult(ZKMetadataStore.java:225)
at org.apache.pulsar.metadata.impl.ZKMetadataStore.lambda$batchOperation$7(ZKMetadataStore.java:182)
at org.apache.pulsar.metadata.impl.ZKMetadataStore$$Lambda$224/0x0000000800ee9d20.processResult(Unknown Source)
at org.apache.pulsar.metadata.impl.PulsarZooKeeperClient$3$1.processResult(PulsarZooKeeperClient.java:490)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:722)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:563)
```
### Modifications
After the operations of `store`, switch to another thread.
### Documentation
<!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->
- [ ] `doc` <!-- Your PR contains doc changes. -->
- [x] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [ ] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->
### Matching PR in forked repository
PR in forked repository:
- 1
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] Technoboy- closed pull request #20189: [fix][broker]Fix deadlock of metadata store
Posted by "Technoboy- (via GitHub)" <gi...@apache.org>.
Technoboy- closed pull request #20189: [fix][broker]Fix deadlock of metadata store
URL: https://github.com/apache/pulsar/pull/20189
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] poorbarcode merged pull request #20189: [fix][broker]Fix deadlock of metadata store
Posted by "poorbarcode (via GitHub)" <gi...@apache.org>.
poorbarcode merged PR #20189:
URL: https://github.com/apache/pulsar/pull/20189
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] lhotari commented on a diff in pull request #20189: [fix][broker]Fix deadlock of metadata store
Posted by "lhotari (via GitHub)" <gi...@apache.org>.
lhotari commented on code in PR #20189:
URL: https://github.com/apache/pulsar/pull/20189#discussion_r1192249690
##########
pulsar-broker/src/main/java/org/apache/pulsar/broker/namespace/NamespaceService.java:
##########
@@ -1137,6 +1137,10 @@ public CompletableFuture<Boolean> isServiceUnitOwnedAsync(ServiceUnitId suName)
new IllegalArgumentException("Invalid class of NamespaceBundle: " + suName.getClass().getName()));
}
+ /**
+ * @Deprecated This method is only used by test. call "isServiceUnitActiveAsync" is better.
+ */
+ @Deprecated
public boolean isServiceUnitActive(TopicName topicName) {
try {
Review Comment:
would it make sense to use something like `isServiceUnitActiveAsync(topicName)..get(conf.getMetadataStoreOperationTimeoutSeconds(), SECONDS);`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] poorbarcode commented on pull request #20189: [fix][broker]Fix deadlock of metadata store
Posted by "poorbarcode (via GitHub)" <gi...@apache.org>.
poorbarcode commented on PR #20189:
URL: https://github.com/apache/pulsar/pull/20189#issuecomment-1540543199
@lhotari
> is this related to https://github.com/apache/pulsar/pull/20130 changes and review comments in any way?
Sorry, I didn't notice this PR before, but I sense that the two problems are very similar.
This PR is not related to https://github.com/apache/pulsar/pull/20130.
https://github.com/apache/pulsar/pull/20130 is used to solve the deadlock of the `meta-store` thread, and PR is used to solve the deadlock of the `event-zk-client` thread.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] lhotari commented on pull request #20189: [fix][broker]Fix deadlock of metadata store
Posted by "lhotari (via GitHub)" <gi...@apache.org>.
lhotari commented on PR #20189:
URL: https://github.com/apache/pulsar/pull/20189#issuecomment-1540454900
@poorbarcode @Technoboy- is this related to #20130 changes and review comments in any way?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] poorbarcode commented on pull request #20189: [fix][broker]Fix deadlock of metadata store
Posted by "poorbarcode (via GitHub)" <gi...@apache.org>.
poorbarcode commented on PR #20189:
URL: https://github.com/apache/pulsar/pull/20189#issuecomment-1540546951
Now the current modification is problematic because the `OwnershipCache.cache.executor` uses the current thread. see
- https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/namespace/OwnershipCache.java#L133-L126
I will fix it tomorrow
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] codecov-commenter commented on pull request #20189: [fix][broker]Fix deadlock of metadata store
Posted by "codecov-commenter (via GitHub)" <gi...@apache.org>.
codecov-commenter commented on PR #20189:
URL: https://github.com/apache/pulsar/pull/20189#issuecomment-1545401227
## [Codecov](https://app.codecov.io/gh/apache/pulsar/pull/20189?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#20189](https://app.codecov.io/gh/apache/pulsar/pull/20189?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (f686e1f) into [master](https://app.codecov.io/gh/apache/pulsar/commit/00f17e88ad8fa9a4ead78d78a181c03fdee8e4df?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (00f17e8) will **increase** coverage by `35.37%`.
> The diff coverage is `40.00%`.
[![Impacted file tree graph](https://app.codecov.io/gh/apache/pulsar/pull/20189/graphs/tree.svg?width=650&height=150&src=pr&token=acYqCpsK9J&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://app.codecov.io/gh/apache/pulsar/pull/20189?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
```diff
@@ Coverage Diff @@
## master #20189 +/- ##
=============================================
+ Coverage 37.61% 72.98% +35.37%
- Complexity 12589 31971 +19382
=============================================
Files 1691 1868 +177
Lines 129028 138604 +9576
Branches 14066 15240 +1174
=============================================
+ Hits 48530 101166 +52636
+ Misses 74183 29399 -44784
- Partials 6315 8039 +1724
```
| Flag | Coverage Δ | |
|---|---|---|
| inttests | `24.12% <40.00%> (-0.06%)` | :arrow_down: |
| systests | `25.00% <40.00%> (+0.23%)` | :arrow_up: |
| unittests | `72.24% <40.00%> (+39.05%)` | :arrow_up: |
Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://app.codecov.io/gh/apache/pulsar/pull/20189?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [...apache/pulsar/broker/namespace/OwnershipCache.java](https://app.codecov.io/gh/apache/pulsar/pull/20189?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHVsc2FyLWJyb2tlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcHVsc2FyL2Jyb2tlci9uYW1lc3BhY2UvT3duZXJzaGlwQ2FjaGUuamF2YQ==) | `85.26% <ø> (+10.52%)` | :arrow_up: |
| [...ache/pulsar/broker/namespace/NamespaceService.java](https://app.codecov.io/gh/apache/pulsar/pull/20189?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHVsc2FyLWJyb2tlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcHVsc2FyL2Jyb2tlci9uYW1lc3BhY2UvTmFtZXNwYWNlU2VydmljZS5qYXZh) | `69.75% <40.00%> (+25.99%)` | :arrow_up: |
... and [1426 files with indirect coverage changes](https://app.codecov.io/gh/apache/pulsar/pull/20189/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] poorbarcode commented on a diff in pull request #20189: [fix][broker]Fix deadlock of metadata store
Posted by "poorbarcode (via GitHub)" <gi...@apache.org>.
poorbarcode commented on code in PR #20189:
URL: https://github.com/apache/pulsar/pull/20189#discussion_r1192410763
##########
pulsar-broker/src/main/java/org/apache/pulsar/broker/namespace/NamespaceService.java:
##########
@@ -1137,6 +1137,10 @@ public CompletableFuture<Boolean> isServiceUnitOwnedAsync(ServiceUnitId suName)
new IllegalArgumentException("Invalid class of NamespaceBundle: " + suName.getClass().getName()));
}
+ /**
+ * @Deprecated This method is only used by test. call "isServiceUnitActiveAsync" is better.
+ */
+ @Deprecated
public boolean isServiceUnitActive(TopicName topicName) {
try {
Review Comment:
@lhotari
I think it is a good suggestion. Already edit this method to make the test that calls the method works better.
Could you take a look again?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] lhotari commented on a diff in pull request #20189: [fix][broker]Fix deadlock of metadata store
Posted by "lhotari (via GitHub)" <gi...@apache.org>.
lhotari commented on code in PR #20189:
URL: https://github.com/apache/pulsar/pull/20189#discussion_r1192249690
##########
pulsar-broker/src/main/java/org/apache/pulsar/broker/namespace/NamespaceService.java:
##########
@@ -1137,6 +1137,10 @@ public CompletableFuture<Boolean> isServiceUnitOwnedAsync(ServiceUnitId suName)
new IllegalArgumentException("Invalid class of NamespaceBundle: " + suName.getClass().getName()));
}
+ /**
+ * @Deprecated This method is only used by test. call "isServiceUnitActiveAsync" is better.
+ */
+ @Deprecated
public boolean isServiceUnitActive(TopicName topicName) {
try {
Review Comment:
would it make sense to use something like `isServiceUnitActiveAsync(topicName).get(conf.getMetadataStoreOperationTimeoutSeconds(), SECONDS);`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org