You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by "poorbarcode (via GitHub)" <gi...@apache.org> on 2023/04/26 16:18:30 UTC

[GitHub] [pulsar] poorbarcode opened a new pull request, #20189: [fix][broker]Fix deadlock of metadata store

poorbarcode opened a new pull request, #20189:
URL: https://github.com/apache/pulsar/pull/20189

   ### Motivation
   
   In the method `loadOrCreatePersistentTopic,` it does `getBundles` after `LockManagerImpl.lambda$acquireLock`. This all runs on the thread `main-EventThread.` It will cause deadlock
   
   ```
   "main-EventThread" #21 daemon prio=5 os_prio=0 cpu=1025.43ms elapsed=1113.10s tid=0x00007fb2b6514bf0 nid=0xd6 waiting on condition  [0x00007fb2785dc000]
      java.lang.Thread.State: WAITING (parking)
   	at jdk.internal.misc.Unsafe.park(java.base@17.0.6/Native Method)
   	- parking to wait for  <0x0000000487fda1a8> (a java.util.concurrent.CompletableFuture$Signaller)
   	at java.util.concurrent.locks.LockSupport.park(java.base@17.0.6/LockSupport.java:211)
   	at java.util.concurrent.CompletableFuture$Signaller.block(java.base@17.0.6/CompletableFuture.java:1864)
   	at java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@17.0.6/ForkJoinPool.java:3463)
   	at java.util.concurrent.ForkJoinPool.managedBlock(java.base@17.0.6/ForkJoinPool.java:3434)
   	at java.util.concurrent.CompletableFuture.waitingGet(java.base@17.0.6/CompletableFuture.java:1898)
   	at java.util.concurrent.CompletableFuture.get(java.base@17.0.6/CompletableFuture.java:2072)
   	at com.github.benmanes.caffeine.cache.LocalAsyncCache$AbstractCacheView.resolve(LocalAsyncCache.java:515)
   	at com.github.benmanes.caffeine.cache.LocalAsyncLoadingCache$LoadingCacheView.get(LocalAsyncLoadingCache.java:122)
   	at org.apache.pulsar.common.naming.NamespaceBundleFactory.getBundles(NamespaceBundleFactory.java:260)
   	at org.apache.pulsar.broker.namespace.NamespaceService.getBundle(NamespaceService.java:219)
   	at org.apache.pulsar.broker.namespace.NamespaceService.isServiceUnitActiveAsync(NamespaceService.java:1020)
   	at org.apache.pulsar.broker.service.BrokerService.checkOwnershipAndCreatePersistentTopic(BrokerService.java:1423)
   	at org.apache.pulsar.broker.service.BrokerService.lambda$loadOrCreatePersistentTopic$53(BrokerService.java:1398)
   	at org.apache.pulsar.broker.service.BrokerService$$Lambda$1265/0x000000080141f980.run(Unknown Source)
   	at java.util.concurrent.CompletableFuture$UniRun.tryFire(java.base@17.0.6/CompletableFuture.java:787)
   	at java.util.concurrent.CompletableFuture.postComplete(java.base@17.0.6/CompletableFuture.java:510)
   	at java.util.concurrent.CompletableFuture.complete(java.base@17.0.6/CompletableFuture.java:2147)
   	at org.apache.pulsar.metadata.coordination.impl.LockManagerImpl.lambda$acquireLock$1(LockManagerImpl.java:105)
   	at org.apache.pulsar.metadata.coordination.impl.LockManagerImpl$$Lambda$604/0x00000008012086b8.run(Unknown Source)
   	at java.util.concurrent.CompletableFuture$UniRun.tryFire(java.base@17.0.6/CompletableFuture.java:787)
   	at java.util.concurrent.CompletableFuture.postComplete(java.base@17.0.6/CompletableFuture.java:510)
   	at java.util.concurrent.CompletableFuture.complete(java.base@17.0.6/CompletableFuture.java:2147)
   	at org.apache.pulsar.metadata.coordination.impl.ResourceLockImpl.lambda$acquire$2(ResourceLockImpl.java:128)
   	at org.apache.pulsar.metadata.coordination.impl.ResourceLockImpl$$Lambda$602/0x0000000801208248.run(Unknown Source)
   	at java.util.concurrent.CompletableFuture$UniRun.tryFire(java.base@17.0.6/CompletableFuture.java:787)
   	at java.util.concurrent.CompletableFuture.postComplete(java.base@17.0.6/CompletableFuture.java:510)
   	at java.util.concurrent.CompletableFuture.complete(java.base@17.0.6/CompletableFuture.java:2147)
   	at org.apache.pulsar.metadata.coordination.impl.ResourceLockImpl.lambda$acquireWithNoRevalidation$6(ResourceLockImpl.java:167)
   	at org.apache.pulsar.metadata.coordination.impl.ResourceLockImpl$$Lambda$600/0x0000000801203400.accept(Unknown Source)
   	at java.util.concurrent.CompletableFuture$UniAccept.tryFire(java.base@17.0.6/CompletableFuture.java:718)
   	at java.util.concurrent.CompletableFuture.postComplete(java.base@17.0.6/CompletableFuture.java:510)
   	at java.util.concurrent.CompletableFuture.complete(java.base@17.0.6/CompletableFuture.java:2147)
   	at org.apache.pulsar.metadata.impl.ZKMetadataStore.handlePutResult(ZKMetadataStore.java:225)
   	at org.apache.pulsar.metadata.impl.ZKMetadataStore.lambda$batchOperation$7(ZKMetadataStore.java:182)
   	at org.apache.pulsar.metadata.impl.ZKMetadataStore$$Lambda$224/0x0000000800ee9d20.processResult(Unknown Source)
   	at org.apache.pulsar.metadata.impl.PulsarZooKeeperClient$3$1.processResult(PulsarZooKeeperClient.java:490)
   	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:722)
   	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:563)
   ```
   
   ### Modifications
   
   After the operations of `store`, switch to another thread.
   
   
   ### Documentation
   
   <!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->
   
   - [ ] `doc` <!-- Your PR contains doc changes. -->
   - [x] `doc-required` <!-- Your PR changes impact docs and you will update later -->
   - [ ] `doc-not-needed` <!-- Your PR changes do not impact docs -->
   - [ ] `doc-complete` <!-- Docs have been already added -->
   
   ### Matching PR in forked repository
   
   PR in forked repository: 
   - 1


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] Technoboy- closed pull request #20189: [fix][broker]Fix deadlock of metadata store

Posted by "Technoboy- (via GitHub)" <gi...@apache.org>.
Technoboy- closed pull request #20189: [fix][broker]Fix deadlock of metadata store
URL: https://github.com/apache/pulsar/pull/20189


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] poorbarcode merged pull request #20189: [fix][broker]Fix deadlock of metadata store

Posted by "poorbarcode (via GitHub)" <gi...@apache.org>.
poorbarcode merged PR #20189:
URL: https://github.com/apache/pulsar/pull/20189


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] lhotari commented on a diff in pull request #20189: [fix][broker]Fix deadlock of metadata store

Posted by "lhotari (via GitHub)" <gi...@apache.org>.
lhotari commented on code in PR #20189:
URL: https://github.com/apache/pulsar/pull/20189#discussion_r1192249690


##########
pulsar-broker/src/main/java/org/apache/pulsar/broker/namespace/NamespaceService.java:
##########
@@ -1137,6 +1137,10 @@ public CompletableFuture<Boolean> isServiceUnitOwnedAsync(ServiceUnitId suName)
                 new IllegalArgumentException("Invalid class of NamespaceBundle: " + suName.getClass().getName()));
     }
 
+    /**
+     * @Deprecated This method is only used by test. call "isServiceUnitActiveAsync" is better.
+     */
+    @Deprecated
     public boolean isServiceUnitActive(TopicName topicName) {
         try {

Review Comment:
   would it make sense to use something like `isServiceUnitActiveAsync(topicName)..get(conf.getMetadataStoreOperationTimeoutSeconds(), SECONDS);`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] poorbarcode commented on pull request #20189: [fix][broker]Fix deadlock of metadata store

Posted by "poorbarcode (via GitHub)" <gi...@apache.org>.
poorbarcode commented on PR #20189:
URL: https://github.com/apache/pulsar/pull/20189#issuecomment-1540543199

   @lhotari 
   
   > is this related to https://github.com/apache/pulsar/pull/20130 changes and review comments in any way?
   
   Sorry, I didn't notice this PR before, but I sense that the two problems are very similar.
   
   This PR is not related to https://github.com/apache/pulsar/pull/20130.
   
   https://github.com/apache/pulsar/pull/20130 is used to solve the deadlock of the `meta-store` thread, and PR is used to solve the deadlock of the `event-zk-client` thread.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] lhotari commented on pull request #20189: [fix][broker]Fix deadlock of metadata store

Posted by "lhotari (via GitHub)" <gi...@apache.org>.
lhotari commented on PR #20189:
URL: https://github.com/apache/pulsar/pull/20189#issuecomment-1540454900

   @poorbarcode @Technoboy-  is this related to #20130 changes and review comments in any way?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] poorbarcode commented on pull request #20189: [fix][broker]Fix deadlock of metadata store

Posted by "poorbarcode (via GitHub)" <gi...@apache.org>.
poorbarcode commented on PR #20189:
URL: https://github.com/apache/pulsar/pull/20189#issuecomment-1540546951

   Now the current modification is problematic because the `OwnershipCache.cache.executor` uses the current thread. see 
   - https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/namespace/OwnershipCache.java#L133-L126
   
   I will fix it tomorrow
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] codecov-commenter commented on pull request #20189: [fix][broker]Fix deadlock of metadata store

Posted by "codecov-commenter (via GitHub)" <gi...@apache.org>.
codecov-commenter commented on PR #20189:
URL: https://github.com/apache/pulsar/pull/20189#issuecomment-1545401227

   ## [Codecov](https://app.codecov.io/gh/apache/pulsar/pull/20189?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#20189](https://app.codecov.io/gh/apache/pulsar/pull/20189?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (f686e1f) into [master](https://app.codecov.io/gh/apache/pulsar/commit/00f17e88ad8fa9a4ead78d78a181c03fdee8e4df?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (00f17e8) will **increase** coverage by `35.37%`.
   > The diff coverage is `40.00%`.
   
   [![Impacted file tree graph](https://app.codecov.io/gh/apache/pulsar/pull/20189/graphs/tree.svg?width=650&height=150&src=pr&token=acYqCpsK9J&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://app.codecov.io/gh/apache/pulsar/pull/20189?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master   #20189       +/-   ##
   =============================================
   + Coverage     37.61%   72.98%   +35.37%     
   - Complexity    12589    31971    +19382     
   =============================================
     Files          1691     1868      +177     
     Lines        129028   138604     +9576     
     Branches      14066    15240     +1174     
   =============================================
   + Hits          48530   101166    +52636     
   + Misses        74183    29399    -44784     
   - Partials       6315     8039     +1724     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | inttests | `24.12% <40.00%> (-0.06%)` | :arrow_down: |
   | systests | `25.00% <40.00%> (+0.23%)` | :arrow_up: |
   | unittests | `72.24% <40.00%> (+39.05%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://app.codecov.io/gh/apache/pulsar/pull/20189?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...apache/pulsar/broker/namespace/OwnershipCache.java](https://app.codecov.io/gh/apache/pulsar/pull/20189?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHVsc2FyLWJyb2tlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcHVsc2FyL2Jyb2tlci9uYW1lc3BhY2UvT3duZXJzaGlwQ2FjaGUuamF2YQ==) | `85.26% <ø> (+10.52%)` | :arrow_up: |
   | [...ache/pulsar/broker/namespace/NamespaceService.java](https://app.codecov.io/gh/apache/pulsar/pull/20189?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHVsc2FyLWJyb2tlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcHVsc2FyL2Jyb2tlci9uYW1lc3BhY2UvTmFtZXNwYWNlU2VydmljZS5qYXZh) | `69.75% <40.00%> (+25.99%)` | :arrow_up: |
   
   ... and [1426 files with indirect coverage changes](https://app.codecov.io/gh/apache/pulsar/pull/20189/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] poorbarcode commented on a diff in pull request #20189: [fix][broker]Fix deadlock of metadata store

Posted by "poorbarcode (via GitHub)" <gi...@apache.org>.
poorbarcode commented on code in PR #20189:
URL: https://github.com/apache/pulsar/pull/20189#discussion_r1192410763


##########
pulsar-broker/src/main/java/org/apache/pulsar/broker/namespace/NamespaceService.java:
##########
@@ -1137,6 +1137,10 @@ public CompletableFuture<Boolean> isServiceUnitOwnedAsync(ServiceUnitId suName)
                 new IllegalArgumentException("Invalid class of NamespaceBundle: " + suName.getClass().getName()));
     }
 
+    /**
+     * @Deprecated This method is only used by test. call "isServiceUnitActiveAsync" is better.
+     */
+    @Deprecated
     public boolean isServiceUnitActive(TopicName topicName) {
         try {

Review Comment:
   @lhotari 
   
   I think it is a good suggestion. Already edit this method to make the test that calls the method works better.
   
   Could you take a look again?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] lhotari commented on a diff in pull request #20189: [fix][broker]Fix deadlock of metadata store

Posted by "lhotari (via GitHub)" <gi...@apache.org>.
lhotari commented on code in PR #20189:
URL: https://github.com/apache/pulsar/pull/20189#discussion_r1192249690


##########
pulsar-broker/src/main/java/org/apache/pulsar/broker/namespace/NamespaceService.java:
##########
@@ -1137,6 +1137,10 @@ public CompletableFuture<Boolean> isServiceUnitOwnedAsync(ServiceUnitId suName)
                 new IllegalArgumentException("Invalid class of NamespaceBundle: " + suName.getClass().getName()));
     }
 
+    /**
+     * @Deprecated This method is only used by test. call "isServiceUnitActiveAsync" is better.
+     */
+    @Deprecated
     public boolean isServiceUnitActive(TopicName topicName) {
         try {

Review Comment:
   would it make sense to use something like `isServiceUnitActiveAsync(topicName).get(conf.getMetadataStoreOperationTimeoutSeconds(), SECONDS);`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org