Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/03/10 00:19:52 UTC

[GitHub] [pulsar] bharanic-dev opened a new pull request #14634: [ Issue 14633] [pulsar-broker] Fix metadata store deadlock.

bharanic-dev opened a new pull request #14634:
URL: https://github.com/apache/pulsar/pull/14634


   
   Fixes #14633 
   
   
   ### Motivation
   
   https://github.com/apache/pulsar/blob/2b3e8aeb5a1c259e0325e5a91dc5d7e20c6ee569/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/BrokerService.java#L1757
   
   The method linked above makes a blocking call to a metadata-store operation (stack trace below) while holding a lock, because the method is declared `synchronized`. The callback that would complete the future cannot run: the metadata-store executor is single-threaded, and its thread is blocked in a callback that is waiting for the lock held by this thread.
   
   "pulsar-backlog-quota-checker-30-1" #88 prio=5 os_prio=0 cpu=662.90ms elapsed=81026.81s tid=0x00007f17031ce000 nid=0xa4 waiting on condition  [0x00007f15a631e000]
      java.lang.Thread.State: TIMED_WAITING (parking)
           at jdk.internal.misc.Unsafe.park(java.base@11.0.13/Native Method)
           - parking to wait for  <0x00000007c1f55408> (a java.util.concurrent.CompletableFuture$Signaller)
           at java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.13/LockSupport.java:234)
           at java.util.concurrent.CompletableFuture$Signaller.block(java.base@11.0.13/CompletableFuture.java:1798)
           at java.util.concurrent.ForkJoinPool.managedBlock(java.base@11.0.13/ForkJoinPool.java:3128)
           at java.util.concurrent.CompletableFuture.timedGet(java.base@11.0.13/CompletableFuture.java:1868)
           at java.util.concurrent.CompletableFuture.get(java.base@11.0.13/CompletableFuture.java:2021)
           at org.apache.pulsar.broker.resources.BaseResources.get(BaseResources.java:86)
           at org.apache.pulsar.broker.resources.NamespaceResources.getPolicies(NamespaceResources.java:105)
           at org.apache.pulsar.broker.service.BacklogQuotaManager.getBacklogQuota(BacklogQuotaManager.java:70)
           at org.apache.pulsar.broker.service.BacklogQuotaManager.getBacklogQuota(BacklogQuotaManager.java:82)
           at org.apache.pulsar.broker.service.BacklogQuotaManager.getBacklogQuotaLimitInSize(BacklogQuotaManager.java:101)
           at org.apache.pulsar.broker.service.persistent.PersistentTopic.isSizeBacklogExceeded(PersistentTopic.java:2502)
           at org.apache.pulsar.broker.service.BrokerService.lambda$monitorBacklogQuota$69(BrokerService.java:1611)
           at org.apache.pulsar.broker.service.BrokerService$$Lambda$713/0x00000008406a9840.accept(Unknown Source)
           at org.apache.pulsar.broker.service.BrokerService$$Lambda$709/0x00000008406a8840.accept(Unknown Source)
           at java.util.Optional.ifPresent(java.base@11.0.13/Optional.java:183)
           at org.apache.pulsar.broker.service.BrokerService.lambda$forEachTopic$68(BrokerService.java:1599)
           at org.apache.pulsar.broker.service.BrokerService$$Lambda$708/0x00000008406a8440.accept(Unknown Source)
           at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap$Section.forEach(ConcurrentOpenHashMap.java:387)
           at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap.forEach(ConcurrentOpenHashMap.java:159)
           at org.apache.pulsar.broker.service.BrokerService.forEachTopic(BrokerService.java:1597)
           at org.apache.pulsar.broker.service.BrokerService.monitorBacklogQuota(BrokerService.java:1608)
           - locked <0x00000003018b1c30> (a org.apache.pulsar.broker.service.BrokerService)
           at org.apache.pulsar.broker.service.BrokerService$$Lambda$320/0x00000008403f4840.run(Unknown Source)
           at org.apache.bookkeeper.mledger.util.SafeRun$1.safeRun(SafeRun.java:32)
           at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36)
           at java.util.concurrent.Executors$RunnableAdapter.call(java.base@11.0.13/Executors.java:515)
           at java.util.concurrent.FutureTask.runAndReset(java.base@11.0.13/FutureTask.java:305)
           at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(java.base@11.0.13/ScheduledThreadPoolExecutor.java:305)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.13/ThreadPoolExecutor.java:1128)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.13/ThreadPoolExecutor.java:628)
           at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
           at java.lang.Thread.run(java.base@11.0.13/Thread.java:829)
   
   https://github.com/apache/pulsar/blob/0a91196dcc4d31ae647867ed319b8c1af0cb93c6/pulsar-metadata/src/main/java/org/apache/pulsar/metadata/impl/AbstractMetadataStore.java#L78
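The lock interaction described above can be reproduced in isolation. The following is a minimal sketch, not the broker code: the class `DeadlockSketch` and all of its method names are hypothetical, and a plain single-threaded `ExecutorService` stands in for the metadata-store executor. It assumes only what the description states, namely that a callback on the single-threaded executor needs the same lock that the blocked caller holds.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for the broker; not Pulsar code.
class DeadlockSketch {
    // Stand-in for the single-threaded metadata-store executor (daemon so the JVM can exit).
    private final ExecutorService callbackExecutor =
            Executors.newSingleThreadExecutor(r -> {
                Thread t = new Thread(r, "metadata-store-sketch");
                t.setDaemon(true);
                return t;
            });

    // A callback that needs the same monitor as the blocking caller.
    private synchronized void callbackNeedingLock() {
        // Nothing to do; acquiring the monitor is the point.
    }

    // Hypothetical async read: its completion is queued behind the callback above.
    private CompletableFuture<String> readAsync() {
        CompletableFuture<String> future = new CompletableFuture<>();
        callbackExecutor.execute(this::callbackNeedingLock);
        callbackExecutor.execute(() -> future.complete("policies"));
        return future;
    }

    // Mirrors the problematic shape: block on the future while holding the monitor.
    synchronized boolean readWhileHoldingLock(long timeoutMs) {
        return completesWithin(readAsync(), timeoutMs);
    }

    // Mirrors the fix: the same blocking read, but without holding the monitor.
    boolean readWithoutLock(long timeoutMs) {
        return completesWithin(readAsync(), timeoutMs);
    }

    // Returns true if the future completes before the timeout, false on timeout.
    private static boolean completesWithin(CompletableFuture<?> future, long timeoutMs) {
        try {
            future.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        DeadlockSketch sketch = new DeadlockSketch();
        System.out.println("completed while holding lock: " + sketch.readWhileHoldingLock(300));
        System.out.println("completed without lock: " + sketch.readWithoutLock(5000));
    }
}
```

In this sketch the first call should time out, because the executor's only thread is parked on the monitor, and the second should complete once the monitor is free. The timed `get` is what keeps the sketch from hanging; it is consistent with the `TIMED_WAITING` state in the trace above.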
   
   ### Modifications
   
   The `synchronized` keyword was added as part of https://github.com/apache/pulsar/pull/14367. This causes the deadlock. The `synchronized` keyword is not really required, because the topics data structure is a concurrent hash map (`ConcurrentOpenHashMap` in the stack trace above).
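As a rough illustration of why the extra lock is unnecessary, the sketch below iterates a concurrent map from a method that is deliberately not `synchronized`. It is an assumption-laden stand-in: `QuotaCheckSketch`, `recordBacklog`, and `topicsOverQuota` are hypothetical names, and `java.util.concurrent.ConcurrentHashMap` is used in place of Pulsar's `ConcurrentOpenHashMap`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the quota check; not Pulsar code.
class QuotaCheckSketch {
    // The map handles concurrent reads and writes itself; no outer lock is needed.
    private final Map<String, Long> backlogBytesByTopic = new ConcurrentHashMap<>();

    void recordBacklog(String topic, long bytes) {
        backlogBytesByTopic.put(topic, bytes);
    }

    // Deliberately not synchronized: iteration over a ConcurrentHashMap is
    // weakly consistent and tolerates concurrent updates without throwing.
    List<String> topicsOverQuota(long limitBytes) {
        List<String> over = new ArrayList<>();
        backlogBytesByTopic.forEach((topic, bytes) -> {
            if (bytes > limitBytes) {
                over.add(topic);
            }
        });
        Collections.sort(over); // stable order for callers and tests
        return over;
    }

    public static void main(String[] args) {
        QuotaCheckSketch check = new QuotaCheckSketch();
        check.recordBacklog("topic-a", 10);
        check.recordBacklog("topic-b", 200);
        System.out.println(check.topicsOverQuota(100));
    }
}
```

Because the method holds no monitor while it runs, a blocking metadata read inside the loop could no longer starve callbacks that need the object's lock.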
   
   ### Verifying this change
   
   - [x] Make sure that the change passes the CI checks.
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   The fix was also verified in production. After deploying it, the broker deadlocks and restarts went away.
   
   ### Documentation
   - [x] `no-need-doc` 
   
   Internal fix. Not user visible.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] bharanic-dev commented on pull request #14634: [ Issue 14633] [pulsar-broker] Fix metadata store deadlock.

Posted by GitBox <gi...@apache.org>.
bharanic-dev commented on pull request #14634:
URL: https://github.com/apache/pulsar/pull/14634#issuecomment-1064258096


   Ah, that makes sense. I will make the change suggested.





[GitHub] [pulsar] Jason918 commented on pull request #14634: [ Issue 14633] [pulsar-broker] Fix metadata store deadlock.

Posted by GitBox <gi...@apache.org>.
Jason918 commented on pull request #14634:
URL: https://github.com/apache/pulsar/pull/14634#issuecomment-1063584165


   > The synchronized keyword was added as part of #14367.
   
   It's added in #8045





[GitHub] [pulsar] codelipenghui merged pull request #14634: [ Issue 14633] [pulsar-broker] Fix metadata store deadlock.

Posted by GitBox <gi...@apache.org>.
codelipenghui merged pull request #14634:
URL: https://github.com/apache/pulsar/pull/14634


   





[GitHub] [pulsar] bharanic-dev commented on pull request #14634: [ Issue 14633] [pulsar-broker] Fix metadata store deadlock.

Posted by GitBox <gi...@apache.org>.
bharanic-dev commented on pull request #14634:
URL: https://github.com/apache/pulsar/pull/14634#issuecomment-1067001756


   @Jason918 PTAL when you get a chance.





[GitHub] [pulsar] bharanic-dev commented on pull request #14634: [ Issue 14633] [pulsar-broker] Fix metadata store deadlock.

Posted by GitBox <gi...@apache.org>.
bharanic-dev commented on pull request #14634:
URL: https://github.com/apache/pulsar/pull/14634#issuecomment-1064255069


   > > The synchronized keyword was added as part of #14367.
   > 
   > It's added in #8045
   
   Sorry, that was a typo. Thanks for catching it.

