Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/07/22 05:57:38 UTC

[GitHub] [pulsar] gaozhangmin opened a new issue, #13584: Failed to acquire bundle ownership

gaozhangmin opened a new issue, #13584:
URL: https://github.com/apache/pulsar/issues/13584

   Pulsar version: 2.9.1
   ```
   2021-12-29 14:37:37.641 [metadata-store-6-1] WARN  org.apache.pulsar.broker.lookup.TopicLookupBase - Failed to lookup null for topic persistent://public/data-channel/tet-partition-30 with error org.apache.pulsar.broker.PulsarServerException: Failed to acquire ownership for namespace bundle public/data-channel/0xebf3b108_0xf0000000
   java.util.concurrent.CompletionException: org.apache.pulsar.broker.PulsarServerException: Failed to acquire ownership for namespace bundle public/data-channel/0xebf3b108_0xf0000000
           at java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:326) ~[?:1.8.0_102]
           at java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:338) ~[?:1.8.0_102]
           at java.util.concurrent.CompletableFuture.uniRelay(CompletableFuture.java:911) ~[?:1.8.0_102]
           at java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:899) ~[?:1.8.0_102]
           at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) ~[?:1.8.0_102]
           at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977) ~[?:1.8.0_102]
           at org.apache.pulsar.broker.namespace.NamespaceService.lambda$searchForCandidateBroker$15(NamespaceService.java:577) ~[org.apache.pulsar-pulsar-broker-2.9.1.jar:2.9.1]
           at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870) ~[?:1.8.0_102]
           at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852) ~[?:1.8.0_102]
           at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) ~[?:1.8.0_102]
           at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977) ~[?:1.8.0_102]
           at org.apache.pulsar.metadata.coordination.impl.LockManagerImpl.lambda$acquireLock$2(LockManagerImpl.java:111) ~[org.apache.pulsar-pulsar-metadata-2.9.1.jar:2.9.1]
           at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870) ~[?:1.8.0_102]
           at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852) ~[?:1.8.0_102]
           at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) ~[?:1.8.0_102]
           at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977) ~[?:1.8.0_102]
           at org.apache.pulsar.metadata.coordination.impl.ResourceLockImpl.lambda$acquire$4(ResourceLockImpl.java:134) ~[org.apache.pulsar-pulsar-metadata-2.9.1.jar:2.9.1]
           at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870) ~[?:1.8.0_102]
           at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852) ~[?:1.8.0_102]
           at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) ~[?:1.8.0_102]
           at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) ~[?:1.8.0_102]
           at org.apache.pulsar.metadata.impl.ZKMetadataStore.lambda$get$7(ZKMetadataStore.java:139) ~[org.apache.pulsar-pulsar-metadata-2.9.1.jar:2.9.1]
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_102]
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_102]
           at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.72.Final.jar:4.1.72.Final]
           at java.lang.Thread.run(Thread.java:745) [?:1.8.0_102]
   Caused by: org.apache.pulsar.broker.PulsarServerException: Failed to acquire ownership for namespace bundle public/data-channel/0xebf3b108_0xf0000000
           ... 20 more
   Caused by: java.util.concurrent.CompletionException: org.apache.pulsar.metadata.api.MetadataStoreException$LockBusyException: Resource at /namespace/public/data-channel/0xebf3b108_0xf0000000 is already locked
           at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[?:1.8.0_102]
           at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[?:1.8.0_102]
           at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593) ~[?:1.8.0_102]
           at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577) ~[?:1.8.0_102]
           ... 17 more
   Caused by: org.apache.pulsar.metadata.api.MetadataStoreException$LockBusyException: Resource at /namespace/public/data-channel/0xebf3b108_0xf0000000 is already locked
           at org.apache.pulsar.metadata.coordination.impl.ResourceLockImpl.lambda$doRevalidate$20(ResourceLockImpl.java:297) ~[org.apache.pulsar-pulsar-metadata-2.9.1.jar:2.9.1]
           at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:952) ~[?:1.8.0_102]
           at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926) ~[?:1.8.0_102]
           ... 7 more
   ```
   The error log says the resource is already locked, which means the znode for this namespace bundle, created by the previous owner broker, has not been removed yet.
   Normally, a broker first queries the bundle owner before trying to acquire the bundle.
   
   ```
   public CompletableFuture<Optional<NamespaceEphemeralData>> getOwnerAsync(NamespaceBundle suName) {
       CompletableFuture<OwnedBundle> ownedBundleFuture = ownedBundlesCache.getIfPresent(suName);
       if (ownedBundleFuture != null) {
           // Either we're the owners or we're trying to become the owner.
           return ownedBundleFuture.thenApply(serviceUnit -> {
               // We are the owner of the service unit
               return Optional.of(serviceUnit.isActive() ? selfOwnerInfo : selfOwnerInfoDisabled);
           });
       }

       // If we're not the owner, we need to check if anybody else is
       String path = ServiceUnitUtils.path(suName);
       return lockManager.readLock(path);
   }
   ```
   
   If the znode created by the previous broker had not been removed yet, why did `lockManager.readLock(path)` return no owner for this bundle?
   
   #### Steps to reproduce
   
   It is hard to reproduce; I had just unloaded some bundles while consumers and producers were connected.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] sakurafly123 commented on issue #13584: Failed to acquire bundle ownership

Posted by GitBox <gi...@apache.org>.
sakurafly123 commented on issue #13584:
URL: https://github.com/apache/pulsar/issues/13584#issuecomment-1196187784

   > > This problem occurs in 2.10
   > 
   > I hit this problem when I have many topics, about 1 million. The connections between the broker and ZooKeeper are dropped and re-established frequently, and then this problem occurs.


[GitHub] [pulsar] lhotari commented on issue #13584: Failed to acquire bundle ownership

Posted by GitBox <gi...@apache.org>.
lhotari commented on issue #13584:
URL: https://github.com/apache/pulsar/issues/13584#issuecomment-1192211965

   @gaozhangmin please explain the resolution since you closed the issue without describing the resolution.


[GitHub] [pulsar] michaeljmarshall commented on issue #13584: Failed to acquire bundle ownership

Posted by GitBox <gi...@apache.org>.
michaeljmarshall commented on issue #13584:
URL: https://github.com/apache/pulsar/issues/13584#issuecomment-1233481248

   I think this is probably happening because two brokers make "decentralized" load decisions for the same bundle but to different brokers. Then, when each broker goes to load the bundle, one will win and one will lose.
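   
   The suspected interleaving can be sketched with a tiny in-memory model. This is only an illustration under stated assumptions, not Pulsar code: `BundleLockRaceSketch`, `readOwner`, and `tryAcquire` are hypothetical names, and a `ConcurrentHashMap` stands in for the metadata store. Both brokers read the owner before either has acquired the lock, so both see no owner, and the second acquire then fails as "already locked".
   
   ```java
   import java.util.Optional;
   import java.util.concurrent.ConcurrentHashMap;
   
   // Hypothetical in-memory stand-in for the lock table in the metadata store; not Pulsar code.
   class BundleLockRaceSketch {
       private final ConcurrentHashMap<String, String> locks = new ConcurrentHashMap<>();
   
       // Analogous to reading the current owner: empty when nobody holds the lock.
       Optional<String> readOwner(String path) {
           return Optional.ofNullable(locks.get(path));
       }
   
       // Analogous to acquiring the lock: returns false if another broker got there first.
       boolean tryAcquire(String path, String broker) {
           return locks.putIfAbsent(path, broker) == null;
       }
   
       // Replays one interleaving: both reads happen before either acquire.
       static String replay() {
           BundleLockRaceSketch store = new BundleLockRaceSketch();
           String path = "/namespace/public/data-channel/0xebf3b108_0xf0000000";
   
           boolean aSawOwner = store.readOwner(path).isPresent(); // broker-a checks the owner
           boolean bSawOwner = store.readOwner(path).isPresent(); // broker-b checks the owner
           boolean aWon = store.tryAcquire(path, "broker-a");     // broker-a takes the lock
           boolean bWon = store.tryAcquire(path, "broker-b");     // broker-b finds it already locked
   
           return "aSawOwner=" + aSawOwner + " bSawOwner=" + bSawOwner
                   + " aWon=" + aWon + " bWon=" + bWon
                   + " owner=" + store.readOwner(path).orElse("none");
       }
   }
   ```
   
   In this model, a fresh owner read by the losing broker after the failed acquire would return `broker-a`, which is why the failure looks transient from the caller's point of view.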
   
   My follow-up question is whether the client should retry this type of failure or propagate it back to the application code. It seems like a pretty retriable failure, given that we know the topic was owned by another broker at a recent point in time.
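   
   If retrying turns out to be the right answer, the caller-side behavior could look roughly like the sketch below. It is a generic retry loop with exponential backoff, not the Pulsar client's actual lookup code: `LookupRetrySketch`, `RetriableLookupException`, and `lookupWithRetry` are hypothetical names, and the `Callable` stands in for whatever performs the lookup.
   
   ```java
   import java.util.concurrent.Callable;
   
   // Hypothetical sketch of retrying a lookup that failed for a transient reason; not Pulsar code.
   class LookupRetrySketch {
   
       // Hypothetical marker for failures that are safe to retry, e.g. "already locked".
       static class RetriableLookupException extends Exception {
           RetriableLookupException(String message) {
               super(message);
           }
       }
   
       // Calls the lookup up to maxAttempts times, doubling the pause after each retriable failure.
       static <T> T lookupWithRetry(Callable<T> lookup, int maxAttempts, long initialBackoffMs)
               throws Exception {
           if (maxAttempts < 1) {
               throw new IllegalArgumentException("maxAttempts must be at least 1");
           }
           long backoffMs = initialBackoffMs;
           for (int attempt = 1; ; attempt++) {
               try {
                   return lookup.call();
               } catch (RetriableLookupException e) {
                   if (attempt >= maxAttempts) {
                       throw e; // out of attempts: surface the failure to the application
                   }
                   Thread.sleep(backoffMs);
                   backoffMs *= 2;
               }
           }
       }
   }
   ```
   
   Any other exception type propagates immediately, so only failures explicitly classified as retriable are hidden from the application.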


[GitHub] [pulsar] sakurafly123 commented on issue #13584: Failed to acquire bundle ownership

Posted by GitBox <gi...@apache.org>.
sakurafly123 commented on issue #13584:
URL: https://github.com/apache/pulsar/issues/13584#issuecomment-1196187329

   > This problem occurs in 2.10
   
   I hit this problem when I have many topics, about 1 million. The connections between the broker and ZooKeeper are dropped and re-established frequently.


[GitHub] [pulsar] lhotari commented on issue #13584: Failed to acquire bundle ownership

Posted by GitBox <gi...@apache.org>.
lhotari commented on issue #13584:
URL: https://github.com/apache/pulsar/issues/13584#issuecomment-1192212210

   This problem occurs in 2.10 

