You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@pulsar.apache.org by zhangao <ga...@qq.com.INVALID> on 2021/12/31 03:41:45 UTC

[Bug] Other brokers failed to acquire bundle ownership after unloading a namespace bundle

Hi,&nbsp;
&nbsp; &nbsp; Currently, I found a problem about bundle ownership acquire, After I unloaded a namespace bundle, I found these error log on other brokers:


```
2021-12-29 14:37:37.641 [metadata-store-6-1] WARN  org.apache.pulsar.broker.lookup.TopicLookupBase - Failed to lookup null for topic persistent://public/data-channel/tet-partition-30 with error org.apache.pulsar.broker.PulsarServerException: Failed to acquire ownership for namespace bundle public/data-channel/0xebf3b108_0xf0000000 java.util.concurrent.CompletionException: org.apache.pulsar.broker.PulsarServerException: Failed to acquire ownership for namespace bundle public/data-channel/0xebf3b108_0xf0000000         at java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:326) ~[?:1.8.0_102]         at java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:338) ~[?:1.8.0_102]         at java.util.concurrent.CompletableFuture.uniRelay(CompletableFuture.java:911) ~[?:1.8.0_102]         at java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:899) ~[?:1.8.0_102]         at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) ~[?:1.8.0_102]         at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977) ~[?:1.8.0_102]         at org.apache.pulsar.broker.namespace.NamespaceService.lambda$searchForCandidateBroker$15(NamespaceService.java:577) ~[org.apache.pulsar-pulsar-broker-2.9.1.jar:2.9.1]         at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870) ~[?:1.8.0_102]         at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852) ~[?:1.8.0_102]         at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) ~[?:1.8.0_102]         at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977) ~[?:1.8.0_102]         at org.apache.pulsar.metadata.coordination.impl.LockManagerImpl.lambda$acquireLock$2(LockManagerImpl.java:111) ~[org.apache.pulsar-pulsar-metadata-2.9.1.jar:2.9.1]         at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870) ~[?:1.8.0_102]         at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852) ~[?:1.8.0_102]         at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) ~[?:1.8.0_102]         at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977) ~[?:1.8.0_102]         at org.apache.pulsar.metadata.coordination.impl.ResourceLockImpl.lambda$acquire$4(ResourceLockImpl.java:134) ~[org.apache.pulsar-pulsar-metadata-2.9.1.jar:2.9.1]         at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870) ~[?:1.8.0_102]         at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852) ~[?:1.8.0_102]         at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) ~[?:1.8.0_102]         at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) ~[?:1.8.0_102]         at org.apache.pulsar.metadata.impl.ZKMetadataStore.lambda$get$7(ZKMetadataStore.java:139) ~[org.apache.pulsar-pulsar-metadata-2.9.1.jar:2.9.1]         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_102]         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_102]         at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.72.Final.jar:4.1.72.Final]         at java.lang.Thread.run(Thread.java:745) [?:1.8.0_102] Caused by: org.apache.pulsar.broker.PulsarServerException: Failed to acquire ownership for namespace bundle public/data-channel/0xebf3b108_0xf0000000         ... 20 more Caused by: java.util.concurrent.CompletionException: org.apache.pulsar.metadata.api.MetadataStoreException$LockBusyException: Resource at /namespace/public/data-channel/0xebf3b108_0xf0000000 is already locked         at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[?:1.8.0_102]         at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[?:1.8.0_102]         at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593) ~[?:1.8.0_102]         at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577) ~[?:1.8.0_102]         ... 17 more Caused by: org.apache.pulsar.metadata.api.MetadataStoreException$LockBusyException: Resource at /namespace/public/data-channel/0xebf3b108_0xf0000000 is already locked         at org.apache.pulsar.metadata.coordination.impl.ResourceLockImpl.lambda$doRevalidate$20(ResourceLockImpl.java:297) ~[org.apache.pulsar-pulsar-metadata-2.9.1.jar:2.9.1]         at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:952) ~[?:1.8.0_102]         at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926) ~[?:1.8.0_102]         ... 7 more
```
The error logs said&nbsp; the resource is already&nbsp;locked, this means, the ZNODE of this namespace bundle created by previous broker was not removed yet. 

Normally, broker would firstly query the bundle owner before tried to acquire it.  the partial codes:


```
String path = ServiceUnitUtils.path(suName);
return lockManager.readLock(path);
```




If the ZNODE created by previous broker was not removed yet. Why broker got a result that no broker owned this bundle.

Re: [Bug] Other brokers failed to acquire bundle ownership after unloading a namespace bundle

Posted by Haiting Jiang <ji...@apache.org>.
Hi,

Please provide more information about this issue, like pulsar version, steps to steady reproduce this, the influence of this error.

Thanks,
Haiting Jiang

On 2021/12/31 03:41:45 zhangao wrote:
> Hi,&nbsp;
> &nbsp; &nbsp; Currently, I found a problem about bundle ownership acquire, After I unloaded a namespace bundle, I found these error log on other brokers:
> 
> 
> ```
> 2021-12-29 14:37:37.641 [metadata-store-6-1] WARN  org.apache.pulsar.broker.lookup.TopicLookupBase - Failed to lookup null for topic persistent://public/data-channel/tet-partition-30 with error org.apache.pulsar.broker.PulsarServerException: Failed to acquire ownership for namespace bundle public/data-channel/0xebf3b108_0xf0000000 java.util.concurrent.CompletionException: org.apache.pulsar.broker.PulsarServerException: Failed to acquire ownership for namespace bundle public/data-channel/0xebf3b108_0xf0000000         at java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:326) ~[?:1.8.0_102]         at java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:338) ~[?:1.8.0_102]         at java.util.concurrent.CompletableFuture.uniRelay(CompletableFuture.java:911) ~[?:1.8.0_102]         at java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:899) ~[?:1.8.0_102]         at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) ~[?:1.8.0_102]         at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977) ~[?:1.8.0_102]         at org.apache.pulsar.broker.namespace.NamespaceService.lambda$searchForCandidateBroker$15(NamespaceService.java:577) ~[org.apache.pulsar-pulsar-broker-2.9.1.jar:2.9.1]         at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870) ~[?:1.8.0_102]         at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852) ~[?:1.8.0_102]         at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) ~[?:1.8.0_102]         at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977) ~[?:1.8.0_102]         at org.apache.pulsar.metadata.coordination.impl.LockManagerImpl.lambda$acquireLock$2(LockManagerImpl.java:111) ~[org.apache.pulsar-pulsar-metadata-2.9.1.jar:2.9.1]         at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870) ~[?:1.8.0_102]         at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852) ~[?:1.8.0_102]         at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) ~[?:1.8.0_102]         at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977) ~[?:1.8.0_102]         at org.apache.pulsar.metadata.coordination.impl.ResourceLockImpl.lambda$acquire$4(ResourceLockImpl.java:134) ~[org.apache.pulsar-pulsar-metadata-2.9.1.jar:2.9.1]         at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870) ~[?:1.8.0_102]         at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852) ~[?:1.8.0_102]         at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) ~[?:1.8.0_102]         at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) ~[?:1.8.0_102]         at org.apache.pulsar.metadata.impl.ZKMetadataStore.lambda$get$7(ZKMetadataStore.java:139) ~[org.apache.pulsar-pulsar-metadata-2.9.1.jar:2.9.1]         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_102]         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_102]         at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.72.Final.jar:4.1.72.Final]         at java.lang.Thread.run(Thread.java:745) [?:1.8.0_102] Caused by: org.apache.pulsar.broker.PulsarServerException: Failed to acquire ownership for namespace bundle public/data-channel/0xebf3b108_0xf0000000         ... 20 more Caused by: java.util.concurrent.CompletionException: org.apache.pulsar.metadata.api.MetadataStoreException$LockBusyException: Resource at /namespace/public/data-channel/0xebf3b108_0xf0000000 is already locked         at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[?:1.8.0_102]         at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[?:1.8.0_102]         at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593) ~[?:1.8.0_102]         at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577) ~[?:1.8.0_102]         ... 17 more Caused by: org.apache.pulsar.metadata.api.MetadataStoreException$LockBusyException: Resource at /namespace/public/data-channel/0xebf3b108_0xf0000000 is already locked         at org.apache.pulsar.metadata.coordination.impl.ResourceLockImpl.lambda$doRevalidate$20(ResourceLockImpl.java:297) ~[org.apache.pulsar-pulsar-metadata-2.9.1.jar:2.9.1]         at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:952) ~[?:1.8.0_102]         at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926) ~[?:1.8.0_102]         ... 7 more
> ```
> The error logs said&nbsp; the resource is already&nbsp;locked, this means, the ZNODE of this namespace bundle created by previous broker was not removed yet. 
> 
> Normally, broker would firstly query the bundle owner before tried to acquire it.  the partial codes:
> 
> 
> ```
> String path = ServiceUnitUtils.path(suName);
> return lockManager.readLock(path);
> ```
> 
> 
> 
> 
> If the ZNODE created by previous broker was not removed yet. Why broker got a result that no broker owned this bundle.