You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/04/11 03:38:58 UTC
[GitHub] [pulsar] dragonls opened a new issue, #15108: [broker] "Failed to persist msg in store: Error while using ZooKeeper" causing "Topic is temporarily unavailable"
dragonls opened a new issue, #15108:
URL: https://github.com/apache/pulsar/issues/15108
**Describe the bug**
We found `Topic is temporarily unavailable`. Producer can not produce data to this topic and consumer can not subscribe the topic.
Restart the broker can make it back to normal.
**To Reproduce**
Hard to reproduce now.
**Expected behavior**
If something wrong while creating ledger, it should not make topic stuck.
**Desktop (please complete the following information):**
- OS: linux
- Pulsar: 2.8.3 (with zk client version: 3.6.3)
- zk server version: 3.5.7 (using the image of pulsar 2.7.2)
**Additional context**
At first, I think this might related to #13736. But I have upgrade broker to 2.8.3 and do not see the deadlock.
I check the zookeeper, which seems nothing wrong.
![image](https://user-images.githubusercontent.com/2565118/162660123-52de5b21-d106-4da9-9814-90309a00a965.png)
Also similar to #14333.
The log:
```
01:00:49.881 [BookKeeperClientWorker-OrderedExecutor-35-0] INFO org.apache.bookkeeper.mledger.impl.OpAddEntry - [euler/train/persistent/mid_100360-partition-56] Closing ledger 28222932 for being full
01:00:49.894 [main-EventThread] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [euler/train/persistent/mid_100360-partition-56] Creating a new ledger after closed
01:00:49.894 [BookKeeperClientWorker-OrderedExecutor-35-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [euler/train/persistent/mid_100360-partition-56] Creating ledger, metadata: {component=[109, 97, 110, 97, 103, 101, 100, 45, 108, 101, 100, 103, 101, 114], pulsar/managed-ledger=[101, 117, 108, 101, 114, 47, 116, 114, 97, 105, 110, 47, 112, 101, 114, 115, 105, 115, 116, 101, 110, 116, 47, 109, 105, 100, 95, 49, 48, 48, 51, 54, 48, 45, 112, 97, 114, 116, 105, 116, 105, 111, 110, 45, 53, 54], application=[112, 117, 108, 115, 97, 114]} - metadata ops timeout : 60 seconds
01:00:49.894 [BookKeeperClientWorker-OrderedExecutor-35-0] WARN org.apache.bookkeeper.client.BookieWatcherImpl - New ensemble: [pulsar-euler-2-bookie-18.pulsar-euler-2-bookie.pulsar-euler-2.svc.cluster.local:3181, pulsar-euler-2-bookie-3.pulsar-euler-2-bookie.pulsar-euler-2.svc.cluster.local:3181, pulsar-euler-2-bookie-17.pulsar-euler-2-bookie.pulsar-euler-2.svc.cluster.local:3181] is not adhering to Placement Policy. quarantinedBookies: []
01:00:56.571 [main-SendThread(pulsar-euler-2-zookeeper:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 6667ms for session id 0x100015013ff0bf4
01:00:56.571 [main-SendThread(pulsar-euler-2-zookeeper:2181)] WARN org.apache.zookeeper.ClientCnxn - Session 0x100015013ff0bf4 for sever pulsar-euler-2-zookeeper/9.166.131.11:2181, Closing socket connection. Attempting reconnect except it is a SessionExpiredException.
org.apache.zookeeper.ClientCnxn$SessionTimeoutException: Client session timed out, have not heard from server in 6667ms for session id 0x100015013ff0bf4
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1258) [org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
01:00:56.672 [main-EventThread] ERROR org.apache.bookkeeper.meta.AbstractZkLedgerManager - Could not create node for ledger 28224077
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /ledgers/00/2822/L4077
at org.apache.zookeeper.KeeperException.create(KeeperException.java:102) ~[org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
at org.apache.zookeeper.KeeperException.create(KeeperException.java:54) ~[org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
at org.apache.bookkeeper.meta.AbstractZkLedgerManager$1.processResult(AbstractZkLedgerManager.java:308) [org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
at org.apache.bookkeeper.util.ZkUtils$1.processResult(ZkUtils.java:80) [org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
at org.apache.bookkeeper.zookeeper.ZooKeeperClient$10$1.processResult(ZooKeeperClient.java:740) [org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:692) [org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:563) [org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
01:00:56.672 [main-EventThread] ERROR org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [euler/train/persistent/mid_100360-partition-56] Error creating ledger rc=-9 Error while using ZooKeeper
// Disconnecting producer
01:00:56.673 [main-EventThread] INFO org.apache.pulsar.broker.service.Producer - Disconnecting producer: Producer{topic=PersistentTopic{topic=persistent://euler/train/mid_100360-partition-56}, client=/9.166.133.101:36154, producerName=pulsar-euler-2-945-1151, producerId=56}
01:00:56.673 [main-EventThread] INFO org.apache.pulsar.broker.service.Producer - Disconnecting producer: Producer{topic=PersistentTopic{topic=persistent://euler/train/mid_100360-partition-56}, client=/9.166.148.195:37798, producerName=pulsar-euler-2-945-1130, producerId=56}
01:00:56.673 [main-EventThread] INFO org.apache.pulsar.broker.service.Producer - Disconnecting producer: Producer{topic=PersistentTopic{topic=persistent://euler/train/mid_100360-partition-56}, client=/9.166.153.66:33934, producerName=pulsar-euler-2-945-1128, producerId=56}
01:00:56.673 [main-EventThread] INFO org.apache.pulsar.broker.service.Producer - Disconnecting producer: Producer{topic=PersistentTopic{topic=persistent://euler/train/mid_100360-partition-56}, client=/9.166.153.35:55318, producerName=pulsar-euler-2-945-1139, producerId=56}
// repeated for "Error while using ZooKeeper" and "Closed connection"...
01:00:56.674 [main-EventThread] WARN org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://euler/train/mid_100360-partition-56] Failed to persist msg in store: Error while using ZooKeeper
01:00:56.679 [main-EventThread] WARN org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://euler/train/mid_100360-partition-56] Failed to persist msg in store: Error while using ZooKeeper
01:00:56.679 [main-EventThread] WARN org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://euler/train/mid_100360-partition-56] Failed to persist msg in store: Error while using ZooKeeper
01:00:56.679 [pulsar-io-4-87] INFO org.apache.pulsar.broker.service.ServerCnx - Closed connection from /9.166.135.39:39346
01:00:56.679 [pulsar-io-4-77] WARN org.apache.pulsar.broker.service.ServerCnx - [/9.166.133.4:51598] Got exception io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer
01:00:56.692 [main-EventThread] WARN org.apache.bookkeeper.meta.ZkLedgerIdGenerator - Exception during deleting znode for id generation :
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /ledgers/idgen/ID-0028224077
at org.apache.zookeeper.KeeperException.create(KeeperException.java:102) ~[org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
at org.apache.zookeeper.KeeperException.create(KeeperException.java:54) ~[org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
at org.apache.bookkeeper.meta.ZkLedgerIdGenerator$1$1.processResult(ZkLedgerIdGenerator.java:113) [org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
at org.apache.bookkeeper.zookeeper.ZooKeeperClient$12$1.processResult(ZooKeeperClient.java:800) [org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:736) [org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:563) [org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
01:00:56.692 [main-EventThread] INFO org.apache.bookkeeper.zookeeper.ZooKeeperWatcherBase - ZooKeeper client is disconnected from zookeeper now, but it is OK unless we received EXPIRED event.
01:00:56.798 [pulsar-io-4-26] INFO org.apache.pulsar.broker.service.ServerCnx - New connection from /9.166.137.68:60956
01:00:56.798 [pulsar-io-4-27] INFO org.apache.pulsar.broker.service.ServerCnx - New connection from /9.166.137.135:39118
01:00:56.799 [pulsar-io-4-28] INFO org.apache.pulsar.broker.service.ServerCnx - New connection from /9.166.134.165:38894
01:00:56.800 [pulsar-io-4-29] INFO org.apache.pulsar.broker.service.ServerCnx - New connection from /9.166.150.163:42618
01:00:56.800 [pulsar-io-4-30] INFO org.apache.pulsar.broker.service.ServerCnx - New connection from /9.166.134.102:40272
01:00:56.800 [pulsar-io-4-31] INFO org.apache.pulsar.broker.service.ServerCnx - New connection from /9.166.134.198:47602
01:00:56.800 [pulsar-io-4-32] INFO org.apache.pulsar.broker.service.ServerCnx - New connection from /9.166.135.197:37996
01:00:56.800 [pulsar-io-4-33] INFO org.apache.pulsar.broker.service.ServerCnx - New connection from /9.166.135.39:40900
01:00:56.801 [pulsar-io-4-34] INFO org.apache.pulsar.broker.service.ServerCnx - New connection from /9.166.136.5:53170
01:00:56.801 [pulsar-io-4-35] INFO org.apache.pulsar.broker.service.ServerCnx - New connection from /9.166.134.102:40274
01:00:56.801 [pulsar-io-4-36] INFO org.apache.pulsar.broker.service.ServerCnx - New connection from /9.166.135.69:46640
01:00:56.801 [pulsar-io-4-37] INFO org.apache.pulsar.broker.service.ServerCnx - New connection from /9.166.148.98:42608
01:00:56.802 [pulsar-io-4-38] INFO org.apache.pulsar.broker.service.ServerCnx - New connection from /9.166.135.39:40902
01:00:56.802 [pulsar-io-4-39] INFO org.apache.pulsar.broker.service.ServerCnx - New connection from /9.166.136.5:53172
01:00:56.803 [pulsar-io-4-27] INFO org.apache.pulsar.broker.service.ServerCnx - [/9.166.137.135:39118][persistent://euler/train/mid_100360-partition-2] Creating producer. producerId=2
01:00:56.803 [pulsar-io-4-26] INFO org.apache.pulsar.broker.service.ServerCnx - [/9.166.137.68:60956][persistent://euler/train/mid_100360-partition-2] Creating producer. producerId=2
01:00:56.803 [pulsar-io-4-27] INFO org.apache.pulsar.broker.service.ServerCnx - [/9.166.137.135:39118][persistent://euler/train/mid_100360-partition-112] Creating producer. producerId=112
01:00:56.803 [pulsar-io-4-27] INFO org.apache.pulsar.broker.service.ServerCnx - [/9.166.137.135:39118][persistent://euler/train/mid_100360-partition-56] Creating producer. producerId=56
01:00:56.803 [ForkJoinPool.commonPool-worker-165] INFO org.apache.pulsar.broker.service.ServerCnx - [/9.166.137.135:39118] persistent://euler/train/mid_100360-partition-2 configured with schema false
01:00:56.803 [ForkJoinPool.commonPool-worker-91] INFO org.apache.pulsar.broker.service.ServerCnx - [/9.166.137.135:39118] persistent://euler/train/mid_100360-partition-112 configured with schema false
01:00:56.803 [pulsar-io-4-26] INFO org.apache.pulsar.broker.service.ServerCnx - [/9.166.137.68:60956][persistent://euler/train/mid_100360-partition-112] Creating producer. producerId=112
01:00:56.803 [ForkJoinPool.commonPool-worker-33] INFO org.apache.pulsar.broker.service.ServerCnx - [/9.166.137.135:39118] persistent://euler/train/mid_100360-partition-56 configured with schema false
01:00:56.803 [ForkJoinPool.commonPool-worker-197] INFO org.apache.pulsar.broker.service.ServerCnx - [/9.166.137.68:60956] persistent://euler/train/mid_100360-partition-2 configured with schema false
01:00:56.803 [pulsar-io-4-26] INFO org.apache.pulsar.broker.service.ServerCnx - [/9.166.137.68:60956][persistent://euler/train/mid_100360-partition-56] Creating producer. producerId=56
01:00:56.803 [ForkJoinPool.commonPool-worker-33] WARN org.apache.pulsar.broker.service.AbstractTopic - [persistent://euler/train/mid_100360-partition-56] Attempting to add producer to a fenced topic
01:00:56.803 [ForkJoinPool.commonPool-worker-91] INFO org.apache.pulsar.broker.service.ServerCnx - [/9.166.137.135:39118] Created new producer: Producer{topic=PersistentTopic{topic=persistent://euler/train/mid_100360-partition-112}, client=/9.166.137.135:39118, producerName=pulsar-euler-2-904-498, producerId=112}
01:00:56.803 [ForkJoinPool.commonPool-worker-29] INFO org.apache.pulsar.broker.service.ServerCnx - [/9.166.137.68:60956] persistent://euler/train/mid_100360-partition-112 configured with schema false
01:00:56.803 [ForkJoinPool.commonPool-worker-165] INFO org.apache.pulsar.broker.service.ServerCnx - [/9.166.137.135:39118] Created new producer: Producer{topic=PersistentTopic{topic=persistent://euler/train/mid_100360-partition-2}, client=/9.166.137.135:39118, producerName=pulsar-euler-2-945-1141, producerId=2}
01:00:56.803 [ForkJoinPool.commonPool-worker-19] INFO org.apache.pulsar.broker.service.ServerCnx - [/9.166.137.68:60956] persistent://euler/train/mid_100360-partition-56 configured with schema false
01:00:56.803 [ForkJoinPool.commonPool-worker-197] INFO org.apache.pulsar.broker.service.ServerCnx - [/9.166.137.68:60956] Created new producer: Producer{topic=PersistentTopic{topic=persistent://euler/train/mid_100360-partition-2}, client=/9.166.137.68:60956, producerName=pulsar-euler-2-945-1103, producerId=2}
01:00:56.803 [ForkJoinPool.commonPool-worker-29] INFO org.apache.pulsar.broker.service.ServerCnx - [/9.166.137.68:60956] Created new producer: Producer{topic=PersistentTopic{topic=persistent://euler/train/mid_100360-partition-112}, client=/9.166.137.68:60956, producerName=pulsar-euler-2-904-470, producerId=112}
01:00:56.803 [ForkJoinPool.commonPool-worker-19] WARN org.apache.pulsar.broker.service.AbstractTopic - [persistent://euler/train/mid_100360-partition-56] Attempting to add producer to a fenced topic
01:00:56.803 [ForkJoinPool.commonPool-worker-33] WARN org.apache.pulsar.broker.service.ServerCnx - [/9.166.137.135:39118] Failed to add producer to topic persistent://euler/train/mid_100360-partition-56: producerId=56, Topic is temporarily unavailable
01:00:56.803 [ForkJoinPool.commonPool-worker-19] WARN org.apache.pulsar.broker.service.ServerCnx - [/9.166.137.68:60956] Failed to add producer to topic persistent://euler/train/mid_100360-partition-56: producerId=56, Topic is temporarily unavailable
01:00:56.803 [pulsar-io-4-28] INFO org.apache.pulsar.broker.service.ServerCnx - [/9.166.134.165:38894][persistent://euler/train/mid_100360-partition-2] Creating producer. producerId=2
01:00:56.803 [ForkJoinPool.commonPool-worker-29] INFO org.apache.pulsar.broker.service.ServerCnx - [/9.166.134.165:38894] persistent://euler/train/mid_100360-partition-2 configured with schema false
01:00:56.803 [pulsar-io-4-28] INFO org.apache.pulsar.broker.service.ServerCnx - [/9.166.134.165:38894][persistent://euler/train/mid_100360-partition-112] Creating producer. producerId=112
01:00:56.804 [pulsar-io-4-28] INFO org.apache.pulsar.broker.service.ServerCnx - [/9.166.134.165:38894][persistent://euler/train/mid_100360-partition-56] Creating producer. producerId=56
01:00:56.804 [ForkJoinPool.commonPool-worker-29] INFO org.apache.pulsar.broker.service.ServerCnx - [/9.166.134.165:38894] Created new producer: Producer{topic=PersistentTopic{topic=persistent://euler/train/mid_100360-partition-2}, client=/9.166.134.165:38894, producerName=pulsar-euler-2-945-1109, producerId=2}
01:00:56.804 [ForkJoinPool.commonPool-worker-197] INFO org.apache.pulsar.broker.service.ServerCnx - [/9.166.134.165:38894] persistent://euler/train/mid_100360-partition-56 configured with schema false
01:00:56.804 [ForkJoinPool.commonPool-worker-197] WARN org.apache.pulsar.broker.service.AbstractTopic - [persistent://euler/train/mid_100360-partition-56] Attempting to add producer to a fenced topic
01:00:56.804 [ForkJoinPool.commonPool-worker-197] WARN org.apache.pulsar.broker.service.ServerCnx - [/9.166.134.165:38894] Failed to add producer to topic persistent://euler/train/mid_100360-partition-56: producerId=56, Topic is temporarily unavailable
01:00:56.804 [pulsar-io-4-29] INFO org.apache.pulsar.broker.service.ServerCnx - [/9.166.150.163:42618][persistent://euler/train/mid_100360-partition-2] Creating producer. producerId=2
01:00:56.804 [pulsar-io-4-29] INFO org.apache.pulsar.broker.service.ServerCnx - [/9.166.150.163:42618][persistent://euler/train/mid_100360-partition-112] Creating producer. producerId=112
01:00:56.804 [ForkJoinPool.commonPool-worker-29] INFO org.apache.pulsar.broker.service.ServerCnx - [/9.166.150.163:42618] persistent://euler/train/mid_100360-partition-2 configured with schema false
01:00:56.804 [pulsar-io-4-29] INFO org.apache.pulsar.broker.service.ServerCnx - [/9.166.150.163:42618][persistent://euler/train/mid_100360-partition-56] Creating producer. producerId=56
01:00:56.804 [ForkJoinPool.commonPool-worker-29] INFO org.apache.pulsar.broker.service.ServerCnx - [/9.166.150.163:42618] Created new producer: Producer{topic=PersistentTopic{topic=persistent://euler/train/mid_100360-partition-2}, client=/9.166.150.163:42618, producerName=pulsar-euler-2-945-1104, producerId=2}
01:00:56.804 [ForkJoinPool.commonPool-worker-165] INFO org.apache.pulsar.broker.service.ServerCnx - [/9.166.150.163:42618] persistent://euler/train/mid_100360-partition-112 configured with schema false
01:00:56.804 [ForkJoinPool.commonPool-worker-91] INFO org.apache.pulsar.broker.service.ServerCnx - [/9.166.150.163:42618] persistent://euler/train/mid_100360-partition-56 configured with schema false
01:00:56.804 [ForkJoinPool.commonPool-worker-165] INFO org.apache.pulsar.broker.service.ServerCnx - [/9.166.150.163:42618] Created new producer: Producer{topic=PersistentTopic{topic=persistent://euler/train/mid_100360-partition-112}, client=/9.166.150.163:42618, producerName=pulsar-euler-2-904-472, producerId=112}
01:00:56.804 [ForkJoinPool.commonPool-worker-91] WARN org.apache.pulsar.broker.service.AbstractTopic - [persistent://euler/train/mid_100360-partition-56] Attempting to add producer to a fenced topic
01:00:56.804 [ForkJoinPool.commonPool-worker-91] WARN org.apache.pulsar.broker.service.ServerCnx - [/9.166.150.163:42618] Failed to add producer to topic persistent://euler/train/mid_100360-partition-56: producerId=56, Topic is temporarily unavailable
01:00:56.804 [pulsar-io-4-30] INFO org.apache.pulsar.broker.service.ServerCnx - [/9.166.134.102:40272][persistent://euler/train/mid_100360-partition-2] Creating producer. producerId=2
01:00:56.804 [pulsar-io-4-30] INFO org.apache.pulsar.broker.service.ServerCnx - [/9.166.134.102:40272][persistent://euler/train/mid_100360-partition-112] Creating producer. producerId=112
01:00:56.804 [ForkJoinPool.commonPool-worker-165] INFO org.apache.pulsar.broker.service.ServerCnx - [/9.166.134.102:40272] persistent://euler/train/mid_100360-partition-2 configured with schema false
01:00:56.804 [pulsar-io-4-30] INFO org.apache.pulsar.broker.service.ServerCnx - [/9.166.134.102:40272][persistent://euler/train/mid_100360-partition-56] Creating producer. producerId=56
01:00:56.804 [ForkJoinPool.commonPool-worker-29] INFO org.apache.pulsar.broker.service.ServerCnx - [/9.166.134.102:40272] persistent://euler/train/mid_100360-partition-112 configured with schema false
01:00:56.804 [ForkJoinPool.commonPool-worker-165] INFO org.apache.pulsar.broker.service.ServerCnx - [/9.166.134.102:40272] Created new producer: Producer{topic=PersistentTopic{topic=persistent://euler/train/mid_100360-partition-2}, client=/9.166.134.102:40272, producerName=pulsar-euler-2-945-1135, producerId=2}
01:00:56.804 [ForkJoinPool.commonPool-worker-29] INFO org.apache.pulsar.broker.service.ServerCnx - [/9.166.134.102:40272] Created new producer: Producer{topic=PersistentTopic{topic=persistent://euler/train/mid_100360-partition-112}, client=/9.166.134.102:40272, producerName=pulsar-euler-2-904-494, producerId=112}
// zk reconnected
01:00:58.357 [main-SendThread(pulsar-euler-2-zookeeper:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server pulsar-euler-2-zookeeper/9.166.130.67:2181.
01:00:58.357 [main-SendThread(pulsar-euler-2-zookeeper:2181)] INFO org.apache.zookeeper.ClientCnxn - SASL config status: Will not attempt to authenticate using SASL (unknown error)
01:00:58.358 [main-SendThread(pulsar-euler-2-zookeeper:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established, initiating session, client: /11.149.218.225:48946, server: pulsar-euler-2-zookeeper/9.166.130.67:2181
01:00:58.359 [main-SendThread(pulsar-euler-2-zookeeper:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server pulsar-euler-2-zookeeper/9.166.130.67:2181, session id = 0x100015013ff0bf4, negotiated timeout = 10000
01:00:58.359 [main-EventThread] INFO org.apache.bookkeeper.zookeeper.ZooKeeperWatcherBase - ZooKeeper client is connected now.
01:00:59.560 [pulsar-io-4-27] INFO org.apache.pulsar.broker.service.ServerCnx - [/9.166.137.135:39118][persistent://euler/train/mid_100360-partition-56] Creating producer. producerId=56
01:00:59.560 [ForkJoinPool.commonPool-worker-91] INFO org.apache.pulsar.broker.service.ServerCnx - [/9.166.137.135:39118] persistent://euler/train/mid_100360-partition-56 configured with schema false
01:00:59.561 [ForkJoinPool.commonPool-worker-91] WARN org.apache.pulsar.broker.service.AbstractTopic - [persistent://euler/train/mid_100360-partition-56] Attempting to add producer to a fenced topic
01:00:59.561 [ForkJoinPool.commonPool-worker-91] WARN org.apache.pulsar.broker.service.ServerCnx - [/9.166.137.135:39118] Failed to add producer to topic persistent://euler/train/mid_100360-partition-56: producerId=56, Topic is temporarily unavailable
// the topic `mid_100360-partition-56` keep abnormal and other topics are still normal
```
Thread dump:
[pulsar-euler-2-broker-90.2.log](https://github.com/apache/pulsar/files/8460760/pulsar-euler-2-broker-90.2.log)
[pulsar-euler-2-broker-90.3.log](https://github.com/apache/pulsar/files/8460761/pulsar-euler-2-broker-90.3.log)
[pulsar-euler-2-broker-90.1.log](https://github.com/apache/pulsar/files/8460762/pulsar-euler-2-broker-90.1.log)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] dragonls commented on issue #15108: [broker] "Failed to persist msg in store: Error while using ZooKeeper" causing "Topic is temporarily unavailable"
Posted by GitBox <gi...@apache.org>.
dragonls commented on issue #15108:
URL: https://github.com/apache/pulsar/issues/15108#issuecomment-1096303000
> @dragonls Have you changed the `zookeeperSessionExpiredPolicy` to reconnect? It's not a stable feature in 2.8.x and 2.9.x.
No. `zookeeperSessionExpiredPolicy=shutdown`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] dragonls commented on issue #15108: [broker] "Failed to persist msg in store: Error while using ZooKeeper" causing "Topic is temporarily unavailable"
Posted by GitBox <gi...@apache.org>.
dragonls commented on issue #15108:
URL: https://github.com/apache/pulsar/issues/15108#issuecomment-1095850964
@mattisonchao I did not see this phenomenon while using Pulsar 2.7.2.
I can provide more broker logs and broker dump but not suitable uploaded here. Can I have your Wechat or email?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] github-actions[bot] commented on issue #15108: [broker] "Failed to persist msg in store: Error while using ZooKeeper" causing "Topic is temporarily unavailable"
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #15108:
URL: https://github.com/apache/pulsar/issues/15108#issuecomment-1158405456
The issue had no activity for 30 days, mark with Stale label.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] codelipenghui commented on issue #15108: [broker] "Failed to persist msg in store: Error while using ZooKeeper" causing "Topic is temporarily unavailable"
Posted by GitBox <gi...@apache.org>.
codelipenghui commented on issue #15108:
URL: https://github.com/apache/pulsar/issues/15108#issuecomment-1096239437
@dragonls Have you changed the `zookeeperSessionExpiredPolicy` to reconnect? It's not a stable feature in 2.8.x and 2.9.x.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] dragonls commented on issue #15108: [broker] "Failed to persist msg in store: Error while using ZooKeeper" causing "Topic is temporarily unavailable"
Posted by GitBox <gi...@apache.org>.
dragonls commented on issue #15108:
URL: https://github.com/apache/pulsar/issues/15108#issuecomment-1103828450
Seeing new phenomenon of logs:
```
15:05:32.148 [bookkeeper-ml-scheduler-OrderedScheduler-0-0] WARN org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://euler/train/mid_100481-partition-20] Failed to persist msg in store: java.util.concurrent.CompletionException: org.apache.pulsar.metadata.api.MetadataStoreException: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /managed-ledgers/euler/train/persistent/mid_100481-partition-20
15:07:32.578 [ForkJoinPool.commonPool-worker-27] WARN org.apache.pulsar.broker.service.ServerCnx - [/9.166.135.165:56492] Failed to add producer to topic persistent://euler/train/mid_100481-partition-20: producerId=916, Topic is temporarily unavailable
15:07:32.578 [BookKeeperClientWorker-OrderedExecutor-16-0] WARN org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://euler/train/mid_100481-partition-20] Failed to persist msg in store: Bookie operation timeout
15:07:32.442 [BookKeeperClientWorker-OrderedExecutor-16-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [euler/train/persistent/mid_100481-partition-20] Ledger creation was initiated 120010 ms ago but it never completed and creation timeout task didn't kick in as well. Force to fail the create ledger operation.
15:07:32.442 [BookKeeperClientWorker-OrderedExecutor-16-0] ERROR org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [euler/train/persistent/mid_100481-partition-20] Error creating ledger rc=-23 Bookie operation timeout
```
which makes the topic `temporarily unavailable`. Just the same as #14333.
`topicFencingTimeoutSeconds>0` is useless this time.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] dragonls commented on issue #15108: [broker] "Failed to persist msg in store: Error while using ZooKeeper" causing "Topic is temporarily unavailable"
Posted by GitBox <gi...@apache.org>.
dragonls commented on issue #15108:
URL: https://github.com/apache/pulsar/issues/15108#issuecomment-1128809313
After we upgrade the zk server version from 3.5.7 to 3.6.3, it seems problem solved. The zk connections are much more stable.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] mattisonchao commented on issue #15108: [broker] "Failed to persist msg in store: Error while using ZooKeeper" causing "Topic is temporarily unavailable"
Posted by GitBox <gi...@apache.org>.
mattisonchao commented on issue #15108:
URL: https://github.com/apache/pulsar/issues/15108#issuecomment-1095760378
HI @dragonls, I have some uncertain advice for you.
This phenomenon is due to the failure of the ZK session leading to the failure to `open/close` the ledger. When `OpAddEntry `fails, the topic will be `fenced` until the topic is unloaded or `pendingWriteOps=0`.
Before finding out why the ZK session was lost and why `pendingWriteOps !=0`, I think https://github.com/apache/pulsar/pull/8561 PR can help to temporarily solve the problem.
Also, could you provide more broker logs, errors or brokers dump?
Best,
Mattison
/cc @codelipenghui @Technoboy- This phenomenon loos like the TopicFenced problem we have seen before, and this user uses 2.8.3 which has already upgraded the Netty version. I'm not sure if it's related
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] dragonls commented on issue #15108: [broker] "Failed to persist msg in store: Error while using ZooKeeper" causing "Topic is temporarily unavailable"
Posted by GitBox <gi...@apache.org>.
dragonls commented on issue #15108:
URL: https://github.com/apache/pulsar/issues/15108#issuecomment-1101045264
@mattisonchao I tried `topicFencingTimeoutSeconds>0`, temporarily solved.
But also found that if a topic closed normally, `closeFencedTopicForcefully` still will be run, as the log `xxx Topic remained fenced for xxx...` printed.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] dragonls closed issue #15108: [broker] "Failed to persist msg in store: Error while using ZooKeeper" causing "Topic is temporarily unavailable"
Posted by GitBox <gi...@apache.org>.
dragonls closed issue #15108: [broker] "Failed to persist msg in store: Error while using ZooKeeper" causing "Topic is temporarily unavailable"
URL: https://github.com/apache/pulsar/issues/15108
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org