You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/04/11 03:38:58 UTC

[GitHub] [pulsar] dragonls opened a new issue, #15108: [broker] "Failed to persist msg in store: Error while using ZooKeeper" causing "Topic is temporarily unavailable"

dragonls opened a new issue, #15108:
URL: https://github.com/apache/pulsar/issues/15108

   **Describe the bug**
   We found `Topic is temporarily unavailable`. Producer can not produce data to this topic and consumer can not subscribe the topic.
   
   Restart the broker can make it back to normal.
   
   **To Reproduce**
   Hard to reproduce now.
   
   **Expected behavior**
   If something wrong while creating ledger, it should not make topic stuck.
   
   **Desktop (please complete the following information):**
    - OS: linux
    - Pulsar: 2.8.3 (with zk client version: 3.6.3)
    - zk server version: 3.5.7 (using the image of pulsar 2.7.2)
   
   **Additional context**
   At first, I think this might related to #13736. But I have upgrade broker to 2.8.3 and do not see the deadlock.
   
   I check the zookeeper, which seems nothing wrong.
   ![image](https://user-images.githubusercontent.com/2565118/162660123-52de5b21-d106-4da9-9814-90309a00a965.png)
   
   Also similar to #14333.
   
   The log:
   ```
   01:00:49.881 [BookKeeperClientWorker-OrderedExecutor-35-0] INFO  org.apache.bookkeeper.mledger.impl.OpAddEntry - [euler/train/persistent/mid_100360-partition-56] Closing ledger 28222932 for being full
   01:00:49.894 [main-EventThread] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [euler/train/persistent/mid_100360-partition-56] Creating a new ledger after closed
   01:00:49.894 [BookKeeperClientWorker-OrderedExecutor-35-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [euler/train/persistent/mid_100360-partition-56] Creating ledger, metadata: {component=[109, 97, 110, 97, 103, 101, 100, 45, 108, 101, 100, 103, 101, 114], pulsar/managed-ledger=[101, 117, 108, 101, 114, 47, 116, 114, 97, 105, 110, 47, 112, 101, 114, 115, 105, 115, 116, 101, 110, 116, 47, 109, 105, 100, 95, 49, 48, 48, 51, 54, 48, 45, 112, 97, 114, 116, 105, 116, 105, 111, 110, 45, 53, 54], application=[112, 117, 108, 115, 97, 114]} - metadata ops timeout : 60 seconds
   01:00:49.894 [BookKeeperClientWorker-OrderedExecutor-35-0] WARN  org.apache.bookkeeper.client.BookieWatcherImpl - New ensemble: [pulsar-euler-2-bookie-18.pulsar-euler-2-bookie.pulsar-euler-2.svc.cluster.local:3181, pulsar-euler-2-bookie-3.pulsar-euler-2-bookie.pulsar-euler-2.svc.cluster.local:3181, pulsar-euler-2-bookie-17.pulsar-euler-2-bookie.pulsar-euler-2.svc.cluster.local:3181] is not adhering to Placement Policy. quarantinedBookies: []
   01:00:56.571 [main-SendThread(pulsar-euler-2-zookeeper:2181)] WARN  org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 6667ms for session id 0x100015013ff0bf4
   01:00:56.571 [main-SendThread(pulsar-euler-2-zookeeper:2181)] WARN  org.apache.zookeeper.ClientCnxn - Session 0x100015013ff0bf4 for sever pulsar-euler-2-zookeeper/9.166.131.11:2181, Closing socket connection. Attempting reconnect except it is a SessionExpiredException.
   org.apache.zookeeper.ClientCnxn$SessionTimeoutException: Client session timed out, have not heard from server in 6667ms for session id 0x100015013ff0bf4
   	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1258) [org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
   01:00:56.672 [main-EventThread] ERROR org.apache.bookkeeper.meta.AbstractZkLedgerManager - Could not create node for ledger 28224077
   org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /ledgers/00/2822/L4077
   	at org.apache.zookeeper.KeeperException.create(KeeperException.java:102) ~[org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
   	at org.apache.zookeeper.KeeperException.create(KeeperException.java:54) ~[org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
   	at org.apache.bookkeeper.meta.AbstractZkLedgerManager$1.processResult(AbstractZkLedgerManager.java:308) [org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
   	at org.apache.bookkeeper.util.ZkUtils$1.processResult(ZkUtils.java:80) [org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
   	at org.apache.bookkeeper.zookeeper.ZooKeeperClient$10$1.processResult(ZooKeeperClient.java:740) [org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
   	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:692) [org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
   	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:563) [org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
   01:00:56.672 [main-EventThread] ERROR org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [euler/train/persistent/mid_100360-partition-56] Error creating ledger rc=-9 Error while using ZooKeeper
   
   // Disconnecting producer
   01:00:56.673 [main-EventThread] INFO  org.apache.pulsar.broker.service.Producer - Disconnecting producer: Producer{topic=PersistentTopic{topic=persistent://euler/train/mid_100360-partition-56}, client=/9.166.133.101:36154, producerName=pulsar-euler-2-945-1151, producerId=56}
   01:00:56.673 [main-EventThread] INFO  org.apache.pulsar.broker.service.Producer - Disconnecting producer: Producer{topic=PersistentTopic{topic=persistent://euler/train/mid_100360-partition-56}, client=/9.166.148.195:37798, producerName=pulsar-euler-2-945-1130, producerId=56}
   01:00:56.673 [main-EventThread] INFO  org.apache.pulsar.broker.service.Producer - Disconnecting producer: Producer{topic=PersistentTopic{topic=persistent://euler/train/mid_100360-partition-56}, client=/9.166.153.66:33934, producerName=pulsar-euler-2-945-1128, producerId=56}
   01:00:56.673 [main-EventThread] INFO  org.apache.pulsar.broker.service.Producer - Disconnecting producer: Producer{topic=PersistentTopic{topic=persistent://euler/train/mid_100360-partition-56}, client=/9.166.153.35:55318, producerName=pulsar-euler-2-945-1139, producerId=56}
   
   // repeated for "Error while using ZooKeeper" and "Closed connection"...
   01:00:56.674 [main-EventThread] WARN  org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://euler/train/mid_100360-partition-56] Failed to persist msg in store: Error while using ZooKeeper
   01:00:56.679 [main-EventThread] WARN  org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://euler/train/mid_100360-partition-56] Failed to persist msg in store: Error while using ZooKeeper
   01:00:56.679 [main-EventThread] WARN  org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://euler/train/mid_100360-partition-56] Failed to persist msg in store: Error while using ZooKeeper
   01:00:56.679 [pulsar-io-4-87] INFO  org.apache.pulsar.broker.service.ServerCnx - Closed connection from /9.166.135.39:39346
   01:00:56.679 [pulsar-io-4-77] WARN  org.apache.pulsar.broker.service.ServerCnx - [/9.166.133.4:51598] Got exception io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer
   
   01:00:56.692 [main-EventThread] WARN  org.apache.bookkeeper.meta.ZkLedgerIdGenerator - Exception during deleting znode for id generation : 
   org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /ledgers/idgen/ID-0028224077
   	at org.apache.zookeeper.KeeperException.create(KeeperException.java:102) ~[org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
   	at org.apache.zookeeper.KeeperException.create(KeeperException.java:54) ~[org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
   	at org.apache.bookkeeper.meta.ZkLedgerIdGenerator$1$1.processResult(ZkLedgerIdGenerator.java:113) [org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
   	at org.apache.bookkeeper.zookeeper.ZooKeeperClient$12$1.processResult(ZooKeeperClient.java:800) [org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
   	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:736) [org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
   	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:563) [org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
   01:00:56.692 [main-EventThread] INFO  org.apache.bookkeeper.zookeeper.ZooKeeperWatcherBase - ZooKeeper client is disconnected from zookeeper now, but it is OK unless we received EXPIRED event.
   01:00:56.798 [pulsar-io-4-26] INFO  org.apache.pulsar.broker.service.ServerCnx - New connection from /9.166.137.68:60956
   01:00:56.798 [pulsar-io-4-27] INFO  org.apache.pulsar.broker.service.ServerCnx - New connection from /9.166.137.135:39118
   01:00:56.799 [pulsar-io-4-28] INFO  org.apache.pulsar.broker.service.ServerCnx - New connection from /9.166.134.165:38894
   01:00:56.800 [pulsar-io-4-29] INFO  org.apache.pulsar.broker.service.ServerCnx - New connection from /9.166.150.163:42618
   01:00:56.800 [pulsar-io-4-30] INFO  org.apache.pulsar.broker.service.ServerCnx - New connection from /9.166.134.102:40272
   01:00:56.800 [pulsar-io-4-31] INFO  org.apache.pulsar.broker.service.ServerCnx - New connection from /9.166.134.198:47602
   01:00:56.800 [pulsar-io-4-32] INFO  org.apache.pulsar.broker.service.ServerCnx - New connection from /9.166.135.197:37996
   01:00:56.800 [pulsar-io-4-33] INFO  org.apache.pulsar.broker.service.ServerCnx - New connection from /9.166.135.39:40900
   01:00:56.801 [pulsar-io-4-34] INFO  org.apache.pulsar.broker.service.ServerCnx - New connection from /9.166.136.5:53170
   01:00:56.801 [pulsar-io-4-35] INFO  org.apache.pulsar.broker.service.ServerCnx - New connection from /9.166.134.102:40274
   01:00:56.801 [pulsar-io-4-36] INFO  org.apache.pulsar.broker.service.ServerCnx - New connection from /9.166.135.69:46640
   01:00:56.801 [pulsar-io-4-37] INFO  org.apache.pulsar.broker.service.ServerCnx - New connection from /9.166.148.98:42608
   01:00:56.802 [pulsar-io-4-38] INFO  org.apache.pulsar.broker.service.ServerCnx - New connection from /9.166.135.39:40902
   01:00:56.802 [pulsar-io-4-39] INFO  org.apache.pulsar.broker.service.ServerCnx - New connection from /9.166.136.5:53172
   01:00:56.803 [pulsar-io-4-27] INFO  org.apache.pulsar.broker.service.ServerCnx - [/9.166.137.135:39118][persistent://euler/train/mid_100360-partition-2] Creating producer. producerId=2
   01:00:56.803 [pulsar-io-4-26] INFO  org.apache.pulsar.broker.service.ServerCnx - [/9.166.137.68:60956][persistent://euler/train/mid_100360-partition-2] Creating producer. producerId=2
   01:00:56.803 [pulsar-io-4-27] INFO  org.apache.pulsar.broker.service.ServerCnx - [/9.166.137.135:39118][persistent://euler/train/mid_100360-partition-112] Creating producer. producerId=112
   01:00:56.803 [pulsar-io-4-27] INFO  org.apache.pulsar.broker.service.ServerCnx - [/9.166.137.135:39118][persistent://euler/train/mid_100360-partition-56] Creating producer. producerId=56
   01:00:56.803 [ForkJoinPool.commonPool-worker-165] INFO  org.apache.pulsar.broker.service.ServerCnx - [/9.166.137.135:39118] persistent://euler/train/mid_100360-partition-2 configured with schema false
   01:00:56.803 [ForkJoinPool.commonPool-worker-91] INFO  org.apache.pulsar.broker.service.ServerCnx - [/9.166.137.135:39118] persistent://euler/train/mid_100360-partition-112 configured with schema false
   01:00:56.803 [pulsar-io-4-26] INFO  org.apache.pulsar.broker.service.ServerCnx - [/9.166.137.68:60956][persistent://euler/train/mid_100360-partition-112] Creating producer. producerId=112
   01:00:56.803 [ForkJoinPool.commonPool-worker-33] INFO  org.apache.pulsar.broker.service.ServerCnx - [/9.166.137.135:39118] persistent://euler/train/mid_100360-partition-56 configured with schema false
   01:00:56.803 [ForkJoinPool.commonPool-worker-197] INFO  org.apache.pulsar.broker.service.ServerCnx - [/9.166.137.68:60956] persistent://euler/train/mid_100360-partition-2 configured with schema false
   01:00:56.803 [pulsar-io-4-26] INFO  org.apache.pulsar.broker.service.ServerCnx - [/9.166.137.68:60956][persistent://euler/train/mid_100360-partition-56] Creating producer. producerId=56
   01:00:56.803 [ForkJoinPool.commonPool-worker-33] WARN  org.apache.pulsar.broker.service.AbstractTopic - [persistent://euler/train/mid_100360-partition-56] Attempting to add producer to a fenced topic
   01:00:56.803 [ForkJoinPool.commonPool-worker-91] INFO  org.apache.pulsar.broker.service.ServerCnx - [/9.166.137.135:39118] Created new producer: Producer{topic=PersistentTopic{topic=persistent://euler/train/mid_100360-partition-112}, client=/9.166.137.135:39118, producerName=pulsar-euler-2-904-498, producerId=112}
   01:00:56.803 [ForkJoinPool.commonPool-worker-29] INFO  org.apache.pulsar.broker.service.ServerCnx - [/9.166.137.68:60956] persistent://euler/train/mid_100360-partition-112 configured with schema false
   01:00:56.803 [ForkJoinPool.commonPool-worker-165] INFO  org.apache.pulsar.broker.service.ServerCnx - [/9.166.137.135:39118] Created new producer: Producer{topic=PersistentTopic{topic=persistent://euler/train/mid_100360-partition-2}, client=/9.166.137.135:39118, producerName=pulsar-euler-2-945-1141, producerId=2}
   01:00:56.803 [ForkJoinPool.commonPool-worker-19] INFO  org.apache.pulsar.broker.service.ServerCnx - [/9.166.137.68:60956] persistent://euler/train/mid_100360-partition-56 configured with schema false
   01:00:56.803 [ForkJoinPool.commonPool-worker-197] INFO  org.apache.pulsar.broker.service.ServerCnx - [/9.166.137.68:60956] Created new producer: Producer{topic=PersistentTopic{topic=persistent://euler/train/mid_100360-partition-2}, client=/9.166.137.68:60956, producerName=pulsar-euler-2-945-1103, producerId=2}
   01:00:56.803 [ForkJoinPool.commonPool-worker-29] INFO  org.apache.pulsar.broker.service.ServerCnx - [/9.166.137.68:60956] Created new producer: Producer{topic=PersistentTopic{topic=persistent://euler/train/mid_100360-partition-112}, client=/9.166.137.68:60956, producerName=pulsar-euler-2-904-470, producerId=112}
   01:00:56.803 [ForkJoinPool.commonPool-worker-19] WARN  org.apache.pulsar.broker.service.AbstractTopic - [persistent://euler/train/mid_100360-partition-56] Attempting to add producer to a fenced topic
   01:00:56.803 [ForkJoinPool.commonPool-worker-33] WARN  org.apache.pulsar.broker.service.ServerCnx - [/9.166.137.135:39118] Failed to add producer to topic persistent://euler/train/mid_100360-partition-56: producerId=56, Topic is temporarily unavailable
   01:00:56.803 [ForkJoinPool.commonPool-worker-19] WARN  org.apache.pulsar.broker.service.ServerCnx - [/9.166.137.68:60956] Failed to add producer to topic persistent://euler/train/mid_100360-partition-56: producerId=56, Topic is temporarily unavailable
   01:00:56.803 [pulsar-io-4-28] INFO  org.apache.pulsar.broker.service.ServerCnx - [/9.166.134.165:38894][persistent://euler/train/mid_100360-partition-2] Creating producer. producerId=2
   01:00:56.803 [ForkJoinPool.commonPool-worker-29] INFO  org.apache.pulsar.broker.service.ServerCnx - [/9.166.134.165:38894] persistent://euler/train/mid_100360-partition-2 configured with schema false
   01:00:56.803 [pulsar-io-4-28] INFO  org.apache.pulsar.broker.service.ServerCnx - [/9.166.134.165:38894][persistent://euler/train/mid_100360-partition-112] Creating producer. producerId=112
   01:00:56.804 [pulsar-io-4-28] INFO  org.apache.pulsar.broker.service.ServerCnx - [/9.166.134.165:38894][persistent://euler/train/mid_100360-partition-56] Creating producer. producerId=56
   01:00:56.804 [ForkJoinPool.commonPool-worker-29] INFO  org.apache.pulsar.broker.service.ServerCnx - [/9.166.134.165:38894] Created new producer: Producer{topic=PersistentTopic{topic=persistent://euler/train/mid_100360-partition-2}, client=/9.166.134.165:38894, producerName=pulsar-euler-2-945-1109, producerId=2}
   01:00:56.804 [ForkJoinPool.commonPool-worker-197] INFO  org.apache.pulsar.broker.service.ServerCnx - [/9.166.134.165:38894] persistent://euler/train/mid_100360-partition-56 configured with schema false
   01:00:56.804 [ForkJoinPool.commonPool-worker-197] WARN  org.apache.pulsar.broker.service.AbstractTopic - [persistent://euler/train/mid_100360-partition-56] Attempting to add producer to a fenced topic
   01:00:56.804 [ForkJoinPool.commonPool-worker-197] WARN  org.apache.pulsar.broker.service.ServerCnx - [/9.166.134.165:38894] Failed to add producer to topic persistent://euler/train/mid_100360-partition-56: producerId=56, Topic is temporarily unavailable
   01:00:56.804 [pulsar-io-4-29] INFO  org.apache.pulsar.broker.service.ServerCnx - [/9.166.150.163:42618][persistent://euler/train/mid_100360-partition-2] Creating producer. producerId=2
   01:00:56.804 [pulsar-io-4-29] INFO  org.apache.pulsar.broker.service.ServerCnx - [/9.166.150.163:42618][persistent://euler/train/mid_100360-partition-112] Creating producer. producerId=112
   01:00:56.804 [ForkJoinPool.commonPool-worker-29] INFO  org.apache.pulsar.broker.service.ServerCnx - [/9.166.150.163:42618] persistent://euler/train/mid_100360-partition-2 configured with schema false
   01:00:56.804 [pulsar-io-4-29] INFO  org.apache.pulsar.broker.service.ServerCnx - [/9.166.150.163:42618][persistent://euler/train/mid_100360-partition-56] Creating producer. producerId=56
   01:00:56.804 [ForkJoinPool.commonPool-worker-29] INFO  org.apache.pulsar.broker.service.ServerCnx - [/9.166.150.163:42618] Created new producer: Producer{topic=PersistentTopic{topic=persistent://euler/train/mid_100360-partition-2}, client=/9.166.150.163:42618, producerName=pulsar-euler-2-945-1104, producerId=2}
   01:00:56.804 [ForkJoinPool.commonPool-worker-165] INFO  org.apache.pulsar.broker.service.ServerCnx - [/9.166.150.163:42618] persistent://euler/train/mid_100360-partition-112 configured with schema false
   01:00:56.804 [ForkJoinPool.commonPool-worker-91] INFO  org.apache.pulsar.broker.service.ServerCnx - [/9.166.150.163:42618] persistent://euler/train/mid_100360-partition-56 configured with schema false
   01:00:56.804 [ForkJoinPool.commonPool-worker-165] INFO  org.apache.pulsar.broker.service.ServerCnx - [/9.166.150.163:42618] Created new producer: Producer{topic=PersistentTopic{topic=persistent://euler/train/mid_100360-partition-112}, client=/9.166.150.163:42618, producerName=pulsar-euler-2-904-472, producerId=112}
   01:00:56.804 [ForkJoinPool.commonPool-worker-91] WARN  org.apache.pulsar.broker.service.AbstractTopic - [persistent://euler/train/mid_100360-partition-56] Attempting to add producer to a fenced topic
   01:00:56.804 [ForkJoinPool.commonPool-worker-91] WARN  org.apache.pulsar.broker.service.ServerCnx - [/9.166.150.163:42618] Failed to add producer to topic persistent://euler/train/mid_100360-partition-56: producerId=56, Topic is temporarily unavailable
   01:00:56.804 [pulsar-io-4-30] INFO  org.apache.pulsar.broker.service.ServerCnx - [/9.166.134.102:40272][persistent://euler/train/mid_100360-partition-2] Creating producer. producerId=2
   01:00:56.804 [pulsar-io-4-30] INFO  org.apache.pulsar.broker.service.ServerCnx - [/9.166.134.102:40272][persistent://euler/train/mid_100360-partition-112] Creating producer. producerId=112
   01:00:56.804 [ForkJoinPool.commonPool-worker-165] INFO  org.apache.pulsar.broker.service.ServerCnx - [/9.166.134.102:40272] persistent://euler/train/mid_100360-partition-2 configured with schema false
   01:00:56.804 [pulsar-io-4-30] INFO  org.apache.pulsar.broker.service.ServerCnx - [/9.166.134.102:40272][persistent://euler/train/mid_100360-partition-56] Creating producer. producerId=56
   01:00:56.804 [ForkJoinPool.commonPool-worker-29] INFO  org.apache.pulsar.broker.service.ServerCnx - [/9.166.134.102:40272] persistent://euler/train/mid_100360-partition-112 configured with schema false
   01:00:56.804 [ForkJoinPool.commonPool-worker-165] INFO  org.apache.pulsar.broker.service.ServerCnx - [/9.166.134.102:40272] Created new producer: Producer{topic=PersistentTopic{topic=persistent://euler/train/mid_100360-partition-2}, client=/9.166.134.102:40272, producerName=pulsar-euler-2-945-1135, producerId=2}
   01:00:56.804 [ForkJoinPool.commonPool-worker-29] INFO  org.apache.pulsar.broker.service.ServerCnx - [/9.166.134.102:40272] Created new producer: Producer{topic=PersistentTopic{topic=persistent://euler/train/mid_100360-partition-112}, client=/9.166.134.102:40272, producerName=pulsar-euler-2-904-494, producerId=112}
   
   // zk reconnected
   01:00:58.357 [main-SendThread(pulsar-euler-2-zookeeper:2181)] INFO  org.apache.zookeeper.ClientCnxn - Opening socket connection to server pulsar-euler-2-zookeeper/9.166.130.67:2181.
   01:00:58.357 [main-SendThread(pulsar-euler-2-zookeeper:2181)] INFO  org.apache.zookeeper.ClientCnxn - SASL config status: Will not attempt to authenticate using SASL (unknown error)
   01:00:58.358 [main-SendThread(pulsar-euler-2-zookeeper:2181)] INFO  org.apache.zookeeper.ClientCnxn - Socket connection established, initiating session, client: /11.149.218.225:48946, server: pulsar-euler-2-zookeeper/9.166.130.67:2181
   01:00:58.359 [main-SendThread(pulsar-euler-2-zookeeper:2181)] INFO  org.apache.zookeeper.ClientCnxn - Session establishment complete on server pulsar-euler-2-zookeeper/9.166.130.67:2181, session id = 0x100015013ff0bf4, negotiated timeout = 10000
   01:00:58.359 [main-EventThread] INFO  org.apache.bookkeeper.zookeeper.ZooKeeperWatcherBase - ZooKeeper client is connected now.
   01:00:59.560 [pulsar-io-4-27] INFO  org.apache.pulsar.broker.service.ServerCnx - [/9.166.137.135:39118][persistent://euler/train/mid_100360-partition-56] Creating producer. producerId=56
   01:00:59.560 [ForkJoinPool.commonPool-worker-91] INFO  org.apache.pulsar.broker.service.ServerCnx - [/9.166.137.135:39118] persistent://euler/train/mid_100360-partition-56 configured with schema false
   01:00:59.561 [ForkJoinPool.commonPool-worker-91] WARN  org.apache.pulsar.broker.service.AbstractTopic - [persistent://euler/train/mid_100360-partition-56] Attempting to add producer to a fenced topic
   01:00:59.561 [ForkJoinPool.commonPool-worker-91] WARN  org.apache.pulsar.broker.service.ServerCnx - [/9.166.137.135:39118] Failed to add producer to topic persistent://euler/train/mid_100360-partition-56: producerId=56, Topic is temporarily unavailable
   
   // the topic `mid_100360-partition-56` keep abnormal and other topics are still normal
   ```
   
   Thread dump:
   [pulsar-euler-2-broker-90.2.log](https://github.com/apache/pulsar/files/8460760/pulsar-euler-2-broker-90.2.log)
   [pulsar-euler-2-broker-90.3.log](https://github.com/apache/pulsar/files/8460761/pulsar-euler-2-broker-90.3.log)
   [pulsar-euler-2-broker-90.1.log](https://github.com/apache/pulsar/files/8460762/pulsar-euler-2-broker-90.1.log)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] dragonls commented on issue #15108: [broker] "Failed to persist msg in store: Error while using ZooKeeper" causing "Topic is temporarily unavailable"

Posted by GitBox <gi...@apache.org>.
dragonls commented on issue #15108:
URL: https://github.com/apache/pulsar/issues/15108#issuecomment-1096303000

   > @dragonls Have you changed the `zookeeperSessionExpiredPolicy` to reconnect? It's not a stable feature in 2.8.x and 2.9.x.
   
   No. `zookeeperSessionExpiredPolicy=shutdown`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] dragonls commented on issue #15108: [broker] "Failed to persist msg in store: Error while using ZooKeeper" causing "Topic is temporarily unavailable"

Posted by GitBox <gi...@apache.org>.
dragonls commented on issue #15108:
URL: https://github.com/apache/pulsar/issues/15108#issuecomment-1095850964

   @mattisonchao I did not see this phenomenon while using Pulsar 2.7.2.
   I can provide more broker logs and broker dump but not suitable uploaded here. Can I have your Wechat or email?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] github-actions[bot] commented on issue #15108: [broker] "Failed to persist msg in store: Error while using ZooKeeper" causing "Topic is temporarily unavailable"

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #15108:
URL: https://github.com/apache/pulsar/issues/15108#issuecomment-1158405456

   The issue had no activity for 30 days, mark with Stale label.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] codelipenghui commented on issue #15108: [broker] "Failed to persist msg in store: Error while using ZooKeeper" causing "Topic is temporarily unavailable"

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on issue #15108:
URL: https://github.com/apache/pulsar/issues/15108#issuecomment-1096239437

   @dragonls Have you changed the `zookeeperSessionExpiredPolicy` to reconnect? It's not a stable feature in 2.8.x and 2.9.x.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] dragonls commented on issue #15108: [broker] "Failed to persist msg in store: Error while using ZooKeeper" causing "Topic is temporarily unavailable"

Posted by GitBox <gi...@apache.org>.
dragonls commented on issue #15108:
URL: https://github.com/apache/pulsar/issues/15108#issuecomment-1103828450

   Seeing new phenomenon of logs:
   ```
   15:05:32.148 [bookkeeper-ml-scheduler-OrderedScheduler-0-0] WARN  org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://euler/train/mid_100481-partition-20] Failed to persist msg in store: java.util.concurrent.CompletionException: org.apache.pulsar.metadata.api.MetadataStoreException: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /managed-ledgers/euler/train/persistent/mid_100481-partition-20
   
   15:07:32.578 [ForkJoinPool.commonPool-worker-27] WARN  org.apache.pulsar.broker.service.ServerCnx - [/9.166.135.165:56492] Failed to add producer to topic persistent://euler/train/mid_100481-partition-20: producerId=916, Topic is temporarily unavailable
   15:07:32.578 [BookKeeperClientWorker-OrderedExecutor-16-0] WARN  org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://euler/train/mid_100481-partition-20] Failed to persist msg in store: Bookie operation timeout
   15:07:32.442 [BookKeeperClientWorker-OrderedExecutor-16-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [euler/train/persistent/mid_100481-partition-20] Ledger creation was initiated 120010 ms ago but it never completed and creation timeout task didn't kick in as well. Force to fail the create ledger operation.
   15:07:32.442 [BookKeeperClientWorker-OrderedExecutor-16-0] ERROR org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [euler/train/persistent/mid_100481-partition-20] Error creating ledger rc=-23 Bookie operation timeout
   ```
   which makes the topic `temporarily unavailable`. Just the same as #14333.
   
   `topicFencingTimeoutSeconds>0` is useless this time.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] dragonls commented on issue #15108: [broker] "Failed to persist msg in store: Error while using ZooKeeper" causing "Topic is temporarily unavailable"

Posted by GitBox <gi...@apache.org>.
dragonls commented on issue #15108:
URL: https://github.com/apache/pulsar/issues/15108#issuecomment-1128809313

   After we upgrade the zk server version from 3.5.7 to 3.6.3, it seems problem solved. The zk connections are much more stable.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] mattisonchao commented on issue #15108: [broker] "Failed to persist msg in store: Error while using ZooKeeper" causing "Topic is temporarily unavailable"

Posted by GitBox <gi...@apache.org>.
mattisonchao commented on issue #15108:
URL: https://github.com/apache/pulsar/issues/15108#issuecomment-1095760378

   HI @dragonls, I have some uncertain advice for you.
   
   This phenomenon is due to the failure of the ZK session leading to the failure to `open/close` the ledger. When `OpAddEntry `fails, the topic will be `fenced` until the topic is unloaded or `pendingWriteOps=0`.
   
   Before finding out why the ZK session was lost and why `pendingWriteOps !=0`, I think https://github.com/apache/pulsar/pull/8561 PR can help to temporarily solve the problem.
   
   Also, could you provide more broker logs, errors or brokers dump?
   Best,
   Mattison
   
   /cc @codelipenghui  @Technoboy-   This phenomenon loos like the TopicFenced problem we have seen before, and this user uses 2.8.3 which has already upgraded the Netty version. I'm not sure if it's related


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] dragonls commented on issue #15108: [broker] "Failed to persist msg in store: Error while using ZooKeeper" causing "Topic is temporarily unavailable"

Posted by GitBox <gi...@apache.org>.
dragonls commented on issue #15108:
URL: https://github.com/apache/pulsar/issues/15108#issuecomment-1101045264

   @mattisonchao I tried `topicFencingTimeoutSeconds>0`, temporarily solved.
   But also found that if a topic closed normally, `closeFencedTopicForcefully` still will be run, as the log `xxx Topic remained fenced for xxx...` printed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] dragonls closed issue #15108: [broker] "Failed to persist msg in store: Error while using ZooKeeper" causing "Topic is temporarily unavailable"

Posted by GitBox <gi...@apache.org>.
dragonls closed issue #15108: [broker] "Failed to persist msg in store: Error while using ZooKeeper" causing "Topic is temporarily unavailable"
URL: https://github.com/apache/pulsar/issues/15108


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org