Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2020/11/13 09:57:47 UTC

[GitHub] [pulsar] massakam opened a new pull request #8561: [broker] Close topics that remain fenced forcefully

massakam opened a new pull request #8561:
URL: https://github.com/apache/pulsar/pull/8561


   ### Motivation
   
   The other day, one of our topics became fenced and stayed unavailable until it was unloaded. The following is the broker log from that time.
   ```
   11:37:55.905 [bookkeeper-ml-workers-OrderedExecutor-77-0] INFO  o.a.b.mledger.impl.OpAddEntry        - [tenant/ns/persistent/topic] Closing ledger 40891546 for being full
   11:37:56.208 [pulsar-ordered-OrderedExecutor-0-0-EventThread] ERROR o.a.b.client.MetadataUpdateLoop      - UpdateLoop(ledgerId=40891546,loopId=6ce63876) Error writing metadata to store
   11:37:56.209 [pulsar-ordered-OrderedExecutor-0-0-EventThread] WARN  o.a.b.mledger.impl.OpAddEntry        - Error when closing ledger 40891546. Status=Error while using ZooKeeper
   11:37:56.359 [pulsar-ordered-OrderedExecutor-0-0-EventThread] ERROR o.a.b.mledger.impl.ManagedLedgerImpl - [tenant/ns/persistent/topic] Error creating ledger rc=-9 Error while using ZooKeeper
   11:37:56.359 [pulsar-ordered-OrderedExecutor-0-0-EventThread] INFO  o.a.pulsar.broker.service.Producer   - Disconnecting producer: Producer{topic=PersistentTopic{topic=persistent://tenant/ns/topic}, client=/xxx.xxx.xxx.xxx:40646, producerName=pulsar.repl.jp-west, producerId=668}
   11:37:56.360 [pulsar-ordered-OrderedExecutor-0-0-EventThread] WARN  o.a.p.b.s.persistent.PersistentTopic - [persistent://tenant/ns/topic] Failed to persist msg in store: Error while using ZooKeeper
   11:37:56.360 [pulsar-ordered-OrderedExecutor-0-0-EventThread] INFO  o.a.pulsar.broker.service.Producer   - Disconnecting producer: Producer{topic=PersistentTopic{topic=persistent://tenant/ns/topic}, client=/xxx.xxx.xxx.xxx:40646, producerName=pulsar.repl.jp-west, producerId=668}
   11:37:56.360 [pulsar-ordered-OrderedExecutor-0-0-EventThread] WARN  o.a.p.b.s.persistent.PersistentTopic - [persistent://tenant/ns/topic] Failed to persist msg in store: Error while using ZooKeeper
   11:37:56.360 [pulsar-ordered-OrderedExecutor-0-0-EventThread] INFO  o.a.pulsar.broker.service.Producer   - Disconnecting producer: Producer{topic=PersistentTopic{topic=persistent://tenant/ns/topic}, client=/xxx.xxx.xxx.xxx:40646, producerName=pulsar.repl.jp-west, producerId=668}
   11:37:56.360 [pulsar-ordered-OrderedExecutor-0-0-EventThread] WARN  o.a.p.b.s.persistent.PersistentTopic - [persistent://tenant/ns/topic] Failed to persist msg in store: Error while using ZooKeeper
   11:37:56.360 [pulsar-ordered-OrderedExecutor-0-0-EventThread] WARN  o.a.p.b.s.persistent.PersistentTopic - [persistent://tenant/ns/topic] Failed to persist msg in store: Error while using ZooKeeper
   11:37:56.360 [pulsar-ordered-OrderedExecutor-0-0-EventThread] WARN  o.a.p.b.s.persistent.PersistentTopic - [persistent://tenant/ns/topic] Failed to persist msg in store: Error while using ZooKeeper
   11:37:56.360 [pulsar-ordered-OrderedExecutor-0-0-EventThread] WARN  o.a.p.b.s.persistent.PersistentTopic - [persistent://tenant/ns/topic] Failed to persist msg in store: Error while using ZooKeeper
   11:37:57.495 [ForkJoinPool.commonPool-worker-51] INFO  o.a.pulsar.broker.service.ServerCnx  - [/xxx.xxx.xxx.xxx:40256][persistent://tenant/ns/topic] Creating producer. producerId=668
   11:37:58.291 [bookkeeper-ml-workers-OrderedExecutor-77-0] INFO  o.a.b.mledger.impl.ManagedLedgerImpl - [tenant/ns/persistent/topic] End TrimConsumedLedgers. ledgers=2 totalSize=162868668
   11:37:58.291 [bookkeeper-ml-workers-OrderedExecutor-77-0] INFO  o.a.b.mledger.impl.ManagedLedgerImpl - [tenant/ns/persistent/topic] Removing ledger 40880508 - size: 82183409
   11:37:58.292 [ForkJoinPool.commonPool-worker-20] INFO  o.a.pulsar.broker.service.ServerCnx  - [/xxx.xxx.xxx.xxx:40256]-668 persistent://tenant/ns/topic configured with schema false
   11:37:58.292 [ForkJoinPool.commonPool-worker-20] WARN  o.a.p.b.s.persistent.PersistentTopic - [persistent://tenant/ns/topic] Attempting to add producer to a fenced topic
   11:37:58.292 [ForkJoinPool.commonPool-worker-20] ERROR o.a.pulsar.broker.service.ServerCnx  - [/xxx.xxx.xxx.xxx:40256] Failed to add producer to topic persistent://tenant/ns/topic: Topic is temporarily unavailable
   11:37:58.728 [ForkJoinPool.commonPool-worker-75] INFO  o.a.pulsar.broker.service.ServerCnx  - [/xxx.xxx.xxx.xxx:40330][persistent://tenant/ns/topic] Creating producer. producerId=668
   11:37:58.729 [ForkJoinPool.commonPool-worker-75] INFO  o.a.pulsar.broker.service.ServerCnx  - [/xxx.xxx.xxx.xxx:40330]-668 persistent://tenant/ns/topic configured with schema false
   11:37:58.729 [ForkJoinPool.commonPool-worker-75] WARN  o.a.p.b.s.persistent.PersistentTopic - [persistent://tenant/ns/topic] Attempting to add producer to a fenced topic
   11:37:58.729 [ForkJoinPool.commonPool-worker-75] ERROR o.a.pulsar.broker.service.ServerCnx  - [/xxx.xxx.xxx.xxx:40330] Failed to add producer to topic persistent://tenant/ns/topic: Topic is temporarily unavailable
   11:37:59.489 [ForkJoinPool.commonPool-worker-106] INFO  o.a.pulsar.broker.service.ServerCnx  - [/xxx.xxx.xxx.xxx:40260][persistent://tenant/ns/topic] Creating producer. producerId=668
   11:37:59.489 [ForkJoinPool.commonPool-worker-106] INFO  o.a.pulsar.broker.service.ServerCnx  - [/xxx.xxx.xxx.xxx:40260]-668 persistent://tenant/ns/topic configured with schema false
   11:37:59.489 [ForkJoinPool.commonPool-worker-106] WARN  o.a.p.b.s.persistent.PersistentTopic - [persistent://tenant/ns/topic] Attempting to add producer to a fenced topic
   11:37:59.489 [ForkJoinPool.commonPool-worker-106] ERROR o.a.pulsar.broker.service.ServerCnx  - [/xxx.xxx.xxx.xxx:40260] Failed to add producer to topic persistent://tenant/ns/topic: Topic is temporarily unavailable
   11:38:01.062 [ForkJoinPool.commonPool-worker-51] INFO  o.a.pulsar.broker.service.ServerCnx  - [/xxx.xxx.xxx.xxx:40248][persistent://tenant/ns/topic] Creating producer. producerId=668
   11:38:01.062 [ForkJoinPool.commonPool-worker-51] INFO  o.a.pulsar.broker.service.ServerCnx  - [/xxx.xxx.xxx.xxx:40248]-668 persistent://tenant/ns/topic configured with schema false
   11:38:01.063 [ForkJoinPool.commonPool-worker-51] WARN  o.a.p.b.s.persistent.PersistentTopic - [persistent://tenant/ns/topic] Attempting to add producer to a fenced topic
   11:38:01.063 [ForkJoinPool.commonPool-worker-51] ERROR o.a.pulsar.broker.service.ServerCnx  - [/xxx.xxx.xxx.xxx:40248] Failed to add producer to topic persistent://tenant/ns/topic: Topic is temporarily unavailable
   11:38:04.103 [ForkJoinPool.commonPool-worker-90] INFO  o.a.pulsar.broker.service.ServerCnx  - [/xxx.xxx.xxx.xxx:40338][persistent://tenant/ns/topic] Creating producer. producerId=668
   11:38:04.104 [ForkJoinPool.commonPool-worker-102] INFO  o.a.pulsar.broker.service.ServerCnx  - [/xxx.xxx.xxx.xxx:40338]-668 persistent://tenant/ns/topic configured with schema false
   ```
   
   We were performing maintenance on the ZooKeeper servers at the time, so I suspect the shutdown of some ZK servers triggered this. However, the causal relationship has not been confirmed.
   
   ### Modifications
   
   As a workaround, close the topic if it remains fenced for a configured period of time. When the clients reconnect, a new `PersistentTopic` instance is created and the topic goes back to normal.
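
The workaround can be sketched in isolation as follows. This is a minimal illustration, not the code in this PR: `FencedTopicSketch`, `closeIfStillFenced`, and `hasPendingCloseTask` are hypothetical names, and "closing" is reduced to setting a flag. It assumes the same rule as the new setting: a timeout of 0 or less disables the forced close.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a topic; this is NOT Pulsar's PersistentTopic.
class FencedTopicSketch {
    private final ScheduledExecutorService executor;
    private final int fencingTimeoutSeconds;
    private volatile boolean fenced;
    private volatile boolean closed;
    private ScheduledFuture<?> closeTask;

    FencedTopicSketch(ScheduledExecutorService executor, int fencingTimeoutSeconds) {
        this.executor = executor;
        this.fencingTimeoutSeconds = fencingTimeoutSeconds;
    }

    // Mark the topic fenced and, if enabled, schedule one forced-close check.
    synchronized void fence() {
        fenced = true;
        boolean noTaskRunning = closeTask == null || closeTask.isDone();
        if (fencingTimeoutSeconds > 0 && noTaskRunning) {
            closeTask = executor.schedule(this::closeIfStillFenced,
                    fencingTimeoutSeconds, TimeUnit.SECONDS);
        }
    }

    // Clear the fenced flag and cancel the pending check, if there is one.
    synchronized void unfence() {
        fenced = false;
        if (closeTask != null && !closeTask.isDone()) {
            closeTask.cancel(false);
        }
    }

    // Runs when the timeout expires; does nothing if the topic recovered.
    void closeIfStillFenced() {
        if (fenced && !closed) {
            closed = true; // a real broker would close the topic here
        }
    }

    boolean isFenced() { return fenced; }

    boolean isClosed() { return closed; }

    synchronized boolean hasPendingCloseTask() {
        return closeTask != null && !closeTask.isDone();
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        FencedTopicSketch topic = new FencedTopicSketch(executor, 1);
        topic.fence();
        Thread.sleep(1500); // wait past the 1 second timeout
        System.out.println("closed after timeout: " + topic.isClosed());
        executor.shutdownNow();
    }
}
```

Checking the fenced flag again when the task fires means a topic that was unfenced in the meantime is left alone even if the cancellation raced with the timer.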


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] codelipenghui commented on pull request #8561: [broker] Close topics that remain fenced forcefully

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on pull request #8561:
URL: https://github.com/apache/pulsar/pull/8561#issuecomment-730729825


   /pulsarbot cherry-pick to branch-2.6





[GitHub] [pulsar] sijie merged pull request #8561: [broker] Close topics that remain fenced forcefully

Posted by GitBox <gi...@apache.org>.
sijie merged pull request #8561:
URL: https://github.com/apache/pulsar/pull/8561


   





[GitHub] [pulsar] massakam commented on a change in pull request #8561: [broker] Close topics that remain fenced forcefully

Posted by GitBox <gi...@apache.org>.
massakam commented on a change in pull request #8561:
URL: https://github.com/apache/pulsar/pull/8561#discussion_r524846319



##########
File path: pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentTopic.java
##########
@@ -2384,6 +2387,40 @@ public boolean isSystemTopic() {
         return false;
     }
 
+    private synchronized void fence() {
+        isFenced = true;
+        ScheduledFuture<?> monitoringTask = this.fencedTopicMonitoringTask;
+        if (monitoringTask == null || monitoringTask.isDone()) {
+            final int timeout = brokerService.pulsar().getConfiguration().getTopicFencingTimeoutSeconds();
+            if (timeout > 0) {
+                this.fencedTopicMonitoringTask = brokerService.executor().schedule(this::closeFencedTopicForcefully,
+                        timeout, TimeUnit.SECONDS);
+            }
+        }
+    }
+
+    private synchronized void unfence() {
+        isFenced = false;
+        ScheduledFuture<?> monitoringTask = this.fencedTopicMonitoringTask;
+        if (monitoringTask != null && !monitoringTask.isDone()) {
+            monitoringTask.cancel(false);
+        }
+    }
+
+    private synchronized void closeFencedTopicForcefully() {
+        if (isFenced) {
+            final int timeout = brokerService.pulsar().getConfiguration().getTopicFencingTimeoutSeconds();
+            if (isClosingOrDeleting) {
+                log.warn("[{}] Topic remained fenced for {} seconds and is already closed (pendingWriteOps: {})", topic,
+                        timeout, pendingWriteOps.get());
+            } else {
+                log.error("[{}] Topic remained fenced for {} seconds, so close it (pendingWriteOps: {})", topic,
+                        timeout, pendingWriteOps.get());
+                close();

Review comment:
       I removed "synchronized" because there was no clear reason to make this method synchronized.







[GitHub] [pulsar] sijie commented on a change in pull request #8561: [broker] Close topics that remain fenced forcefully

Posted by GitBox <gi...@apache.org>.
sijie commented on a change in pull request #8561:
URL: https://github.com/apache/pulsar/pull/8561#discussion_r524476333



##########
File path: pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentTopic.java
##########
@@ -2384,6 +2387,40 @@ public boolean isSystemTopic() {
         return false;
     }
 
+    private synchronized void fence() {
+        isFenced = true;
+        ScheduledFuture<?> monitoringTask = this.fencedTopicMonitoringTask;
+        if (monitoringTask == null || monitoringTask.isDone()) {
+            final int timeout = brokerService.pulsar().getConfiguration().getTopicFencingTimeoutSeconds();
+            if (timeout > 0) {
+                this.fencedTopicMonitoringTask = brokerService.executor().schedule(this::closeFencedTopicForcefully,
+                        timeout, TimeUnit.SECONDS);
+            }
+        }
+    }
+
+    private synchronized void unfence() {
+        isFenced = false;
+        ScheduledFuture<?> monitoringTask = this.fencedTopicMonitoringTask;
+        if (monitoringTask != null && !monitoringTask.isDone()) {
+            monitoringTask.cancel(false);
+        }
+    }
+
+    private synchronized void closeFencedTopicForcefully() {
+        if (isFenced) {
+            final int timeout = brokerService.pulsar().getConfiguration().getTopicFencingTimeoutSeconds();
+            if (isClosingOrDeleting) {
+                log.warn("[{}] Topic remained fenced for {} seconds and is already closed (pendingWriteOps: {})", topic,
+                        timeout, pendingWriteOps.get());
+            } else {
+                log.error("[{}] Topic remained fenced for {} seconds, so close it (pendingWriteOps: {})", topic,
+                        timeout, pendingWriteOps.get());
+                close();

Review comment:
       Do we need to put `close` in the synchronized block? 







[GitHub] [pulsar] codelipenghui commented on pull request #8561: [broker] Close topics that remain fenced forcefully

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on pull request #8561:
URL: https://github.com/apache/pulsar/pull/8561#issuecomment-728673708


   @sijie Please help review this PR.





[GitHub] [pulsar] Huanli-Meng commented on a change in pull request #8561: [broker] Close topics that remain fenced forcefully

Posted by GitBox <gi...@apache.org>.
Huanli-Meng commented on a change in pull request #8561:
URL: https://github.com/apache/pulsar/pull/8561#discussion_r522996374



##########
File path: conf/broker.conf
##########
@@ -452,6 +452,10 @@ systemTopicEnabled=false
 # Please enable the system topic first.
 topicLevelPoliciesEnabled=false
 
+# If a topic remains fenced for this number of seconds, it will be closed forcefully.
+# If it is 0 or less, the fenced topic will not be closed.

Review comment:
       ```suggestion
   # If it is set to 0 or a negative number, the fenced topic will not be closed.
   ```







[GitHub] [pulsar] massakam commented on a change in pull request #8561: [broker] Close topics that remain fenced forcefully

Posted by GitBox <gi...@apache.org>.
massakam commented on a change in pull request #8561:
URL: https://github.com/apache/pulsar/pull/8561#discussion_r523021784



##########
File path: conf/broker.conf
##########
@@ -452,6 +452,10 @@ systemTopicEnabled=false
 # Please enable the system topic first.
 topicLevelPoliciesEnabled=false
 
+# If a topic remains fenced for this number of seconds, it will be closed forcefully.
+# If it is 0 or less, the fenced topic will not be closed.

Review comment:
       Fixed.







[GitHub] [pulsar] Huanli-Meng commented on pull request #8561: [broker] Close topics that remain fenced forcefully

Posted by GitBox <gi...@apache.org>.
Huanli-Meng commented on pull request #8561:
URL: https://github.com/apache/pulsar/pull/8561#issuecomment-726806678


   Added a doc-required label because the broker.conf and standalone.conf files are updated.

