Posted to commits@pulsar.apache.org by zh...@apache.org on 2020/10/16 12:49:35 UTC

[pulsar] branch master updated: Fix stuck lookup operations when the broker is starting up (#8273)

This is an automated email from the ASF dual-hosted git repository.

zhaijia pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/pulsar.git


The following commit(s) were added to refs/heads/master by this push:
     new b57c163  Fix stuck lookup operations when the broker is starting up (#8273)
b57c163 is described below

commit b57c1630e2478755c05a147bfaf11d9a723cd28e
Author: Matteo Merli <mm...@apache.org>
AuthorDate: Fri Oct 16 05:49:02 2020 -0700

    Fix stuck lookup operations when the broker is starting up (#8273)
    
    Motivation
    When the broker is starting up, it might start receiving lookup requests before all the components of the service are fully initialized. In this case, a lookup fails with a NullPointerException (NPE) because the leader election service is not ready yet (it is instantiated after the broker service).
    
    This NPE causes a chain of knock-on effects:
    
    - The futures for the requests that hit the NPE are never completed.
    - They stay stale in the findingBundlesNotAuthoritative cache map forever.
    - All other lookup requests piggy-back on those first futures, which will never complete.
    - Eventually the broker reaches the maximum number of pending lookup requests, after which it rejects new lookup requests.
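To make the cascade concrete, here is a minimal, self-contained Java sketch of it. It is a hypothetical model, not Pulsar code: `inFlight` stands in for findingBundlesNotAuthoritative, the `LeaderElection` interface for the not-yet-initialized leader election service, and a `Semaphore` with three permits for the broker's cap on pending lookups. The lookup work runs inside a callback whose returned stage is discarded; this is assumed here as one way an NPE can be swallowed without ever completing the caller's future.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Semaphore;

// Hypothetical model of the failure cascade; none of these names are Pulsar APIs.
public class StuckLookupDemo {

    // Stand-in for the leader election service (hypothetical type).
    interface LeaderElection {
        boolean isLeader();
    }

    // Still null: the broker is "starting up".
    static volatile LeaderElection leaderElection = null;

    // Plays the role of findingBundlesNotAuthoritative: one in-flight future per bundle.
    static final ConcurrentMap<String, CompletableFuture<Optional<String>>> inFlight =
            new ConcurrentHashMap<>();

    // Hypothetical cap on pending lookups; a permit is returned only when a lookup completes.
    static final Semaphore permits = new Semaphore(3);

    static CompletableFuture<Optional<String>> handleLookup(String bundle) {
        if (!permits.tryAcquire()) {
            CompletableFuture<Optional<String>> rejected = new CompletableFuture<>();
            rejected.completeExceptionally(new IllegalStateException("Too many pending lookups"));
            return rejected;
        }
        return findBroker(bundle).whenComplete((result, error) -> permits.release());
    }

    static CompletableFuture<Optional<String>> findBroker(String bundle) {
        CompletableFuture<Optional<String>> fresh = new CompletableFuture<>();
        CompletableFuture<Optional<String>> existing = inFlight.putIfAbsent(bundle, fresh);
        if (existing != null) {
            return existing; // piggy-back on the lookup already in progress
        }
        // The map entry is dropped only when its future completes.
        fresh.whenComplete((result, error) -> inFlight.remove(bundle, fresh));
        // The work runs inside a callback whose returned stage is discarded,
        // so an exception thrown here never reaches `fresh`.
        CompletableFuture.completedFuture(null).thenRun(() -> {
            boolean authoritative = leaderElection.isLeader(); // NPE during startup
            fresh.complete(Optional.of(authoritative ? "this-broker" : "leader"));
        });
        return fresh;
    }

    public static void main(String[] args) {
        CompletableFuture<Optional<String>> first = handleLookup("bundle-a");
        CompletableFuture<Optional<String>> second = handleLookup("bundle-a");
        CompletableFuture<Optional<String>> third = handleLookup("bundle-b");
        CompletableFuture<Optional<String>> fourth = handleLookup("bundle-c");
        System.out.println(first.isDone() + " " + second.isDone() + " " + third.isDone()); // false false false
        System.out.println(inFlight.size()); // 2
        System.out.println(fourth.isCompletedExceptionally()); // true
    }
}
```

Running `main` prints `false false false`, then `2`, then `true`: both lookups for `bundle-a` share one stuck future, the stuck lookups never return their permits, and the fourth request is rejected.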
---
 .../apache/pulsar/broker/namespace/NamespaceService.java    | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/pulsar-broker/src/main/java/org/apache/pulsar/broker/namespace/NamespaceService.java b/pulsar-broker/src/main/java/org/apache/pulsar/broker/namespace/NamespaceService.java
index c00e802..511ea12 100644
--- a/pulsar-broker/src/main/java/org/apache/pulsar/broker/namespace/NamespaceService.java
+++ b/pulsar-broker/src/main/java/org/apache/pulsar/broker/namespace/NamespaceService.java
@@ -30,6 +30,7 @@ import org.apache.pulsar.broker.PulsarServerException;
 import org.apache.pulsar.broker.PulsarService;
 import org.apache.pulsar.broker.ServiceConfiguration;
 import org.apache.pulsar.broker.admin.AdminResource;
+import org.apache.pulsar.broker.loadbalance.LeaderElectionService;
 import org.apache.pulsar.broker.loadbalance.LoadManager;
 import org.apache.pulsar.broker.loadbalance.ResourceUnit;
 import org.apache.pulsar.broker.lookup.LookupResult;
@@ -404,7 +405,17 @@ public class NamespaceService {
             return;
         }
         String candidateBroker = null;
-        boolean authoritativeRedirect = pulsar.getLeaderElectionService().isLeader();
+
+        LeaderElectionService les = pulsar.getLeaderElectionService();
+        if (les == null) {
+            // The leader election service was not initialized yet. This can happen because the broker service is
+            // initialized first and it might start receiving lookup requests before the leader election service is
+            // fully initialized.
+            lookupFuture.complete(Optional.empty());
+            return;
+        }
+
+        boolean authoritativeRedirect = les.isLeader();
 
         try {
             // check if this is Heartbeat or SLAMonitor namespace
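The null check in the hunk above can be sketched in a small hypothetical model (not Pulsar code; `inFlight`, `findBroker`, `searchForCandidate` and `LeaderElection` are illustrative names). Completing with `Optional.empty()` instead of throwing lets the map entry be removed, so a lookup issued after startup begins fresh. How Pulsar's own callers turn an empty result into a client response is not shown in this diff. The `exceptionally` stage is an extra safeguard that this commit does not add; it is included only to illustrate the general rule that every path must complete the future.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical model of the guarded lookup; mirrors the idea of the diff, not Pulsar code.
public class GuardedLookupDemo {

    // Stand-in for the leader election service (hypothetical type).
    interface LeaderElection {
        boolean isLeader();
    }

    // Null until startup finishes.
    static volatile LeaderElection leaderElection = null;

    // One in-flight future per bundle; removed when the future completes.
    static final ConcurrentMap<String, CompletableFuture<Optional<String>>> inFlight =
            new ConcurrentHashMap<>();

    static CompletableFuture<Optional<String>> findBroker(String bundle) {
        CompletableFuture<Optional<String>> fresh = new CompletableFuture<>();
        CompletableFuture<Optional<String>> existing = inFlight.putIfAbsent(bundle, fresh);
        if (existing != null) {
            return existing; // piggy-back on the lookup already in progress
        }
        fresh.whenComplete((result, error) -> inFlight.remove(bundle, fresh));
        CompletableFuture.completedFuture(null)
                .thenRun(() -> searchForCandidate(fresh))
                // Extra hardening, NOT part of this commit: forward any unexpected
                // exception so the future can never be left pending.
                .exceptionally(error -> {
                    fresh.completeExceptionally(error);
                    return null;
                });
        return fresh;
    }

    static void searchForCandidate(CompletableFuture<Optional<String>> lookupFuture) {
        LeaderElection les = leaderElection; // read the volatile field once
        if (les == null) {
            // Same idea as the diff: not ready yet, so answer "no broker" instead of throwing.
            lookupFuture.complete(Optional.empty());
            return;
        }
        lookupFuture.complete(Optional.of(les.isLeader() ? "this-broker" : "leader"));
    }

    public static void main(String[] args) {
        Optional<String> early = findBroker("bundle-a").join();
        System.out.println(early.isPresent() + " " + inFlight.size()); // false 0
        leaderElection = () -> true; // startup finished
        System.out.println(findBroker("bundle-a").join()); // Optional[this-broker]
    }
}
```

Running `main` prints `false 0` and then `Optional[this-broker]`: the early lookup completes immediately with an empty result and leaves nothing behind in `inFlight`, so the later lookup is not blocked by it.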