Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2020/09/22 05:54:16 UTC

[GitHub] [pulsar] 4onni opened a new pull request #8101: [Issue 8093]Fix client lookup hangs when broker restarts

4onni opened a new pull request #8101:
URL: https://github.com/apache/pulsar/pull/8101


   
   Fixes #8093
   
   ### Motivation
   
   Clients hang forever when all brokers stop and then restart.
   Several steps must finish before the broker is fully started, as illustrated in the pseudocode below:
   
   ```
   PulsarService#start():
       broker.start(); // Step 1
       webService.start(); // Step 2
       leaderElectionService.start(); // Step 3
   ```
   If a lookup request arrives between Step 2 and Step 3, an NPE is thrown, which blocks all subsequent requests from being processed properly.
   
   ### Modifications
   
   A client can connect to the broker only after the leader election service has started successfully.
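
The guard can be illustrated with a minimal, self-contained sketch. `Broker`, `ElectionService`, `startElectionService` and `lookup` are hypothetical stand-ins invented for this illustration, not the actual Pulsar classes; only the null-or-not-elected check and the `IllegalStateException` message mirror the change in this pull request.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for the broker's leader election service.
class ElectionService {
    private volatile boolean elected;

    boolean isElected() {
        return elected;
    }

    void markElected() {
        elected = true;
    }
}

// Hypothetical stand-in for the broker. The election service stays null
// until the last startup step (Step 3 in the pseudocode) has run.
class Broker {
    private volatile ElectionService electionService;

    void startElectionService() {
        ElectionService service = new ElectionService();
        service.markElected();
        electionService = service;
    }

    // Fails the lookup future instead of dereferencing a null service.
    CompletableFuture<Optional<String>> lookup(String bundle) {
        CompletableFuture<Optional<String>> future = new CompletableFuture<>();
        ElectionService service = electionService; // read the field once
        if (service == null || !service.isElected()) {
            future.completeExceptionally(new IllegalStateException(
                    "The leader election has not yet been completed!"));
            return future;
        }
        // Placeholder result; a real broker would pick a candidate here.
        future.complete(Optional.of("broker-for-" + bundle));
        return future;
    }
}
```

In this sketch a lookup issued before `startElectionService()` completes exceptionally right away, and the same lookup succeeds afterwards, so an early request no longer triggers an NPE.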
   
   ### Verifying this change
   
   - [ ] Make sure that the change passes the CI checks.
   
   This change added tests and can be verified as follows:
    - Added 2 test cases under `LeaderElectionServiceTest`
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] codelipenghui commented on pull request #8101: [Issue 8093]Fix client lookup hangs when broker restarts

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on pull request #8101:
URL: https://github.com/apache/pulsar/pull/8101#issuecomment-696725202


   /pulsarbot run-failure-checks





[GitHub] [pulsar] codelipenghui commented on a change in pull request #8101: [Issue 8093]Fix client lookup hangs when broker restarts

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on a change in pull request #8101:
URL: https://github.com/apache/pulsar/pull/8101#discussion_r492621290



##########
File path: pulsar-broker/src/main/java/org/apache/pulsar/broker/namespace/NamespaceService.java
##########
@@ -400,6 +400,11 @@ public boolean registerNamespace(String namespace, boolean ensureOwned) throws P
     private void searchForCandidateBroker(NamespaceBundle bundle,
                                           CompletableFuture<Optional<LookupResult>> lookupFuture,
                                           LookupOptions options) {
+        if( null == pulsar.getLeaderElectionService() || ! pulsar.getLeaderElectionService().isElected()) {
+            LOG.warn("The leader election has not yet been completed! NamespaceBundle[{}]", bundle);
+            lookupFuture.completeExceptionally(new IllegalStateException("The leader election has not yet been completed!"));

Review comment:
       As discussed with @4onni, this will result in a client lookup timeout, and the client will retry the lookup later, so it's not a problem here.







[GitHub] [pulsar] codelipenghui commented on pull request #8101: [Issue 8093]Fix client lookup hangs when broker restarts

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on pull request #8101:
URL: https://github.com/apache/pulsar/pull/8101#issuecomment-696657719


   /pulsarbot run-failure-checks





[GitHub] [pulsar] sijie merged pull request #8101: [Issue 8093]Fix client lookup hangs when broker restarts

Posted by GitBox <gi...@apache.org>.
sijie merged pull request #8101:
URL: https://github.com/apache/pulsar/pull/8101


   











[GitHub] [pulsar] jiazhai commented on pull request #8101: [Issue 8093]Fix client lookup hangs when broker restarts

Posted by GitBox <gi...@apache.org>.
jiazhai commented on pull request #8101:
URL: https://github.com/apache/pulsar/pull/8101#issuecomment-697297984


   /pulsarbot run-failure-checks





[GitHub] [pulsar] jiazhai edited a comment on pull request #8101: [Issue 8093]Fix client lookup hangs when broker restarts

Posted by GitBox <gi...@apache.org>.
jiazhai edited a comment on pull request #8101:
URL: https://github.com/apache/pulsar/pull/8101#issuecomment-710025706


   already merged into branch-2.6 for 2.6.2 release
   https://github.com/apache/pulsar/commit/a4a363cb3eeff7f76466fe078efe6fb7c793d136 








[GitHub] [pulsar] codelipenghui commented on pull request #8101: [Issue 8093]Fix client lookup hangs when broker restarts

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on pull request #8101:
URL: https://github.com/apache/pulsar/pull/8101#issuecomment-698134821


   /pulsarbot run-failure-checks





[GitHub] [pulsar] jiazhai commented on pull request #8101: [Issue 8093]Fix client lookup hangs when broker restarts

Posted by GitBox <gi...@apache.org>.
jiazhai commented on pull request #8101:
URL: https://github.com/apache/pulsar/pull/8101#issuecomment-697249065


   /pulsarbot run-failure-checks





[GitHub] [pulsar] codelipenghui commented on pull request #8101: [Issue 8093]Fix client lookup hangs when broker restarts

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on pull request #8101:
URL: https://github.com/apache/pulsar/pull/8101#issuecomment-696790694


   /pulsarbot run-failure-checks





[GitHub] [pulsar] jiazhai commented on pull request #8101: [Issue 8093]Fix client lookup hangs when broker restarts

Posted by GitBox <gi...@apache.org>.
jiazhai commented on pull request #8101:
URL: https://github.com/apache/pulsar/pull/8101#issuecomment-697299286


   /pulsarbot run-failure-checks





[GitHub] [pulsar] jiazhai commented on pull request #8101: [Issue 8093]Fix client lookup hangs when broker restarts

Posted by GitBox <gi...@apache.org>.
jiazhai commented on pull request #8101:
URL: https://github.com/apache/pulsar/pull/8101#issuecomment-697274672


   /pulsarbot run-failure-checks





[GitHub] [pulsar] jiazhai commented on pull request #8101: [Issue 8093]Fix client lookup hangs when broker restarts

Posted by GitBox <gi...@apache.org>.
jiazhai commented on pull request #8101:
URL: https://github.com/apache/pulsar/pull/8101#issuecomment-697515592


   /pulsarbot run-failure-checks





[GitHub] [pulsar] codelipenghui commented on pull request #8101: [Issue 8093]Fix client lookup hangs when broker restarts

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on pull request #8101:
URL: https://github.com/apache/pulsar/pull/8101#issuecomment-696629994


   /pulsarbot run-failure-checks





[GitHub] [pulsar] codelipenghui commented on a change in pull request #8101: [Issue 8093]Fix client lookup hangs when broker restarts

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on a change in pull request #8101:
URL: https://github.com/apache/pulsar/pull/8101#discussion_r492527221



##########
File path: pulsar-broker/src/main/java/org/apache/pulsar/broker/namespace/NamespaceService.java
##########
@@ -400,6 +400,11 @@ public boolean registerNamespace(String namespace, boolean ensureOwned) throws P
     private void searchForCandidateBroker(NamespaceBundle bundle,
                                           CompletableFuture<Optional<LookupResult>> lookupFuture,
                                           LookupOptions options) {
+        if( null == pulsar.getLeaderElectionService() || ! pulsar.getLeaderElectionService().isElected()) {
+            LOG.warn("The leader election has not yet been completed! NamespaceBundle[{}]", bundle);
+            lookupFuture.completeExceptionally(new IllegalStateException("The leader election has not yet been completed!"));

Review comment:
       Thanks for the great contribution. I think it would be better to return a retryable exception to the client, so that the client can reconnect later. Does this make sense?
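
One way to picture the retryable-exception idea is the sketch below. It is only an illustration under stated assumptions: `RetryableLookupException`, `LookupRetry` and `callWithRetry` are hypothetical names invented here, not part of the Pulsar client or broker API, and the fixed backoff is an arbitrary choice.

```java
import java.util.function.Supplier;

// Hypothetical marker for failures that are safe to retry, such as
// "leader election has not completed yet".
class RetryableLookupException extends RuntimeException {
    RetryableLookupException(String message) {
        super(message);
    }
}

// Hypothetical helper that retries an operation on retryable failures only.
class LookupRetry {
    static <T> T callWithRetry(Supplier<T> operation, int maxAttempts, long backoffMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RetryableLookupException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.get();
            } catch (RetryableLookupException e) {
                last = e; // remember the failure and try again
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(backoffMillis);
                    } catch (InterruptedException interrupted) {
                        Thread.currentThread().interrupt();
                        throw new IllegalStateException("Interrupted while retrying", interrupted);
                    }
                }
            }
        }
        throw last; // every attempt failed with a retryable error
    }
}
```

Any other exception type propagates immediately in this sketch, which keeps non-retryable failures visible to the caller instead of hiding them behind repeated attempts.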








[GitHub] [pulsar] jiazhai commented on pull request #8101: [Issue 8093]Fix client lookup hangs when broker restarts

Posted by GitBox <gi...@apache.org>.
jiazhai commented on pull request #8101:
URL: https://github.com/apache/pulsar/pull/8101#issuecomment-710025706


   already merged into branch-2.6 for 2.6.2 release








[GitHub] [pulsar] jiazhai commented on pull request #8101: [Issue 8093]Fix client lookup hangs when broker restarts

Posted by GitBox <gi...@apache.org>.
jiazhai commented on pull request #8101:
URL: https://github.com/apache/pulsar/pull/8101#issuecomment-696809912


   /pulsarbot run-failure-checks





[GitHub] [pulsar] jiazhai commented on pull request #8101: [Issue 8093]Fix client lookup hangs when broker restarts

Posted by GitBox <gi...@apache.org>.
jiazhai commented on pull request #8101:
URL: https://github.com/apache/pulsar/pull/8101#issuecomment-697049187


   /pulsarbot run-failure-checks




