You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2021/08/10 12:09:12 UTC

[GitHub] [pulsar] ivankelly opened a new pull request #11627: PIP-91: Separate lookup timeout from operation timeout

ivankelly opened a new pull request #11627:
URL: https://github.com/apache/pulsar/pull/11627


   This patch contains a number of changes.
   
   TooManyRequests is retried for partition metadata and lookups
   
   Lookup timeout configuration has been added. By default it matches
   operation timeout.
   
   Partition metadata timeout calculation has been fixed to calculate
   the elapsed time correctly.
   
   Small refactor on broker construction to allow a mocked ServerCnx
   implementation for testing. Unfortunately, the test takes over 50
   seconds, but this is unavoidable due to the fact that we're working
   with timeouts here.
   
   PulsarClientExceptions have been reworked to contain more
   context (remote/local/reqid) and any previous exceptions which may
   have occurred triggering retries. The previous exceptions must be
   manually recorded, so this only applies to lookups on the consumer
   side for now.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] jerrypeng commented on a change in pull request #11627: PIP-91: Separate lookup timeout from operation timeout

Posted by GitBox <gi...@apache.org>.
jerrypeng commented on a change in pull request #11627:
URL: https://github.com/apache/pulsar/pull/11627#discussion_r688077951



##########
File path: pulsar-client-api/src/main/java/org/apache/pulsar/client/api/PulsarClientException.java
##########
@@ -83,6 +86,49 @@ public PulsarClientException(String msg, Throwable t) {
         super(msg, t);
     }
 
+    /**
+     * Add a list of previous exception which occurred for the same operation
+     * and have been retried.
+     *
+     * @param previous A collection of throwables that triggered retries
+     */
+    public void setPreviousExceptions(Collection<Throwable> previous) {
+        this.previous = previous;
+    }
+
+    /**
+     * Get the collection of previous exceptions which have caused retries
+     * for this operation.
+     *
+     * @return a collection of exception, ordered as they occurred
+     */
+    public Collection<Throwable> getPreviousExceptions() {
+        return this.previous;
+    }
+
+    @Override
+    public String toString() {

Review comment:
       Looking through the code, we do log exceptions returned from the broker.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] ivankelly commented on a change in pull request #11627: PIP-91: Separate lookup timeout from operation timeout

Posted by GitBox <gi...@apache.org>.
ivankelly commented on a change in pull request #11627:
URL: https://github.com/apache/pulsar/pull/11627#discussion_r688335842



##########
File path: pulsar-client-api/src/main/java/org/apache/pulsar/client/api/PulsarClientException.java
##########
@@ -83,6 +86,49 @@ public PulsarClientException(String msg, Throwable t) {
         super(msg, t);
     }
 
+    /**
+     * Add a list of previous exception which occurred for the same operation
+     * and have been retried.
+     *
+     * @param previous A collection of throwables that triggered retries
+     */
+    public void setPreviousExceptions(Collection<Throwable> previous) {
+        this.previous = previous;
+    }
+
+    /**
+     * Get the collection of previous exceptions which have caused retries
+     * for this operation.
+     *
+     * @return a collection of exception, ordered as they occurred
+     */
+    public Collection<Throwable> getPreviousExceptions() {
+        return this.previous;
+    }
+
+    @Override
+    public String toString() {

Review comment:
       What I meant is, we don't print the exception with the previous exceptions attached every time we log. We only print that when at the point that we're about to complete the subscribeFuture or producerCreatedFuture with an exception. Which is when the exception gets passed to the client. For me, the logging of that exception is incidental. What I want is for the client code to get an exception that has context about the retries. 
   
   Take for example the case of a customer who has a flink pipeline, and they get a TooManyRequestsException. They take a screenshot of the exception in the flink dashboard and send it to us. I want all the information to be in that screenshot, and not have to ask them to dig around in flink logs to get it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] ivankelly commented on a change in pull request #11627: PIP-91: Separate lookup timeout from operation timeout

Posted by GitBox <gi...@apache.org>.
ivankelly commented on a change in pull request #11627:
URL: https://github.com/apache/pulsar/pull/11627#discussion_r686083771



##########
File path: pulsar-client-api/src/main/java/org/apache/pulsar/client/api/PulsarClientException.java
##########
@@ -957,77 +1003,110 @@ public static PulsarClientException unwrap(Throwable t) {
         // site
         Throwable cause = t.getCause();
         String msg = cause.getMessage();
+        PulsarClientException newException = null;
         if (cause instanceof TimeoutException) {
-            return new TimeoutException(msg);
+            newException = new TimeoutException(msg);
         } else if (cause instanceof InvalidConfigurationException) {
-            return new InvalidConfigurationException(msg);
+            newException = new InvalidConfigurationException(msg);
         } else if (cause instanceof AuthenticationException) {
-            return new AuthenticationException(msg);
+            newException = new AuthenticationException(msg);
         } else if (cause instanceof IncompatibleSchemaException) {
-            return new IncompatibleSchemaException(msg);
+            newException = new IncompatibleSchemaException(msg);
         } else if (cause instanceof TooManyRequestsException) {
-            return new TooManyRequestsException(msg);
+            newException = new TooManyRequestsException(msg);
         } else if (cause instanceof LookupException) {
-            return new LookupException(msg);
+            newException = new LookupException(msg);
         } else if (cause instanceof ConnectException) {
-            return new ConnectException(msg);
+            newException = new ConnectException(msg);
         } else if (cause instanceof AlreadyClosedException) {
-            return new AlreadyClosedException(msg);
+            newException = new AlreadyClosedException(msg);
         } else if (cause instanceof TopicTerminatedException) {
-            return new TopicTerminatedException(msg);
+            newException = new TopicTerminatedException(msg);
         } else if (cause instanceof AuthorizationException) {
-            return new AuthorizationException(msg);
+            newException = new AuthorizationException(msg);
         } else if (cause instanceof GettingAuthenticationDataException) {
-            return new GettingAuthenticationDataException(msg);
+            newException = new GettingAuthenticationDataException(msg);
         } else if (cause instanceof UnsupportedAuthenticationException) {
-            return new UnsupportedAuthenticationException(msg);
+            newException = new UnsupportedAuthenticationException(msg);
         } else if (cause instanceof BrokerPersistenceException) {
-            return new BrokerPersistenceException(msg);
+            newException = new BrokerPersistenceException(msg);
         } else if (cause instanceof BrokerMetadataException) {
-            return new BrokerMetadataException(msg);
+            newException = new BrokerMetadataException(msg);
         } else if (cause instanceof ProducerBusyException) {
-            return new ProducerBusyException(msg);
+            newException = new ProducerBusyException(msg);
         } else if (cause instanceof ConsumerBusyException) {
-            return new ConsumerBusyException(msg);
+            newException = new ConsumerBusyException(msg);
         } else if (cause instanceof NotConnectedException) {
-            return new NotConnectedException();
+            newException = new NotConnectedException();
         } else if (cause instanceof InvalidMessageException) {
-            return new InvalidMessageException(msg);
+            newException = new InvalidMessageException(msg);
         } else if (cause instanceof InvalidTopicNameException) {
-            return new InvalidTopicNameException(msg);
+            newException = new InvalidTopicNameException(msg);
         } else if (cause instanceof NotSupportedException) {
-            return new NotSupportedException(msg);
+            newException = new NotSupportedException(msg);
         } else if (cause instanceof NotAllowedException) {
-            return new NotAllowedException(msg);
+            newException = new NotAllowedException(msg);
         } else if (cause instanceof ProducerQueueIsFullError) {
-            return new ProducerQueueIsFullError(msg);
+            newException = new ProducerQueueIsFullError(msg);
         } else if (cause instanceof ProducerBlockedQuotaExceededError) {
-            return new ProducerBlockedQuotaExceededError(msg);
+            newException = new ProducerBlockedQuotaExceededError(msg);
         } else if (cause instanceof ProducerBlockedQuotaExceededException) {
-            return new ProducerBlockedQuotaExceededException(msg);
+            newException = new ProducerBlockedQuotaExceededException(msg);
         } else if (cause instanceof ChecksumException) {
-            return new ChecksumException(msg);
+            newException = new ChecksumException(msg);
         } else if (cause instanceof CryptoException) {
-            return new CryptoException(msg);
+            newException = new CryptoException(msg);
         } else if (cause instanceof ConsumerAssignException) {
-            return new ConsumerAssignException(msg);
+            newException = new ConsumerAssignException(msg);
         } else if (cause instanceof MessageAcknowledgeException) {
-            return new MessageAcknowledgeException(msg);
+            newException = new MessageAcknowledgeException(msg);
         } else if (cause instanceof TransactionConflictException) {
-            return new TransactionConflictException(msg);
+            newException = new TransactionConflictException(msg);
         } else if (cause instanceof TopicDoesNotExistException) {
-            return new TopicDoesNotExistException(msg);
+            newException = new TopicDoesNotExistException(msg);
         } else if (cause instanceof ProducerFencedException) {
-            return new ProducerFencedException(msg);
+            newException = new ProducerFencedException(msg);
         } else if (cause instanceof MemoryBufferIsFullError) {
-            return new MemoryBufferIsFullError(msg);
+            newException = new MemoryBufferIsFullError(msg);
         } else if (cause instanceof NotFoundException) {
-            return new NotFoundException(msg);
+            newException = new NotFoundException(msg);
         } else {
-            return new PulsarClientException(t);
+            newException = new PulsarClientException(t);
         }
+
+        Collection<Throwable> previousExceptions = getPreviousExceptions(t);
+        if (t != null) {

Review comment:
       it should be checking whether previousExceptions is null




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] ivankelly commented on a change in pull request #11627: PIP-91: Separate lookup timeout from operation timeout

Posted by GitBox <gi...@apache.org>.
ivankelly commented on a change in pull request #11627:
URL: https://github.com/apache/pulsar/pull/11627#discussion_r686084378



##########
File path: pulsar-client/src/main/java/org/apache/pulsar/client/impl/conf/ClientConfigurationData.java
##########
@@ -327,6 +333,14 @@ public boolean isUseTls() {
         return false;
     }
 
+    public long getLookupTimeoutMs() {
+        if (lookupTimeoutMs >= 0) {
+            return lookupTimeoutMs;
+        } else {
+            return operationTimeoutMs;
+        }
+    }

Review comment:
       Add extra docs where? ClientBuilder already documents that lookupTimeout defaults to operationTimeout




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] BewareMyPower commented on a change in pull request #11627: PIP-91: Separate lookup timeout from operation timeout

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on a change in pull request #11627:
URL: https://github.com/apache/pulsar/pull/11627#discussion_r687296786



##########
File path: pulsar-client/src/main/java/org/apache/pulsar/client/impl/conf/ClientConfigurationData.java
##########
@@ -327,6 +333,14 @@ public boolean isUseTls() {
         return false;
     }
 
+    public long getLookupTimeoutMs() {
+        if (lookupTimeoutMs >= 0) {
+            return lookupTimeoutMs;
+        } else {
+            return operationTimeoutMs;
+        }
+    }

Review comment:
       Thanks




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] eolivelli merged pull request #11627: PIP-91: Separate lookup timeout from operation timeout

Posted by GitBox <gi...@apache.org>.
eolivelli merged pull request #11627:
URL: https://github.com/apache/pulsar/pull/11627


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] jerrypeng commented on a change in pull request #11627: PIP-91: Separate lookup timeout from operation timeout

Posted by GitBox <gi...@apache.org>.
jerrypeng commented on a change in pull request #11627:
URL: https://github.com/apache/pulsar/pull/11627#discussion_r688076935



##########
File path: pulsar-client-api/src/main/java/org/apache/pulsar/client/api/PulsarClientException.java
##########
@@ -83,6 +86,49 @@ public PulsarClientException(String msg, Throwable t) {
         super(msg, t);
     }
 
+    /**
+     * Add a list of previous exception which occurred for the same operation
+     * and have been retried.
+     *
+     * @param previous A collection of throwables that triggered retries
+     */
+    public void setPreviousExceptions(Collection<Throwable> previous) {
+        this.previous = previous;
+    }
+
+    /**
+     * Get the collection of previous exceptions which have caused retries
+     * for this operation.
+     *
+     * @return a collection of exception, ordered as they occurred
+     */
+    public Collection<Throwable> getPreviousExceptions() {
+        return this.previous;
+    }
+
+    @Override
+    public String toString() {

Review comment:
       @ivankelly aren't we printing lookup errors here:
   https://github.com/apache/pulsar/blob/master/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ConnectionHandler.java#L81 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] Anonymitaet commented on pull request #11627: PIP-91: Separate lookup timeout from operation timeout

Posted by GitBox <gi...@apache.org>.
Anonymitaet commented on pull request #11627:
URL: https://github.com/apache/pulsar/pull/11627#issuecomment-897250930


   Thanks for your contribution. For this PR, do we need to update docs?
   
   (The [PR template contains info about doc](https://github.com/apache/pulsar/blob/master/.github/PULL_REQUEST_TEMPLATE.md#documentation), which helps others know more about the changes. Can you provide doc-related info in this and future PR descriptions? Thanks) 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] eolivelli commented on pull request #11627: PIP-91: Separate lookup timeout from operation timeout

Posted by GitBox <gi...@apache.org>.
eolivelli commented on pull request #11627:
URL: https://github.com/apache/pulsar/pull/11627#issuecomment-900168385


   > @eolivelli this has 4 approvals and no changes requested. IMO it's ready to merge.
   
   agreed. merging now


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] BewareMyPower commented on a change in pull request #11627: PIP-91: Separate lookup timeout from operation timeout

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on a change in pull request #11627:
URL: https://github.com/apache/pulsar/pull/11627#discussion_r686103662



##########
File path: pulsar-client-api/src/main/java/org/apache/pulsar/client/api/PulsarClientException.java
##########
@@ -957,77 +1003,110 @@ public static PulsarClientException unwrap(Throwable t) {
         // site
         Throwable cause = t.getCause();
         String msg = cause.getMessage();
+        PulsarClientException newException = null;
         if (cause instanceof TimeoutException) {
-            return new TimeoutException(msg);
+            newException = new TimeoutException(msg);
         } else if (cause instanceof InvalidConfigurationException) {
-            return new InvalidConfigurationException(msg);
+            newException = new InvalidConfigurationException(msg);
         } else if (cause instanceof AuthenticationException) {
-            return new AuthenticationException(msg);
+            newException = new AuthenticationException(msg);
         } else if (cause instanceof IncompatibleSchemaException) {
-            return new IncompatibleSchemaException(msg);
+            newException = new IncompatibleSchemaException(msg);
         } else if (cause instanceof TooManyRequestsException) {
-            return new TooManyRequestsException(msg);
+            newException = new TooManyRequestsException(msg);
         } else if (cause instanceof LookupException) {
-            return new LookupException(msg);
+            newException = new LookupException(msg);
         } else if (cause instanceof ConnectException) {
-            return new ConnectException(msg);
+            newException = new ConnectException(msg);
         } else if (cause instanceof AlreadyClosedException) {
-            return new AlreadyClosedException(msg);
+            newException = new AlreadyClosedException(msg);
         } else if (cause instanceof TopicTerminatedException) {
-            return new TopicTerminatedException(msg);
+            newException = new TopicTerminatedException(msg);
         } else if (cause instanceof AuthorizationException) {
-            return new AuthorizationException(msg);
+            newException = new AuthorizationException(msg);
         } else if (cause instanceof GettingAuthenticationDataException) {
-            return new GettingAuthenticationDataException(msg);
+            newException = new GettingAuthenticationDataException(msg);
         } else if (cause instanceof UnsupportedAuthenticationException) {
-            return new UnsupportedAuthenticationException(msg);
+            newException = new UnsupportedAuthenticationException(msg);
         } else if (cause instanceof BrokerPersistenceException) {
-            return new BrokerPersistenceException(msg);
+            newException = new BrokerPersistenceException(msg);
         } else if (cause instanceof BrokerMetadataException) {
-            return new BrokerMetadataException(msg);
+            newException = new BrokerMetadataException(msg);
         } else if (cause instanceof ProducerBusyException) {
-            return new ProducerBusyException(msg);
+            newException = new ProducerBusyException(msg);
         } else if (cause instanceof ConsumerBusyException) {
-            return new ConsumerBusyException(msg);
+            newException = new ConsumerBusyException(msg);
         } else if (cause instanceof NotConnectedException) {
-            return new NotConnectedException();
+            newException = new NotConnectedException();
         } else if (cause instanceof InvalidMessageException) {
-            return new InvalidMessageException(msg);
+            newException = new InvalidMessageException(msg);
         } else if (cause instanceof InvalidTopicNameException) {
-            return new InvalidTopicNameException(msg);
+            newException = new InvalidTopicNameException(msg);
         } else if (cause instanceof NotSupportedException) {
-            return new NotSupportedException(msg);
+            newException = new NotSupportedException(msg);
         } else if (cause instanceof NotAllowedException) {
-            return new NotAllowedException(msg);
+            newException = new NotAllowedException(msg);
         } else if (cause instanceof ProducerQueueIsFullError) {
-            return new ProducerQueueIsFullError(msg);
+            newException = new ProducerQueueIsFullError(msg);
         } else if (cause instanceof ProducerBlockedQuotaExceededError) {
-            return new ProducerBlockedQuotaExceededError(msg);
+            newException = new ProducerBlockedQuotaExceededError(msg);
         } else if (cause instanceof ProducerBlockedQuotaExceededException) {
-            return new ProducerBlockedQuotaExceededException(msg);
+            newException = new ProducerBlockedQuotaExceededException(msg);
         } else if (cause instanceof ChecksumException) {
-            return new ChecksumException(msg);
+            newException = new ChecksumException(msg);
         } else if (cause instanceof CryptoException) {
-            return new CryptoException(msg);
+            newException = new CryptoException(msg);
         } else if (cause instanceof ConsumerAssignException) {
-            return new ConsumerAssignException(msg);
+            newException = new ConsumerAssignException(msg);
         } else if (cause instanceof MessageAcknowledgeException) {
-            return new MessageAcknowledgeException(msg);
+            newException = new MessageAcknowledgeException(msg);
         } else if (cause instanceof TransactionConflictException) {
-            return new TransactionConflictException(msg);
+            newException = new TransactionConflictException(msg);
         } else if (cause instanceof TopicDoesNotExistException) {
-            return new TopicDoesNotExistException(msg);
+            newException = new TopicDoesNotExistException(msg);
         } else if (cause instanceof ProducerFencedException) {
-            return new ProducerFencedException(msg);
+            newException = new ProducerFencedException(msg);
         } else if (cause instanceof MemoryBufferIsFullError) {
-            return new MemoryBufferIsFullError(msg);
+            newException = new MemoryBufferIsFullError(msg);
         } else if (cause instanceof NotFoundException) {
-            return new NotFoundException(msg);
+            newException = new NotFoundException(msg);
         } else {
-            return new PulsarClientException(t);
+            newException = new PulsarClientException(t);
         }
+
+        Collection<Throwable> previousExceptions = getPreviousExceptions(t);
+        if (t != null) {

Review comment:
       Agree.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] ivankelly commented on a change in pull request #11627: PIP-91: Separate lookup timeout from operation timeout

Posted by GitBox <gi...@apache.org>.
ivankelly commented on a change in pull request #11627:
URL: https://github.com/apache/pulsar/pull/11627#discussion_r686890179



##########
File path: pulsar-client/src/main/java/org/apache/pulsar/client/impl/conf/ClientConfigurationData.java
##########
@@ -327,6 +333,14 @@ public boolean isUseTls() {
         return false;
     }
 
+    public long getLookupTimeoutMs() {
+        if (lookupTimeoutMs >= 0) {
+            return lookupTimeoutMs;
+        } else {
+            return operationTimeoutMs;
+        }
+    }

Review comment:
       @BewareMyPower ^




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] jerrypeng commented on a change in pull request #11627: PIP-91: Separate lookup timeout from operation timeout

Posted by GitBox <gi...@apache.org>.
jerrypeng commented on a change in pull request #11627:
URL: https://github.com/apache/pulsar/pull/11627#discussion_r687408553



##########
File path: pulsar-client/src/main/java/org/apache/pulsar/client/impl/ConsumerImpl.java
##########
@@ -883,16 +887,21 @@ private void sendFlowPermitsToBroker(ClientCnx cnx, int numMessages) {
     public void connectionFailed(PulsarClientException exception) {
         boolean nonRetriableError = !PulsarClientException.isRetriableError(exception);
         boolean timeout = System.currentTimeMillis() > subscribeTimeout;
-        if ((nonRetriableError || timeout) && subscribeFuture.completeExceptionally(exception)) {
-            setState(State.Failed);
-            if (nonRetriableError) {
-                log.info("[{}] Consumer creation failed for consumer {} with unretriableError {}", topic, consumerId, exception);
-            } else {
-                log.info("[{}] Consumer creation failed for consumer {} after timeout", topic, consumerId);
+        if (nonRetriableError || timeout) {
+            exception.setPreviousExceptions(previousExceptions);
+            if (subscribeFuture.completeExceptionally(exception)) {
+                setState(State.Failed);
+                if (nonRetriableError) {
+                    log.info("[{}] Consumer creation failed for consumer {} with unretriableError {}", topic, consumerId, exception);
+                } else {
+                    log.info("[{}] Consumer creation failed for consumer {} after timeout", topic, consumerId);

Review comment:
       I think it will still be useful to log the exception even if we timed out 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] ivankelly commented on a change in pull request #11627: PIP-91: Separate lookup timeout from operation timeout

Posted by GitBox <gi...@apache.org>.
ivankelly commented on a change in pull request #11627:
URL: https://github.com/apache/pulsar/pull/11627#discussion_r687476062



##########
File path: pulsar-client-api/src/main/java/org/apache/pulsar/client/api/PulsarClientException.java
##########
@@ -83,6 +86,49 @@ public PulsarClientException(String msg, Throwable t) {
         super(msg, t);
     }
 
+    /**
+     * Add a list of previous exception which occurred for the same operation
+     * and have been retried.
+     *
+     * @param previous A collection of throwables that triggered retries
+     */
+    public void setPreviousExceptions(Collection<Throwable> previous) {
+        this.previous = previous;
+    }
+
+    /**
+     * Get the collection of previous exceptions which have caused retries
+     * for this operation.
+     *
+     * @return a collection of exception, ordered as they occurred
+     */
+    public Collection<Throwable> getPreviousExceptions() {
+        return this.previous;
+    }
+
+    @Override
+    public String toString() {

Review comment:
       @jerrypeng we don't print it when we log. the previous exceptions only get attached when the exception is propagated to the client. It's useful because it gives you more info to correlate on the broker side. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] ivankelly commented on pull request #11627: PIP-91: Separate lookup timeout from operation timeout

Posted by GitBox <gi...@apache.org>.
ivankelly commented on pull request #11627:
URL: https://github.com/apache/pulsar/pull/11627#issuecomment-900176730


   @eolivelli thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] jerrypeng commented on a change in pull request #11627: PIP-91: Separate lookup timeout from operation timeout

Posted by GitBox <gi...@apache.org>.
jerrypeng commented on a change in pull request #11627:
URL: https://github.com/apache/pulsar/pull/11627#discussion_r688064193



##########
File path: pulsar-client-api/src/main/java/org/apache/pulsar/client/api/PulsarClientException.java
##########
@@ -83,6 +86,49 @@ public PulsarClientException(String msg, Throwable t) {
         super(msg, t);
     }
 
+    /**
+     * Add a list of previous exception which occurred for the same operation
+     * and have been retried.
+     *
+     * @param previous A collection of throwables that triggered retries
+     */
+    public void setPreviousExceptions(Collection<Throwable> previous) {
+        this.previous = previous;
+    }
+
+    /**
+     * Get the collection of previous exceptions which have caused retries
+     * for this operation.
+     *
+     * @return a collection of exception, ordered as they occurred
+     */
+    public Collection<Throwable> getPreviousExceptions() {
+        return this.previous;
+    }
+
+    @Override
+    public String toString() {

Review comment:
       @ivankelly so when will be print out the previous exceptions?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] jerrypeng commented on a change in pull request #11627: PIP-91: Separate lookup timeout from operation timeout

Posted by GitBox <gi...@apache.org>.
jerrypeng commented on a change in pull request #11627:
URL: https://github.com/apache/pulsar/pull/11627#discussion_r688085063



##########
File path: pulsar-client/src/main/java/org/apache/pulsar/client/impl/ConsumerImpl.java
##########
@@ -243,7 +245,7 @@ protected ConsumerImpl(PulsarClientImpl client, String topic, ConsumerConfigurat
         this.initialStartMessageId = this.startMessageId;
         this.startMessageRollbackDurationInSec = startMessageRollbackDurationInSec;
         AVAILABLE_PERMITS_UPDATER.set(this, 0);
-        this.subscribeTimeout = System.currentTimeMillis() + client.getConfiguration().getOperationTimeoutMs();
+        this.subscribeTimeout = System.currentTimeMillis() + client.getConfiguration().getLookupTimeoutMs();

Review comment:
       The time to subscribe will include time to do a lookup + the time to create a connection (if the connection to the broker is not established yet). However, with out current code we are including the connection establishment time within our lookup time.  This makes this timeout here confusing and hard to reason about as it may or may not include the time to establish a connection. Also establishing a connection has its own timeout which defaults to 10 seconds.  I think we should clearly separate the two timeouts so one is not just overlapping with the other and we can clearly understand if subscribe failed because of a lookup timeout or a connection timeout.
   
   Same for producers.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] ivankelly commented on a change in pull request #11627: PIP-91: Separate lookup timeout from operation timeout

Posted by GitBox <gi...@apache.org>.
ivankelly commented on a change in pull request #11627:
URL: https://github.com/apache/pulsar/pull/11627#discussion_r687477622



##########
File path: pulsar-client/src/main/java/org/apache/pulsar/client/impl/ConsumerImpl.java
##########
@@ -883,16 +887,21 @@ private void sendFlowPermitsToBroker(ClientCnx cnx, int numMessages) {
     public void connectionFailed(PulsarClientException exception) {
         boolean nonRetriableError = !PulsarClientException.isRetriableError(exception);
         boolean timeout = System.currentTimeMillis() > subscribeTimeout;
-        if ((nonRetriableError || timeout) && subscribeFuture.completeExceptionally(exception)) {
-            setState(State.Failed);
-            if (nonRetriableError) {
-                log.info("[{}] Consumer creation failed for consumer {} with unretriableError {}", topic, consumerId, exception);
-            } else {
-                log.info("[{}] Consumer creation failed for consumer {} after timeout", topic, consumerId);
+        if (nonRetriableError || timeout) {
+            exception.setPreviousExceptions(previousExceptions);
+            if (subscribeFuture.completeExceptionally(exception)) {
+                setState(State.Failed);
+                if (nonRetriableError) {
+                    log.info("[{}] Consumer creation failed for consumer {} with unretriableError {}", topic, consumerId, exception);
+                } else {
+                    log.info("[{}] Consumer creation failed for consumer {} after timeout", topic, consumerId);

Review comment:
       I didn't change this code. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] jerrypeng commented on a change in pull request #11627: PIP-91: Separate lookup timeout from operation timeout

Posted by GitBox <gi...@apache.org>.
jerrypeng commented on a change in pull request #11627:
URL: https://github.com/apache/pulsar/pull/11627#discussion_r687407439



##########
File path: pulsar-client-api/src/main/java/org/apache/pulsar/client/api/PulsarClientException.java
##########
@@ -83,6 +86,49 @@ public PulsarClientException(String msg, Throwable t) {
         super(msg, t);
     }
 
+    /**
+     * Add a list of previous exception which occurred for the same operation
+     * and have been retried.
+     *
+     * @param previous A collection of throwables that triggered retries
+     */
+    public void setPreviousExceptions(Collection<Throwable> previous) {
+        this.previous = previous;
+    }
+
+    /**
+     * Get the collection of previous exceptions which have caused retries
+     * for this operation.
+     *
+     * @return a collection of exception, ordered as they occurred
+     */
+    public Collection<Throwable> getPreviousExceptions() {
+        return this.previous;
+    }
+
+    @Override
+    public String toString() {

Review comment:
       Why do we need to print out the previous encountered exceptions every time we log an exception?  We already log every exception in the client, can't we just search the logs for the history?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] BewareMyPower commented on a change in pull request #11627: PIP-91: Separate lookup timeout from operation timeout

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on a change in pull request #11627:
URL: https://github.com/apache/pulsar/pull/11627#discussion_r686919013



##########
File path: pulsar-client/src/main/java/org/apache/pulsar/client/impl/conf/ClientConfigurationData.java
##########
@@ -327,6 +333,14 @@ public boolean isUseTls() {
         return false;
     }
 
+    public long getLookupTimeoutMs() {
+        if (lookupTimeoutMs >= 0) {
+            return lookupTimeoutMs;
+        } else {
+            return operationTimeoutMs;
+        }
+    }

Review comment:
       > ClientBuilder already documents that lookupTimeout defaults to operationTimeout
   
   I have not found it, could you give a link?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] rdhabalia commented on pull request #11627: PIP-91: Separate lookup timeout from operation timeout

Posted by GitBox <gi...@apache.org>.
rdhabalia commented on pull request #11627:
URL: https://github.com/apache/pulsar/pull/11627#issuecomment-896643612


   The history behind introducing TooManyRequest error is to handle backpressure for zookeeper by throttling a large number of concurrent topics loading during broker cold restart. Therefore, pulsar has lookup throttling at both client and server-side that slows down lookup because lookup ultimately triggers topic loading at server side. So, when a client sees TooManyRequest errors, the client should retry to perform this operation and the client will eventually reconnect to the broker, TooManyRequest can not harm the broker because broker already has a safeguard to reject the flood of the requests. 
   I am not sure what problem https://github.com/apache/pulsar/pull/6584 PR tries to solve but it should not solve it by making TooManyRequest non-retriable. TooManyRequest is a retriable error and the client should retry. Also, it should definitely not close the producer/consumer due to this error otherwise it can bring down the entire application which depends on the availability of the pulsar client entities.Pulsar lookup is an operation similar to other operations such as: connect, publish, subscribe, etc. So, I don’t think it needs special treatment with a separate timeout config and we can avoid the complexity introduced in this PR that caches and depends on the previously seen exception for lookup retry. Anyways, removing TooManyRequest from the non-retriable error list will simplify the client behavior and we can avoid the complexity of this PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] ivankelly commented on pull request #11627: PIP-91: Separate lookup timeout from operation timeout

Posted by GitBox <gi...@apache.org>.
ivankelly commented on pull request #11627:
URL: https://github.com/apache/pulsar/pull/11627#issuecomment-897428763


   > Thanks for your contribution. For this PR, do we need to update docs?
   > 
   > (The [PR template contains info about doc](https://github.com/apache/pulsar/blob/master/.github/PULL_REQUEST_TEMPLATE.md#documentation), which helps others know more about the changes. Can you provide doc-related info in this and future PR descriptions? Thanks)
   
   @Anonymitaet there's javadoc for the new configuration option. 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] ivankelly commented on pull request #11627: PIP-91: Separate lookup timeout from operation timeout

Posted by GitBox <gi...@apache.org>.
ivankelly commented on pull request #11627:
URL: https://github.com/apache/pulsar/pull/11627#issuecomment-896001749


   Some tests are failing. At least one of the failures is legit, so I'll look into it and post an update


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] ivankelly commented on a change in pull request #11627: PIP-91: Separate lookup timeout from operation timeout

Posted by GitBox <gi...@apache.org>.
ivankelly commented on a change in pull request #11627:
URL: https://github.com/apache/pulsar/pull/11627#discussion_r688379620



##########
File path: pulsar-client/src/main/java/org/apache/pulsar/client/impl/ConsumerImpl.java
##########
@@ -243,7 +245,7 @@ protected ConsumerImpl(PulsarClientImpl client, String topic, ConsumerConfigurat
         this.initialStartMessageId = this.startMessageId;
         this.startMessageRollbackDurationInSec = startMessageRollbackDurationInSec;
         AVAILABLE_PERMITS_UPDATER.set(this, 0);
-        this.subscribeTimeout = System.currentTimeMillis() + client.getConfiguration().getOperationTimeoutMs();
+        this.subscribeTimeout = System.currentTimeMillis() + client.getConfiguration().getLookupTimeoutMs();

Review comment:
       It also includes the time to do CommandSubscribe and CommandPublish. 
   
   Separating the timeout to do lookup from the time to establish the correct connection is a major rework of how timeouts work. The lookup timeout and retry is handled in ConsumerImpl and ProducerImpl, and these only get signals via connectionFailed and connectionOpen callbacks. So separate it out, we'd need to refactor how the Impls get a connection. Currently it goes from Impl->ConnectionHandler->PulsarClientImpl->LookupService. I don't think it's worth it. It's already clear if subscribe failed due to lookup timeout or a connection timeout. The exception returned is different, PulsarClientException.TimeoutException for the former, netty ConnectTimeoutException for the latter. If you want to know which node you failed to connect to, it's there in the exception message.
   ```
   java.util.concurrent.CompletionException: org.apache.pulsar.client.api.PulsarClientException: java.util.concurrent.CompletionException: io.netty.channel.ConnectTimeoutException: connection timed out: /192.168.1.34:5432
   ```
   
   w.r.t. including the CommandSubscribe and CommandProducer, I can change this, but it would create a behavioral change by default as then the operationTimeout for these commands only starts counting down after lookup has succeeded. i.e. the whole operation could take twice as long. I guess this isn't a major issue though.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] jerrypeng commented on a change in pull request #11627: PIP-91: Separate lookup timeout from operation timeout

Posted by GitBox <gi...@apache.org>.
jerrypeng commented on a change in pull request #11627:
URL: https://github.com/apache/pulsar/pull/11627#discussion_r688085063



##########
File path: pulsar-client/src/main/java/org/apache/pulsar/client/impl/ConsumerImpl.java
##########
@@ -243,7 +245,7 @@ protected ConsumerImpl(PulsarClientImpl client, String topic, ConsumerConfigurat
         this.initialStartMessageId = this.startMessageId;
         this.startMessageRollbackDurationInSec = startMessageRollbackDurationInSec;
         AVAILABLE_PERMITS_UPDATER.set(this, 0);
-        this.subscribeTimeout = System.currentTimeMillis() + client.getConfiguration().getOperationTimeoutMs();
+        this.subscribeTimeout = System.currentTimeMillis() + client.getConfiguration().getLookupTimeoutMs();

Review comment:
       The time to subscribe will include time to do a lookup + the time to create a connection (if the connection to the broker is not established yet). However, with out current code we are including the connection establishment time without our lookup time.  This makes this timeout here confusing and hard to reason about as it may or may not include the time to establish a connection. Also establishing a connection has its own timeout which defaults to 10 seconds.  I think we should clearly separate the two timeouts so one is not just overlapping with the other.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] ivankelly commented on a change in pull request #11627: PIP-91: Separate lookup timeout from operation timeout

Posted by GitBox <gi...@apache.org>.
ivankelly commented on a change in pull request #11627:
URL: https://github.com/apache/pulsar/pull/11627#discussion_r687047625



##########
File path: pulsar-client/src/main/java/org/apache/pulsar/client/impl/conf/ClientConfigurationData.java
##########
@@ -327,6 +333,14 @@ public boolean isUseTls() {
         return false;
     }
 
+    public long getLookupTimeoutMs() {
+        if (lookupTimeoutMs >= 0) {
+            return lookupTimeoutMs;
+        } else {
+            return operationTimeoutMs;
+        }
+    }

Review comment:
       @BewareMyPower https://github.com/apache/pulsar/pull/11627/files#diff-33219b2b08e58d5841feb88fdae064889f2477fd77564044ff4bb4e8fc24f1f9R220




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] ivankelly commented on a change in pull request #11627: PIP-91: Separate lookup timeout from operation timeout

Posted by GitBox <gi...@apache.org>.
ivankelly commented on a change in pull request #11627:
URL: https://github.com/apache/pulsar/pull/11627#discussion_r687941161



##########
File path: pulsar-client/src/main/java/org/apache/pulsar/client/impl/conf/ClientConfigurationData.java
##########
@@ -327,6 +333,14 @@ public boolean isUseTls() {
         return false;
     }
 
+    public long getLookupTimeoutMs() {
+        if (lookupTimeoutMs >= 0) {
+            return lookupTimeoutMs;
+        } else {
+            return operationTimeoutMs;
+        }
+    }

Review comment:
       @BewareMyPower you're still marked as "Change requested"




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] jerrypeng commented on a change in pull request #11627: PIP-91: Separate lookup timeout from operation timeout

Posted by GitBox <gi...@apache.org>.
jerrypeng commented on a change in pull request #11627:
URL: https://github.com/apache/pulsar/pull/11627#discussion_r688085063



##########
File path: pulsar-client/src/main/java/org/apache/pulsar/client/impl/ConsumerImpl.java
##########
@@ -243,7 +245,7 @@ protected ConsumerImpl(PulsarClientImpl client, String topic, ConsumerConfigurat
         this.initialStartMessageId = this.startMessageId;
         this.startMessageRollbackDurationInSec = startMessageRollbackDurationInSec;
         AVAILABLE_PERMITS_UPDATER.set(this, 0);
-        this.subscribeTimeout = System.currentTimeMillis() + client.getConfiguration().getOperationTimeoutMs();
+        this.subscribeTimeout = System.currentTimeMillis() + client.getConfiguration().getLookupTimeoutMs();

Review comment:
       The time to subscribe will include time to do a lookup + the time to create a connection (if the connection to the broker is not established yet). However, with out current code we are including the connection establishment time within our lookup time.  This makes this timeout here confusing and hard to reason about as it may or may not include the time to establish a connection. Also establishing a connection has its own timeout which defaults to 10 seconds.  I think we should clearly separate the two timeouts so one is not just overlapping with the other.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] ivankelly commented on pull request #11627: PIP-91: Separate lookup timeout from operation timeout

Posted by GitBox <gi...@apache.org>.
ivankelly commented on pull request #11627:
URL: https://github.com/apache/pulsar/pull/11627#issuecomment-900132986


   @eolivelli this has 4 approvals and no changes requested. IMO it's ready to merge.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] eolivelli commented on pull request #11627: PIP-91: Separate lookup timeout from operation timeout

Posted by GitBox <gi...@apache.org>.
eolivelli commented on pull request #11627:
URL: https://github.com/apache/pulsar/pull/11627#issuecomment-899365068


   IIUC @rdhabalia left some comments on the ML
   
   @rdhabalia do you mind to official write your position about this PR ?
   
   it also looks like that @jerrypeng initially approved the PR and then added more comments, if you can @jerrypeng please add your review as well 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] BewareMyPower commented on a change in pull request #11627: PIP-91: Separate lookup timeout from operation timeout

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on a change in pull request #11627:
URL: https://github.com/apache/pulsar/pull/11627#discussion_r686023466



##########
File path: pulsar-client/src/main/java/org/apache/pulsar/client/impl/conf/ClientConfigurationData.java
##########
@@ -327,6 +333,14 @@ public boolean isUseTls() {
         return false;
     }
 
+    public long getLookupTimeoutMs() {
+        if (lookupTimeoutMs >= 0) {
+            return lookupTimeoutMs;
+        } else {
+            return operationTimeoutMs;
+        }
+    }

Review comment:
       It's better to add extra docs to note this behavior since the default `lookupTimeoutMs` is -1, users may think it means the lookup request never time out.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] ivankelly commented on pull request #11627: PIP-91: Separate lookup timeout from operation timeout

Posted by GitBox <gi...@apache.org>.
ivankelly commented on pull request #11627:
URL: https://github.com/apache/pulsar/pull/11627#issuecomment-896815950


   @rdhabalia 
   There are some similarities between CommandProducer, CommandSubscribe and CommandLookup in that they are all control plane operations, but there are also important differences. 
   CommandLookup has no side effects. Multiple CommandLookup requests will not interfere with each other, while multiple CommandProducer/CommandSubscribe will. 
   CommandLookup can be served by any broker. CommandProducer/CommandSubscribe can only be served by the owner of the topic.
   With timeouts, you can have how long I'm willing to wait for a node to respond, or how long I'm willing to wait for a operation to complete. For producer/subscribe, these are the same, as any retry would hit the same node and one of the requests would necessarily have to fail. This is not the case for partition metadata/lookup, as another node can be tried.
   We even already consider lookup type operations to be different in the code, by the fact that we have a http lookup service and a binary lookup service.
   
   There is very little complexity added to separate the timeout. The complexity I think you are referring to is recordkeeping so that if an exception is thrown, it contains information about previous failures, not just the last failure. Business logic never actually looks at the List of exceptions.
   The change for the new timeout is just adding the config and changing 2-3 lines.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] ivankelly commented on a change in pull request #11627: PIP-91: Separate lookup timeout from operation timeout

Posted by GitBox <gi...@apache.org>.
ivankelly commented on a change in pull request #11627:
URL: https://github.com/apache/pulsar/pull/11627#discussion_r688340601



##########
File path: pulsar-client/src/main/java/org/apache/pulsar/client/impl/ConsumerImpl.java
##########
@@ -883,16 +887,21 @@ private void sendFlowPermitsToBroker(ClientCnx cnx, int numMessages) {
     public void connectionFailed(PulsarClientException exception) {
         boolean nonRetriableError = !PulsarClientException.isRetriableError(exception);
         boolean timeout = System.currentTimeMillis() > subscribeTimeout;
-        if ((nonRetriableError || timeout) && subscribeFuture.completeExceptionally(exception)) {
-            setState(State.Failed);
-            if (nonRetriableError) {
-                log.info("[{}] Consumer creation failed for consumer {} with unretriableError {}", topic, consumerId, exception);
-            } else {
-                log.info("[{}] Consumer creation failed for consumer {} after timeout", topic, consumerId);
+        if (nonRetriableError || timeout) {
+            exception.setPreviousExceptions(previousExceptions);
+            if (subscribeFuture.completeExceptionally(exception)) {
+                setState(State.Failed);
+                if (nonRetriableError) {
+                    log.info("[{}] Consumer creation failed for consumer {} with unretriableError {}", topic, consumerId, exception);
+                } else {
+                    log.info("[{}] Consumer creation failed for consumer {} after timeout", topic, consumerId);

Review comment:
       it wouldn't add any more information. TimeoutExceptions all come from https://github.com/apache/pulsar/blob/05827aeb4b0aa4a1b09a7a122f00a65012bf53a1/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ClientCnx.java#L1149
   So the timeout exception message and stack would be exactly the same every time.
   
   Also, note that with the non-timeout exception it's not printing the stacktrace.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] BewareMyPower commented on a change in pull request #11627: PIP-91: Separate lookup timeout from operation timeout

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on a change in pull request #11627:
URL: https://github.com/apache/pulsar/pull/11627#discussion_r686026221



##########
File path: pulsar-client-api/src/main/java/org/apache/pulsar/client/api/PulsarClientException.java
##########
@@ -957,77 +1003,110 @@ public static PulsarClientException unwrap(Throwable t) {
         // site
         Throwable cause = t.getCause();
         String msg = cause.getMessage();
+        PulsarClientException newException = null;
         if (cause instanceof TimeoutException) {
-            return new TimeoutException(msg);
+            newException = new TimeoutException(msg);
         } else if (cause instanceof InvalidConfigurationException) {
-            return new InvalidConfigurationException(msg);
+            newException = new InvalidConfigurationException(msg);
         } else if (cause instanceof AuthenticationException) {
-            return new AuthenticationException(msg);
+            newException = new AuthenticationException(msg);
         } else if (cause instanceof IncompatibleSchemaException) {
-            return new IncompatibleSchemaException(msg);
+            newException = new IncompatibleSchemaException(msg);
         } else if (cause instanceof TooManyRequestsException) {
-            return new TooManyRequestsException(msg);
+            newException = new TooManyRequestsException(msg);
         } else if (cause instanceof LookupException) {
-            return new LookupException(msg);
+            newException = new LookupException(msg);
         } else if (cause instanceof ConnectException) {
-            return new ConnectException(msg);
+            newException = new ConnectException(msg);
         } else if (cause instanceof AlreadyClosedException) {
-            return new AlreadyClosedException(msg);
+            newException = new AlreadyClosedException(msg);
         } else if (cause instanceof TopicTerminatedException) {
-            return new TopicTerminatedException(msg);
+            newException = new TopicTerminatedException(msg);
         } else if (cause instanceof AuthorizationException) {
-            return new AuthorizationException(msg);
+            newException = new AuthorizationException(msg);
         } else if (cause instanceof GettingAuthenticationDataException) {
-            return new GettingAuthenticationDataException(msg);
+            newException = new GettingAuthenticationDataException(msg);
         } else if (cause instanceof UnsupportedAuthenticationException) {
-            return new UnsupportedAuthenticationException(msg);
+            newException = new UnsupportedAuthenticationException(msg);
         } else if (cause instanceof BrokerPersistenceException) {
-            return new BrokerPersistenceException(msg);
+            newException = new BrokerPersistenceException(msg);
         } else if (cause instanceof BrokerMetadataException) {
-            return new BrokerMetadataException(msg);
+            newException = new BrokerMetadataException(msg);
         } else if (cause instanceof ProducerBusyException) {
-            return new ProducerBusyException(msg);
+            newException = new ProducerBusyException(msg);
         } else if (cause instanceof ConsumerBusyException) {
-            return new ConsumerBusyException(msg);
+            newException = new ConsumerBusyException(msg);
         } else if (cause instanceof NotConnectedException) {
-            return new NotConnectedException();
+            newException = new NotConnectedException();
         } else if (cause instanceof InvalidMessageException) {
-            return new InvalidMessageException(msg);
+            newException = new InvalidMessageException(msg);
         } else if (cause instanceof InvalidTopicNameException) {
-            return new InvalidTopicNameException(msg);
+            newException = new InvalidTopicNameException(msg);
         } else if (cause instanceof NotSupportedException) {
-            return new NotSupportedException(msg);
+            newException = new NotSupportedException(msg);
         } else if (cause instanceof NotAllowedException) {
-            return new NotAllowedException(msg);
+            newException = new NotAllowedException(msg);
         } else if (cause instanceof ProducerQueueIsFullError) {
-            return new ProducerQueueIsFullError(msg);
+            newException = new ProducerQueueIsFullError(msg);
         } else if (cause instanceof ProducerBlockedQuotaExceededError) {
-            return new ProducerBlockedQuotaExceededError(msg);
+            newException = new ProducerBlockedQuotaExceededError(msg);
         } else if (cause instanceof ProducerBlockedQuotaExceededException) {
-            return new ProducerBlockedQuotaExceededException(msg);
+            newException = new ProducerBlockedQuotaExceededException(msg);
         } else if (cause instanceof ChecksumException) {
-            return new ChecksumException(msg);
+            newException = new ChecksumException(msg);
         } else if (cause instanceof CryptoException) {
-            return new CryptoException(msg);
+            newException = new CryptoException(msg);
         } else if (cause instanceof ConsumerAssignException) {
-            return new ConsumerAssignException(msg);
+            newException = new ConsumerAssignException(msg);
         } else if (cause instanceof MessageAcknowledgeException) {
-            return new MessageAcknowledgeException(msg);
+            newException = new MessageAcknowledgeException(msg);
         } else if (cause instanceof TransactionConflictException) {
-            return new TransactionConflictException(msg);
+            newException = new TransactionConflictException(msg);
         } else if (cause instanceof TopicDoesNotExistException) {
-            return new TopicDoesNotExistException(msg);
+            newException = new TopicDoesNotExistException(msg);
         } else if (cause instanceof ProducerFencedException) {
-            return new ProducerFencedException(msg);
+            newException = new ProducerFencedException(msg);
         } else if (cause instanceof MemoryBufferIsFullError) {
-            return new MemoryBufferIsFullError(msg);
+            newException = new MemoryBufferIsFullError(msg);
         } else if (cause instanceof NotFoundException) {
-            return new NotFoundException(msg);
+            newException = new NotFoundException(msg);
         } else {
-            return new PulsarClientException(t);
+            newException = new PulsarClientException(t);
         }
+
+        Collection<Throwable> previousExceptions = getPreviousExceptions(t);
+        if (t != null) {

Review comment:
       This null check is redundant or should be put in the beginning of the method because we've already called `t`'s method before:
   
   ```java
           Throwable cause = t.getCause();
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] jerrypeng commented on a change in pull request #11627: PIP-91: Separate lookup timeout from operation timeout

Posted by GitBox <gi...@apache.org>.
jerrypeng commented on a change in pull request #11627:
URL: https://github.com/apache/pulsar/pull/11627#discussion_r688081880



##########
File path: pulsar-client/src/main/java/org/apache/pulsar/client/impl/ConsumerImpl.java
##########
@@ -883,16 +887,21 @@ private void sendFlowPermitsToBroker(ClientCnx cnx, int numMessages) {
     public void connectionFailed(PulsarClientException exception) {
         boolean nonRetriableError = !PulsarClientException.isRetriableError(exception);
         boolean timeout = System.currentTimeMillis() > subscribeTimeout;
-        if ((nonRetriableError || timeout) && subscribeFuture.completeExceptionally(exception)) {
-            setState(State.Failed);
-            if (nonRetriableError) {
-                log.info("[{}] Consumer creation failed for consumer {} with unretriableError {}", topic, consumerId, exception);
-            } else {
-                log.info("[{}] Consumer creation failed for consumer {} after timeout", topic, consumerId);
+        if (nonRetriableError || timeout) {
+            exception.setPreviousExceptions(previousExceptions);
+            if (subscribeFuture.completeExceptionally(exception)) {
+                setState(State.Failed);
+                if (nonRetriableError) {
+                    log.info("[{}] Consumer creation failed for consumer {} with unretriableError {}", topic, consumerId, exception);
+                } else {
+                    log.info("[{}] Consumer creation failed for consumer {} after timeout", topic, consumerId);

Review comment:
       Can we add it?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] ivankelly commented on pull request #11627: PIP-91: Separate lookup timeout from operation timeout

Posted by GitBox <gi...@apache.org>.
ivankelly commented on pull request #11627:
URL: https://github.com/apache/pulsar/pull/11627#issuecomment-898332683


   @BewareMyPower do you have any additional concerns?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org