Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/07/27 19:45:16 UTC

[GitHub] [pulsar-client-go] zzzming opened a new pull request, #815: [issue 814] consumer and producer reconnect failure metrics counter

zzzming opened a new pull request, #815:
URL: https://github.com/apache/pulsar-client-go/pull/815

   Implement #814 
   
   ### Motivation
   In a Pulsar cluster deployed on Kubernetes, or behind a proxy or load balancer (LB), we need metrics counters that track reconnection failures of producers and consumers.
   
   When brokers go offline but the proxy/LB is still functioning, a TCP connection can still be established even though the topic lookup fails. The `pulsar_client_connections_establishment_errors` counter is not incremented in this case, so new counters are required to track such failures.
   
   ### Modifications
   
   Two new counter metrics, `pulsar_client_producers_reconnect_failure` and `pulsar_client_consumers_reconnect_failure`, are incremented in the retry-failure code blocks of producer_partition and consumer_partition.
   
   Because reconnecting to the broker during producer/consumer creation retries with a doubling backoff, these two counters are incremented only when either of the following conditions is met, to reduce noise from repeated retry failures:
   1. The maximum backoff is reached. This is a three-minute window.
   2. The MaxReconnectToBroker limit specified in the ProducerOption or ConsumerOption (user-definable) is reached.
   
   The existing code logic already covers the case when the topic does not exist. The counters are not incremented if the topic does not exist; the code simply exits the retry loop at once.
   
   ### Verifying this change
   
   This has been verified in a Pulsar cluster deployment with a proxy. We do not have such a setup in CI because this scenario cannot be tested with Pulsar standalone mode.
   
   ### Does this pull request potentially affect one of the following parts:
   
   *If `yes` was chosen, please highlight the changes*
   
     - Dependencies (does it add or upgrade a dependency): (no)
     - The public API: (no)
     - The schema: (no)
     - The default values of configurations: (no)
     - The wire protocol: (no)
   
   ### Documentation
   
     - Does this pull request introduce a new feature? (no)
     - If yes, how is the feature documented? (not applicable / docs / GoDocs / not documented)
     - If a feature is not applicable for documentation, explain why?
     - If a feature is not documented yet in this PR, please create a followup issue for adding the documentation
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar-client-go] zzzming commented on a diff in pull request #815: [issue 814] consumer and producer reconnect failure metrics counter

Posted by GitBox <gi...@apache.org>.
zzzming commented on code in PR #815:
URL: https://github.com/apache/pulsar-client-go/pull/815#discussion_r931614381


##########
pulsar/consumer_partition.go:
##########
@@ -1155,6 +1155,9 @@ func (pc *partitionConsumer) reconnectToBroker() {
 		if maxRetry > 0 {
 			maxRetry--
 		}
+		if maxRetry == 0 || backoff.IsMaxBackoffReached() {
+			pc.metrics.ConsumersReconnectFailure.Inc()

Review Comment:
   I updated the code based on your comments. The counter will now be incremented on every failure.



##########
pulsar/consumer_partition.go:
##########
@@ -1155,6 +1155,9 @@ func (pc *partitionConsumer) reconnectToBroker() {
 		if maxRetry > 0 {
 			maxRetry--
 		}
+		if maxRetry == 0 || backoff.IsMaxBackoffReached() {
+			pc.metrics.ConsumersReconnectFailure.Inc()

Review Comment:
   @michaeljmarshall ^





[GitHub] [pulsar-client-go] michaeljmarshall commented on a diff in pull request #815: [issue 814] consumer and producer reconnect failure metrics counter

Posted by GitBox <gi...@apache.org>.
michaeljmarshall commented on code in PR #815:
URL: https://github.com/apache/pulsar-client-go/pull/815#discussion_r931568426


##########
pulsar/producer_partition.go:
##########
@@ -419,6 +419,9 @@ func (p *partitionProducer) reconnectToBroker() {
 		if maxRetry > 0 {
 			maxRetry--
 		}
+		if maxRetry == 0 || backoff.IsMaxBackoffReached() {
+			p.metrics.ProducersReconnectFailure.Inc()

Review Comment:
   Same comment here, but for the producer case.



##########
pulsar/consumer_partition.go:
##########
@@ -1155,6 +1155,9 @@ func (pc *partitionConsumer) reconnectToBroker() {
 		if maxRetry > 0 {
 			maxRetry--
 		}
+		if maxRetry == 0 || backoff.IsMaxBackoffReached() {
+			pc.metrics.ConsumersReconnectFailure.Inc()

Review Comment:
   This will only increment when the max retries have been exhausted, which won't occur when unlimited retries are enabled. I think it'd be reasonable to increment this metric for each failed attempt to reconnect to the broker. 
   
   > Because reconnecting to broker by producer/consumer creation has doubling back off retry, to reduce excessive retry failure noise, these two counters will only incremented by either of two conditions are met.
   
   Is there a reason you view regular (non final) failure as noise? Given that we're willing to log that the consumer failed to connect, I think it's reasonable to increment the metric.
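
To make the difference between the two policies concrete, here is a minimal sketch that replays a run of failed reconnect attempts and counts how often the metric would be incremented under each policy. Everything in it is an assumption made for illustration: `countIncrements` is a hypothetical helper, `capAfter` is a made-up attempt number at which the backoff reaches its maximum, and a negative `maxRetry` is assumed to mean unlimited retries.

```go
package main

import "fmt"

// countIncrements replays `failed` consecutive failed reconnect attempts and
// returns how often the failure counter would be incremented.
// capAfter is the (hypothetical) attempt number at which the backoff hits its
// maximum; maxRetry < 0 means unlimited retries.
func countIncrements(failed, maxRetry, capAfter int, everyFailure bool) int {
	count := 0
	for attempt := 1; attempt <= failed && maxRetry != 0; attempt++ {
		if maxRetry > 0 {
			maxRetry--
		}
		// Gated policy: count only when retries are exhausted or the
		// backoff cap has been reached.
		gated := maxRetry == 0 || attempt >= capAfter
		if everyFailure || gated {
			count++
		}
	}
	return count
}

func main() {
	// A short outage: 3 failed attempts, unlimited retries, cap at attempt 5.
	fmt.Println("gated:", countIncrements(3, -1, 5, false))        // prints "gated: 0"
	fmt.Println("every failure:", countIncrements(3, -1, 5, true)) // prints "every failure: 3"
}
```

Under these assumptions, a short outage that recovers before the backoff cap is reached leaves the gated counter at zero, while counting every failure records all three attempts.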





[GitHub] [pulsar-client-go] michaeljmarshall merged pull request #815: [issue 814] consumer and producer reconnect failure metrics counter

Posted by GitBox <gi...@apache.org>.
michaeljmarshall merged PR #815:
URL: https://github.com/apache/pulsar-client-go/pull/815




[GitHub] [pulsar-client-go] zzzming commented on a diff in pull request #815: [issue 814] consumer and producer reconnect failure metrics counter

Posted by GitBox <gi...@apache.org>.
zzzming commented on code in PR #815:
URL: https://github.com/apache/pulsar-client-go/pull/815#discussion_r931614381


##########
pulsar/consumer_partition.go:
##########
@@ -1155,6 +1155,9 @@ func (pc *partitionConsumer) reconnectToBroker() {
 		if maxRetry > 0 {
 			maxRetry--
 		}
+		if maxRetry == 0 || backoff.IsMaxBackoffReached() {
+			pc.metrics.ConsumersReconnectFailure.Inc()

Review Comment:
   It will also be triggered when the max backoff is reached. There is a default max, so it will be reached within minutes.
   
   What if there is a temporary network failure that then recovers? Do you think it should be up to whoever consumes the metrics to decide the course of action?


