You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2021/12/17 10:09:50 UTC

[GitHub] [pulsar] eolivelli opened a new pull request #13380: Issue 13379: Metrics: noise stacktrace if Prometheus closes the connection due to a timeout

eolivelli opened a new pull request #13380:
URL: https://github.com/apache/pulsar/pull/13380


   Fixes #13379 
   
   ### Motivation
   
   When there is a timeout in serving the metrics Pulsar will print a long stacktrace that is scary and meaningless to the users.
   
   ### Modifications
   
   1. Provide a better log message: no stack trace, include the processing time
   2. ensure that context.complete() is always executed, in a finally block
   
   ### Verifying this change
   This change is a trivial rework / code cleanup without any test coverage.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] eolivelli commented on a change in pull request #13380: Issue 13379: Metrics: noise stacktrace if Prometheus closes the connection due to a timeout

Posted by GitBox <gi...@apache.org>.
eolivelli commented on a change in pull request #13380:
URL: https://github.com/apache/pulsar/pull/13380#discussion_r771309644



##########
File path: pulsar-broker/src/main/java/org/apache/pulsar/broker/stats/prometheus/PrometheusMetricsServlet.java
##########
@@ -70,18 +71,27 @@ protected void doGet(HttpServletRequest request, HttpServletResponse response)
         AsyncContext context = request.startAsync();
         context.setTimeout(metricsServletTimeoutMs);
         executor.execute(safeRun(() -> {
+            long start = System.currentTimeMillis();
             HttpServletResponse res = (HttpServletResponse) context.getResponse();
             try {
                 res.setStatus(HttpStatus.OK_200);
                 res.setContentType("text/plain");
                 PrometheusMetricsGenerator.generate(pulsar, shouldExportTopicMetrics, shouldExportConsumerMetrics,
                         shouldExportProducerMetrics, splitTopicAndPartitionLabel, res.getOutputStream(),
                         metricsProviders);
-                context.complete();
-
             } catch (Exception e) {
-                log.error("Failed to generate prometheus stats", e);
+                long end = System.currentTimeMillis();
+                long time = end - start;

Review comment:
       I don't like very much Guava Stopwatch (in the past the broke compatibility), and in general I believe that we should keep off Guava as much as possible (still for the fact that they break the APIs quite often)
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] dlg99 commented on a change in pull request #13380: Issue 13379: Metrics: noise stacktrace if Prometheus closes the connection due to a timeout

Posted by GitBox <gi...@apache.org>.
dlg99 commented on a change in pull request #13380:
URL: https://github.com/apache/pulsar/pull/13380#discussion_r771510978



##########
File path: pulsar-broker/src/main/java/org/apache/pulsar/broker/stats/prometheus/PrometheusMetricsServlet.java
##########
@@ -70,19 +71,38 @@ protected void doGet(HttpServletRequest request, HttpServletResponse response)
         AsyncContext context = request.startAsync();
         context.setTimeout(metricsServletTimeoutMs);
         executor.execute(safeRun(() -> {
+            long start = System.currentTimeMillis();
             HttpServletResponse res = (HttpServletResponse) context.getResponse();
             try {
                 res.setStatus(HttpStatus.OK_200);
                 res.setContentType("text/plain");
                 PrometheusMetricsGenerator.generate(pulsar, shouldExportTopicMetrics, shouldExportConsumerMetrics,
                         shouldExportProducerMetrics, splitTopicAndPartitionLabel, res.getOutputStream(),
                         metricsProviders);
-                context.complete();
-
             } catch (Exception e) {
-                log.error("Failed to generate prometheus stats", e);

Review comment:
       maybe leave it at debug level?
   In case someone has to investigate missing metrics/problems with prometheus.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari commented on a change in pull request #13380: Issue 13379: Metrics: noise stacktrace if Prometheus closes the connection due to a timeout

Posted by GitBox <gi...@apache.org>.
lhotari commented on a change in pull request #13380:
URL: https://github.com/apache/pulsar/pull/13380#discussion_r771274117



##########
File path: pulsar-broker/src/main/java/org/apache/pulsar/broker/stats/prometheus/PrometheusMetricsServlet.java
##########
@@ -70,18 +71,27 @@ protected void doGet(HttpServletRequest request, HttpServletResponse response)
         AsyncContext context = request.startAsync();
         context.setTimeout(metricsServletTimeoutMs);
         executor.execute(safeRun(() -> {
+            long start = System.currentTimeMillis();
             HttpServletResponse res = (HttpServletResponse) context.getResponse();
             try {
                 res.setStatus(HttpStatus.OK_200);
                 res.setContentType("text/plain");
                 PrometheusMetricsGenerator.generate(pulsar, shouldExportTopicMetrics, shouldExportConsumerMetrics,
                         shouldExportProducerMetrics, splitTopicAndPartitionLabel, res.getOutputStream(),
                         metricsProviders);
-                context.complete();
-
             } catch (Exception e) {
-                log.error("Failed to generate prometheus stats", e);
+                long end = System.currentTimeMillis();
+                long time = end - start;

Review comment:
       I wonder if we should use Guava's Stopwatch for such use cases? Javadoc: https://guava.dev/releases/snapshot-jre/api/docs/com/google/common/base/Stopwatch.html




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] eolivelli merged pull request #13380: Issue 13379: Metrics: noise stacktrace if Prometheus closes the connection due to a timeout

Posted by GitBox <gi...@apache.org>.
eolivelli merged pull request #13380:
URL: https://github.com/apache/pulsar/pull/13380


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] eolivelli commented on a change in pull request #13380: Issue 13379: Metrics: noise stacktrace if Prometheus closes the connection due to a timeout

Posted by GitBox <gi...@apache.org>.
eolivelli commented on a change in pull request #13380:
URL: https://github.com/apache/pulsar/pull/13380#discussion_r772127979



##########
File path: pulsar-broker/src/main/java/org/apache/pulsar/broker/stats/prometheus/PrometheusMetricsServlet.java
##########
@@ -70,19 +71,38 @@ protected void doGet(HttpServletRequest request, HttpServletResponse response)
         AsyncContext context = request.startAsync();
         context.setTimeout(metricsServletTimeoutMs);
         executor.execute(safeRun(() -> {
+            long start = System.currentTimeMillis();
             HttpServletResponse res = (HttpServletResponse) context.getResponse();
             try {
                 res.setStatus(HttpStatus.OK_200);
                 res.setContentType("text/plain");
                 PrometheusMetricsGenerator.generate(pulsar, shouldExportTopicMetrics, shouldExportConsumerMetrics,
                         shouldExportProducerMetrics, splitTopicAndPartitionLabel, res.getOutputStream(),
                         metricsProviders);
-                context.complete();
-
             } catch (Exception e) {
-                log.error("Failed to generate prometheus stats", e);

Review comment:
       is it a really bad error and we must see this in production (and we also need the full stacktrace)
   
   if we are not able to generate metrics (that should be a very light weight process) these is some action to take, if we leave DEBUG nobody will probably notice   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org