You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/09/27 11:15:48 UTC

[GitHub] [pulsar] tjiuming opened a new pull request, #17852: [branch-2.9] Group prometheus metrics.

tjiuming opened a new pull request, #17852:
URL: https://github.com/apache/pulsar/pull/17852

   ### Motivation
   
   This PR is fully based on https://github.com/apache/pulsar/pull/17618, but the origin PR has a memory leak issue, so it was reverted.
   This PR is supposed to fix the memory leak issue.
   
   Why the origin PR has a memory leak issue:
   In https://github.com/apache/pulsar/pull/17618, PrometheusMetricsGenerator.java:
   ```
       public static void generate(PulsarService pulsar, boolean includeTopicMetrics, boolean includeConsumerMetrics,
                                   boolean includeProducerMetrics, boolean splitTopicAndPartitionIndexLabel,
                                   OutputStream out,
                                   List<PrometheusRawMetricsProvider> metricsProviders)
               throws IOException {
           ByteBuf buf = ByteBufAllocator.DEFAULT.heapBuffer();
           boolean exceptionHappens = false;
           //Used in namespace/topic and transaction aggregators as share metric names
           PrometheusMetricStreams metricStreams = new PrometheusMetricStreams();
           try {
               SimpleTextOutputStream stream = new SimpleTextOutputStream(buf);
   
               generateSystemMetrics(stream, pulsar.getConfiguration().getClusterName());
   
               NamespaceStatsAggregator.generate(pulsar, includeTopicMetrics, includeConsumerMetrics,
                       includeProducerMetrics, splitTopicAndPartitionIndexLabel, metricStreams);
   
               if (pulsar.getWorkerServiceOpt().isPresent()) {
                   pulsar.getWorkerService().generateFunctionsStats(stream);
               }
   
               if (pulsar.getConfiguration().isTransactionCoordinatorEnabled()) {
                   TransactionAggregator.generate(pulsar, metricStreams, includeTopicMetrics);
               }
   
               metricStreams.flushAllToStream(stream);
   
               generateBrokerBasicMetrics(pulsar, stream);
   
               generateManagedLedgerBookieClientMetrics(pulsar, stream);
   
               if (metricsProviders != null) {
                   for (PrometheusRawMetricsProvider metricsProvider : metricsProviders) {
                       metricsProvider.generate(stream);
                   }
               }
               out.write(buf.array(), buf.arrayOffset(), buf.readableBytes());
           } finally {
               //release all the metrics buffers
               metricStreams.releaseAll();
               //if exception happens, release buffer
               if (exceptionHappens) {
                   buf.release();
               }
           }
       }
   ```
   in the finally scope, call `buf.release()` when `exceptionHappens == true`. But the initialize value of `exceptionHappens` is false, and it never updated to true anywhere. So the memory leak issue happens.
   It should be a small mistake in the check pick process.
   
   ### Documentation
   
   <!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->
   
   - [ ] `doc-required` 
   (Your PR needs to update docs and you will update later)
   
   - [x] `doc-not-needed` 
   (Please explain why)
   
   - [ ] `doc` 
   (Your PR contains doc changes)
   
   - [ ] `doc-complete`
   (Docs have been already added)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] codelipenghui commented on a diff in pull request #17852: [branch-2.9] Group prometheus metrics.

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on code in PR #17852:
URL: https://github.com/apache/pulsar/pull/17852#discussion_r982964820


##########
pulsar-broker/src/main/java/org/apache/pulsar/broker/stats/prometheus/PrometheusMetricsGenerator.java:
##########
@@ -129,6 +135,8 @@ public static void generate(PulsarService pulsar, boolean includeTopicMetrics, b
             }
             out.write(buf.array(), buf.arrayOffset(), buf.readableBytes());
         } finally {
+            //release all the metrics buffers
+            metricStreams.releaseAll();

Review Comment:
   @tjiuming Oh, I see. They are different buffers.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] asafm commented on a diff in pull request #17852: [branch-2.9] Group prometheus metrics.

Posted by GitBox <gi...@apache.org>.
asafm commented on code in PR #17852:
URL: https://github.com/apache/pulsar/pull/17852#discussion_r982324246


##########
pulsar-broker/src/main/java/org/apache/pulsar/broker/stats/prometheus/PrometheusMetricsGenerator.java:
##########
@@ -129,6 +135,8 @@ public static void generate(PulsarService pulsar, boolean includeTopicMetrics, b
             }
             out.write(buf.array(), buf.arrayOffset(), buf.readableBytes());
         } finally {
+            //release all the metrics buffers
+            metricStreams.releaseAll();

Review Comment:
   I'm not entirely sure if this is the origin of the bug:
   
   We allocate the byte buffer, use it for the `SimpleTextOutputStream`, and then once we finish writing to it, we return the byte buffer. We can't use `finally` as is to release the byte buffer. Only when there is an exception, need to revert the allocation of this byte buffer.
   The original code is correct.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] marksilcox commented on pull request #17852: [branch-2.9] Group prometheus metrics.

Posted by GitBox <gi...@apache.org>.
marksilcox commented on PR #17852:
URL: https://github.com/apache/pulsar/pull/17852#issuecomment-1259571060

   @tjiuming thanks for fixing this, struggling to find time at the moment


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] codelipenghui merged pull request #17852: [branch-2.9] Group prometheus metrics.

Posted by GitBox <gi...@apache.org>.
codelipenghui merged PR #17852:
URL: https://github.com/apache/pulsar/pull/17852


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] codelipenghui commented on a diff in pull request #17852: [branch-2.9] Group prometheus metrics.

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on code in PR #17852:
URL: https://github.com/apache/pulsar/pull/17852#discussion_r981921351


##########
pulsar-broker/src/main/java/org/apache/pulsar/broker/stats/prometheus/PrometheusMetricsGenerator.java:
##########
@@ -129,6 +135,8 @@ public static void generate(PulsarService pulsar, boolean includeTopicMetrics, b
             }
             out.write(buf.array(), buf.arrayOffset(), buf.readableBytes());
         } finally {
+            //release all the metrics buffers
+            metricStreams.releaseAll();

Review Comment:
   @tjiuming will the `buf` can get a chance to be released multiple times? Since the `metricStreams.releaseAll()` will also get a chance to release the `buf`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] tjiuming commented on a diff in pull request #17852: [branch-2.9] Group prometheus metrics.

Posted by GitBox <gi...@apache.org>.
tjiuming commented on code in PR #17852:
URL: https://github.com/apache/pulsar/pull/17852#discussion_r981937201


##########
pulsar-broker/src/main/java/org/apache/pulsar/broker/stats/prometheus/PrometheusMetricsGenerator.java:
##########
@@ -129,6 +135,8 @@ public static void generate(PulsarService pulsar, boolean includeTopicMetrics, b
             }
             out.write(buf.array(), buf.arrayOffset(), buf.readableBytes());
         } finally {
+            //release all the metrics buffers
+            metricStreams.releaseAll();

Review Comment:
   @codelipenghui `buf` and `metricStreams` only released in the `finally` scope, and `metricStreams.releaseAll()` will not release `buf`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] tjiuming commented on a diff in pull request #17852: [branch-2.9] Group prometheus metrics.

Posted by GitBox <gi...@apache.org>.
tjiuming commented on code in PR #17852:
URL: https://github.com/apache/pulsar/pull/17852#discussion_r981874488


##########
pulsar-broker/src/main/java/org/apache/pulsar/broker/stats/prometheus/PrometheusMetricsGenerator.java:
##########
@@ -129,6 +135,8 @@ public static void generate(PulsarService pulsar, boolean includeTopicMetrics, b
             }
             out.write(buf.array(), buf.arrayOffset(), buf.readableBytes());
         } finally {
+            //release all the metrics buffers
+            metricStreams.releaseAll();

Review Comment:
   @codelipenghui No, this check is for https://github.com/apache/pulsar/pull/14453, the feature didn't check-pick to 2.9



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] codelipenghui commented on a diff in pull request #17852: [branch-2.9] Group prometheus metrics.

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on code in PR #17852:
URL: https://github.com/apache/pulsar/pull/17852#discussion_r981854169


##########
pulsar-broker/src/main/java/org/apache/pulsar/broker/stats/prometheus/PrometheusMetricsGenerator.java:
##########
@@ -129,6 +135,8 @@ public static void generate(PulsarService pulsar, boolean includeTopicMetrics, b
             }
             out.write(buf.array(), buf.arrayOffset(), buf.readableBytes());
         } finally {
+            //release all the metrics buffers
+            metricStreams.releaseAll();

Review Comment:
   @tjiuming I noticed you had removed the `exceptionHappens`, if we don't have the check, will we release the `buf` multiple times?
   
   From the master branch:
   
   ```
   } catch (Throwable t) {
               exceptionHappens = true;
               throw t;
           } finally {
               //release all the metrics buffers
               metricStreams.releaseAll();
               //if exception happens, release buffer
               if (exceptionHappens) {
                   buf.release();
               }
           }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] codelipenghui commented on a diff in pull request #17852: [branch-2.9] Group prometheus metrics.

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on code in PR #17852:
URL: https://github.com/apache/pulsar/pull/17852#discussion_r981854169


##########
pulsar-broker/src/main/java/org/apache/pulsar/broker/stats/prometheus/PrometheusMetricsGenerator.java:
##########
@@ -129,6 +135,8 @@ public static void generate(PulsarService pulsar, boolean includeTopicMetrics, b
             }
             out.write(buf.array(), buf.arrayOffset(), buf.readableBytes());
         } finally {
+            //release all the metrics buffers
+            metricStreams.releaseAll();

Review Comment:
   @tjiuming I noticed you had removed the `exceptionHappens`, if we don't have the check, will we release the `buf` multiple times?
   
   From the master branch:
   
   ```
   } catch (Throwable t) {
               exceptionHappens = true;
               throw t;
           } finally {
               //release all the metrics buffers
               metricStreams.releaseAll();
               //if exception happens, release buffer
               if (exceptionHappens) {
                   buf.release();
               }
           }
   ```
   
   And it looks like we will also cherry-pick https://github.com/apache/pulsar/pull/14453 to branch-2.9?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] asafm commented on a diff in pull request #17852: [branch-2.9] Group prometheus metrics.

Posted by GitBox <gi...@apache.org>.
asafm commented on code in PR #17852:
URL: https://github.com/apache/pulsar/pull/17852#discussion_r982324246


##########
pulsar-broker/src/main/java/org/apache/pulsar/broker/stats/prometheus/PrometheusMetricsGenerator.java:
##########
@@ -129,6 +135,8 @@ public static void generate(PulsarService pulsar, boolean includeTopicMetrics, b
             }
             out.write(buf.array(), buf.arrayOffset(), buf.readableBytes());
         } finally {
+            //release all the metrics buffers
+            metricStreams.releaseAll();

Review Comment:
   Sorry for missing out on this bug. I wasn't aware that `PrometheusMetricsGenerator` doesn't look the same in master and in apache-2.9, specifically the generate0, and the performance improvement was backported, so it confused me as well.
   
   The fix looks solid, as this buffer is not returned as it did in `generate0` so it should always be released.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org