You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/02/24 11:58:49 UTC

[GitHub] [pulsar] tjiuming opened a new pull request #14453: Dev/prometheus performance

tjiuming opened a new pull request #14453:
URL: https://github.com/apache/pulsar/pull/14453


   ### Motivation
   1. Avoid more than once `PrometheusMetricsGenerator#generate` invoke in a period(1 minute), for the purpose of saving memory and reducing CPU usage.
   2. Persist metrics data into file and send to client by jetty's `send_file`. Avoid huge heap memory allocations, too much `mem_copy` operations, too much memory resizes, high GC pressure and high CPU usage.
   
   ### Modifications
   1. Using sliding window in `PrometheusMetricsGenerator#generate`, invoke `PrometheusMetricsGenerator#generate` once in a period, cache the result and return to client directly.
   2. Persist metrics data into temp file, send it to client by `send_file`.
   3. Delete temp files when JVM exits.
   
   ### Does this pull request potentially affect one of the following parts:
   
   *If `yes` was chosen, please highlight the changes*
   
     - Dependencies (does it add or upgrade a dependency): (no)
     - The public API: (no)
     - The schema: (no)
     - The default values of configurations: (no)
     - The wire protocol: (no)
     - The rest endpoints: (no)
     - The admin cli options: (no)
     - Anything that affects deployment: (no)
   
   ### Documentation
   
   Check the box below or label this PR directly (if you have committer privilege).
   
   Need to update docs? 
   
   - [ ] `doc-required` 
     
     (If you need help on updating docs, create a doc issue)
     
   - [x] `no-need-doc` 
     
     (Please explain why)
     
   - [ ] `doc` 
     
     (If this PR contains doc changes)
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] Jason918 commented on pull request #14453: Improve /metrics endpoint performance

Posted by GitBox <gi...@apache.org>.
Jason918 commented on pull request #14453:
URL: https://github.com/apache/pulsar/pull/14453#issuecomment-1053432187


   > When pulsar has to export more than 50MB metrics data, there will be cause high heap memory usage, high GC pressure and high CPU usage. The pr is for the purpose of fix it.
   
   Actually, I am just curious if there is some situation that it's better not use this feature, maybe like metric data is under 50MB? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] tjiuming edited a comment on pull request #14453: Improve /metrics endpoint performance

Posted by GitBox <gi...@apache.org>.
tjiuming edited a comment on pull request #14453:
URL: https://github.com/apache/pulsar/pull/14453#issuecomment-1052256712


   ![8172CEA8-6E11-4F1E-A32B-EC4D23687220](https://user-images.githubusercontent.com/95597048/155850331-6432231b-5dea-4721-9f96-0c3375a78519.png)
   ![DEB1D0CE-A277-4285-91BB-E69475FB3EEC](https://user-images.githubusercontent.com/95597048/155850338-bc9e3a71-5947-4c06-bc8f-f01e01e89000.png)
   
   After and before(write out 200MB metrics data).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] tjiuming edited a comment on pull request #14453: Improve /metrics endpoint performance

Posted by GitBox <gi...@apache.org>.
tjiuming edited a comment on pull request #14453:
URL: https://github.com/apache/pulsar/pull/14453#issuecomment-1072954361


   Make cache-metrics-data configurable
   @Jason918 @eolivelli   Could you please help review?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] tjiuming commented on pull request #14453: Improve /metrics endpoint performance

Posted by GitBox <gi...@apache.org>.
tjiuming commented on pull request #14453:
URL: https://github.com/apache/pulsar/pull/14453#issuecomment-1072954361


   Make cache-metrics-data configurable
   @Jason918 @Jason918  Could you please help review?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] tjiuming commented on pull request #14453: Improve /metrics endpoint performance

Posted by GitBox <gi...@apache.org>.
tjiuming commented on pull request #14453:
URL: https://github.com/apache/pulsar/pull/14453#issuecomment-1049787632


   Tests to be completed


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] tjiuming commented on a change in pull request #14453: Improve /metrics endpoint performance

Posted by GitBox <gi...@apache.org>.
tjiuming commented on a change in pull request #14453:
URL: https://github.com/apache/pulsar/pull/14453#discussion_r832331106



##########
File path: pulsar-broker-common/src/main/java/org/apache/pulsar/broker/ServiceConfiguration.java
##########
@@ -2276,6 +2276,11 @@
         doc = "If true, export topic level metrics otherwise namespace level"
     )
     private boolean exposeTopicLevelMetricsInPrometheus = true;
+    @FieldContext(
+            category = CATEGORY_METRICS,
+            doc = "If true, export buffered broker metrics"
+    )
+    private boolean exposeBufferedBrokerMetrics = false;

Review comment:
       Thanks for your review. Resolved




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] tjiuming edited a comment on pull request #14453: Improve /metrics endpoint performance

Posted by GitBox <gi...@apache.org>.
tjiuming edited a comment on pull request #14453:
URL: https://github.com/apache/pulsar/pull/14453#issuecomment-1053079112


   > Do you actual use file IOs? If so, please help point it out.
   
   Canceled persist data to temp files, maybe broker has a poor performance disk, so use `CompositeDirectByteBuf` only.
   
   > Generally, it seems this is a nice optimization, but do we have any downside of this feature?
   
   When pulsar has to export more than 50MB metrics data, there will be cause high heap memory usage, high GC pressure and high CPU usage. The pr is for the purpose of fix it.
   
   
   @Jason918 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] tjiuming commented on pull request #14453: Improve /metrics endpoint performance

Posted by GitBox <gi...@apache.org>.
tjiuming commented on pull request #14453:
URL: https://github.com/apache/pulsar/pull/14453#issuecomment-1053459602


   > Nice work. We should make this configurable. In case we have a bug and we want to disable this feature
   
   Thanks. 
   
   `Avoid more than once PrometheusMetricsGenerator#generate invoke in a period(1 minute), for the purpose of saving memory and reducing CPU usage.` 
   
   This feature is a precursor. What I want is in the future we can clean all the metrics objects after metrics data generated, it can be release a lot of objects and reduce heap memory usage if there are over 1 million topics in a broker.  
   Many metrics exported as `Gauges`, so we don't have to depend on `rate/irate` functions to aggregate these metrics. It can be reduce prometheus's pressure.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] tjiuming commented on pull request #14453: Improve /metrics endpoint performance

Posted by GitBox <gi...@apache.org>.
tjiuming commented on pull request #14453:
URL: https://github.com/apache/pulsar/pull/14453#issuecomment-1053446892


   > Actually, I am just curious if there is some situation that it's better not use this feature, maybe like metric data is under 50MB?
   
   `Avoid more than once PrometheusMetricsGenerator#generate invoke in a period(1 minute), for the purpose of saving memory and reducing CPU usage.`  This can be configurable, I'll add a configuration if needed. 
   
   `Canceled persist data to temp files, maybe broker has a poor performance disk, so use CompositeDirectByteBuf only.` But this, we cannot know how much metrics data will be wrote to clients before it generated. So, it cannot be disabled.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] eolivelli commented on a change in pull request #14453: Improve /metrics endpoint performance

Posted by GitBox <gi...@apache.org>.
eolivelli commented on a change in pull request #14453:
URL: https://github.com/apache/pulsar/pull/14453#discussion_r832308809



##########
File path: pulsar-broker-common/src/main/java/org/apache/pulsar/broker/ServiceConfiguration.java
##########
@@ -2276,6 +2276,11 @@
         doc = "If true, export topic level metrics otherwise namespace level"
     )
     private boolean exposeTopicLevelMetricsInPrometheus = true;
+    @FieldContext(
+            category = CATEGORY_METRICS,
+            doc = "If true, export buffered broker metrics"
+    )
+    private boolean exposeBufferedBrokerMetrics = false;

Review comment:
       what about "bufferMetrics" ?
   
   this change also will apply to Proxy and Function Worker, so I would drop the word "broker" 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] tjiuming commented on pull request #14453: Improve /metrics endpoint performance

Posted by GitBox <gi...@apache.org>.
tjiuming commented on pull request #14453:
URL: https://github.com/apache/pulsar/pull/14453#issuecomment-1052256712


   ![8172CEA8-6E11-4F1E-A32B-EC4D23687220](https://user-images.githubusercontent.com/95597048/155850331-6432231b-5dea-4721-9f96-0c3375a78519.png)
   ![DEB1D0CE-A277-4285-91BB-E69475FB3EEC](https://user-images.githubusercontent.com/95597048/155850338-bc9e3a71-5947-4c06-bc8f-f01e01e89000.png)
   
   After and before.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] tjiuming edited a comment on pull request #14453: Improve /metrics endpoint performance

Posted by GitBox <gi...@apache.org>.
tjiuming edited a comment on pull request #14453:
URL: https://github.com/apache/pulsar/pull/14453#issuecomment-1053446892


   > Actually, I am just curious if there is some situation that it's better not use this feature, maybe like metric data is under 50MB?
   
   `Avoid more than once PrometheusMetricsGenerator#generate invoke in a period(1 minute), for the purpose of saving memory and reducing CPU usage.`  This can be configurable, I'll add a configuration if needed. 
   
   `Canceled persist data to temp files, maybe broker has a poor performance disk, so use CompositeDirectByteBuf only.` But this, we cannot know how much metrics data will be wrote to clients before it generated. So, it cannot be disabled.
   
   @Jason918 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] tjiuming edited a comment on pull request #14453: Improve /metrics endpoint performance

Posted by GitBox <gi...@apache.org>.
tjiuming edited a comment on pull request #14453:
URL: https://github.com/apache/pulsar/pull/14453#issuecomment-1053459602


   > Nice work. We should make this configurable. In case we have a bug and we want to disable this feature
   
   Thanks. 
   
   `Avoid more than once PrometheusMetricsGenerator#generate invoke in a period(1 minute), for the purpose of saving memory and reducing CPU usage.` 
   
   This feature is a precursor. What I want is in the future we can clean all the metrics objects after metrics data generated, it could release a lot of objects and reduce heap memory usage if there are over 1 million topics in a broker.  
   Many metrics exported as `Gauges`, so we don't have to depend on `rate/irate` functions to aggregate these metrics. It can be reduce prometheus's pressure.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] tjiuming commented on pull request #14453: Improve /metrics endpoint performance

Posted by GitBox <gi...@apache.org>.
tjiuming commented on pull request #14453:
URL: https://github.com/apache/pulsar/pull/14453#issuecomment-1053079112


   > Do you actual use file IOs? If so, please help point it out.
   Canceled persist data to temp files, maybe broker has a poor performance disk, so use `CompositeDirectByteBuf` only.
   
   > Generally, it seems this is a nice optimization, but do we have any downside of this feature?
   When pulsar has to export more than 50MB metrics data, there will be cause high heap memory usage, high GC pressure and high CPU usage. The pr is for the purpose of fix it.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org