You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2021/05/20 09:58:51 UTC

[GitHub] [pulsar] lhotari opened a new issue #10654: [Performance] Prometheus client causes allocation overhead by creating exceptions

lhotari opened a new issue #10654:
URL: https://github.com/apache/pulsar/issues/10654


   **Is your enhancement request related to a problem? Please describe.**
   
   The Prometheus https://github.com/prometheus/client_java/blob/master/simpleclient_hotspot/src/main/java/io/prometheus/client/hotspot/StandardExports.java class is creating exceptions as part of ordinary execution path. This causes an additional performance overhead.
   
   example in exceptions flamegraph:
   ![image](https://user-images.githubusercontent.com/66864/118958813-6d847000-b96a-11eb-881f-c8e7e71df0f6.png)
   
   it also looks like the same metrics are gathered multiple times:
   
   ![image](https://user-images.githubusercontent.com/66864/118959504-0fa45800-b96b-11eb-9e7c-b1b1e9d0883e.png)
   
   **Describe the solution you'd like**
   
   Replace StandardExports with a solution that doesn't throw exceptions. Possible contribute the solution to Prometheus client_java project.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari commented on issue #10654: [Performance] DimensionStats implementation uses inefficient Prometheus client method CollectorRegistry.getSampleValue which is only intended for use in unit tests

Posted by GitBox <gi...@apache.org>.
lhotari commented on issue #10654:
URL: https://github.com/apache/pulsar/issues/10654#issuecomment-845737326


   @rdhabalia Do you have suggestions for fixing this performance issue in broker's DimensionStats implementation? The javadoc of CollectorRegistry.getSampleValue method states: ["This is inefficient, and intended only for use in unittests.".](https://github.com/prometheus/client_java/blob/8fcc7bc03dfbabd0c15776640d5fe52942aa9309/simpleclient/src/main/java/io/prometheus/client/CollectorRegistry.java#L261)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari edited a comment on issue #10654: [Performance] DimensionStats implementation uses inefficient Prometheus client method CollectorRegistry.getSampleValue which is only intended for use in unit tests

Posted by GitBox <gi...@apache.org>.
lhotari edited a comment on issue #10654:
URL: https://github.com/apache/pulsar/issues/10654#issuecomment-845737326


   @rdhabalia Do you have suggestions for fixing this performance issue in broker's DimensionStats implementation? The javadoc of CollectorRegistry.getSampleValue method states: ["This is inefficient, and intended only for use in unittests.".](https://github.com/prometheus/client_java/blob/8fcc7bc03dfbabd0c15776640d5fe52942aa9309/simpleclient/src/main/java/io/prometheus/client/CollectorRegistry.java#L261) . The [usage of the CollectorRegistry.getSampleMethod method in DimensionStats implementation](https://github.com/apache/pulsar/blob/102fa9de03509b86e47f58ab8e1c0dde2095da3b/pulsar-broker/src/main/java/org/apache/pulsar/broker/stats/DimensionStats.java#L107-L108) should be replaced somehow. Perhaps this causes a larger refactoring?
    
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] rdhabalia commented on issue #10654: [Performance] DimensionStats implementation uses inefficient Prometheus client method CollectorRegistry.getSampleValue which is only intended for use in unit tests

Posted by GitBox <gi...@apache.org>.
rdhabalia commented on issue #10654:
URL: https://github.com/apache/pulsar/issues/10654#issuecomment-846298941


   Yes, `CollectorRegistry::getSampleValue()` can be inefficient and we can improve it but just wanted to understand performance impact. It is used to generate Prometheus stats and that is called at every 60 secs by dedicated [Pulsar-stats thread](https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/PulsarStats.java#L183) so, it should not impact client flow. so, can you please confirm if you are seeing any performance impact in broker due to this usage?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari edited a comment on issue #10654: [Performance] DimensionStats implementation uses inefficient Prometheus client method CollectorRegistry.getSampleValue which is only intended for use in unit tests

Posted by GitBox <gi...@apache.org>.
lhotari edited a comment on issue #10654:
URL: https://github.com/apache/pulsar/issues/10654#issuecomment-847566563


   > Yes, `CollectorRegistry::getSampleValue()` can be inefficient and we can improve it but just wanted to understand performance impact. It is used to generate Prometheus stats and that is called at every 60 secs by dedicated [Pulsar-stats thread](https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/PulsarStats.java#L183) so, it should not impact client flow. so, can you please confirm if you are seeing any performance impact in broker due to this usage?
   
   @rdhabalia I haven't measured the performance impact for an end-to-end scenario. This problem shows up when profling exceptions. I don't think this is a major performance issue for Pulsar users.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] eolivelli closed issue #10654: [Performance] DimensionStats implementation uses inefficient Prometheus client method CollectorRegistry.getSampleValue which is only intended for use in unit tests

Posted by GitBox <gi...@apache.org>.
eolivelli closed issue #10654:
URL: https://github.com/apache/pulsar/issues/10654


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari commented on issue #10654: [Performance] Prometheus client causes allocation overhead by creating exceptions

Posted by GitBox <gi...@apache.org>.
lhotari commented on issue #10654:
URL: https://github.com/apache/pulsar/issues/10654#issuecomment-845732499


   There seems to be a design issue in how org.apache.pulsar.broker.stats.DimensionStats is implemented. What happens it that for each call to the getDimensionX methods, all samples are collected for all Prometheus client collectors and filtered. 
   
   https://github.com/prometheus/client_java/blob/8fcc7bc03dfbabd0c15776640d5fe52942aa9309/simpleclient/src/main/java/io/prometheus/client/CollectorRegistry.java#L258-L274
   
   The javadoc of getSampleValue method states: "This is inefficient, and intended only for use in unittests.".
   
   This is amplifying the performance issue in the Prometheus client's hotspot.StandardCollector implementation.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari edited a comment on issue #10654: [Performance] Prometheus client causes allocation overhead by creating exceptions

Posted by GitBox <gi...@apache.org>.
lhotari edited a comment on issue #10654:
URL: https://github.com/apache/pulsar/issues/10654#issuecomment-845732499


   There seems to be a design issue in how org.apache.pulsar.broker.stats.DimensionStats is implemented. What happens it that for each call to the getDimensionX methods, all samples are collected for all Prometheus client collectors and filtered. 
   
   https://github.com/prometheus/client_java/blob/8fcc7bc03dfbabd0c15776640d5fe52942aa9309/simpleclient/src/main/java/io/prometheus/client/CollectorRegistry.java#L258-L274
   
   The javadoc of CollectorRegistry.getSampleValue method states: ["This is inefficient, and intended only for use in unittests.".](https://github.com/prometheus/client_java/blob/8fcc7bc03dfbabd0c15776640d5fe52942aa9309/simpleclient/src/main/java/io/prometheus/client/CollectorRegistry.java#L261)
   
   This is amplifying the performance issue in the Prometheus client's hotspot.StandardCollector implementation.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari commented on issue #10654: [Performance] DimensionStats implementation uses inefficient Prometheus client method CollectorRegistry.getSampleValue which is only intended for unit tests

Posted by GitBox <gi...@apache.org>.
lhotari commented on issue #10654:
URL: https://github.com/apache/pulsar/issues/10654#issuecomment-845733449


   @addisonj Would you also like to take this one?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari commented on issue #10654: [Performance] DimensionStats implementation uses inefficient Prometheus client method CollectorRegistry.getSampleValue which is only intended for use in unit tests

Posted by GitBox <gi...@apache.org>.
lhotari commented on issue #10654:
URL: https://github.com/apache/pulsar/issues/10654#issuecomment-847566563


   > Yes, `CollectorRegistry::getSampleValue()` can be inefficient and we can improve it but just wanted to understand performance impact. It is used to generate Prometheus stats and that is called at every 60 secs by dedicated [Pulsar-stats thread](https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/PulsarStats.java#L183) so, it should not impact client flow. so, can you please confirm if you are seeing any performance impact in broker due to this usage?
   
   I haven't measured the performance impact for an end-to-end scenario. This problem shows up when profling exceptions. I don't think this is a major performance issue for Pulsar users.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org