Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/09/13 13:27:27 UTC

[GitHub] [pulsar] marksilcox opened a new pull request, #17618: [fix][broker][functions-worker] Ensure prometheus metrics are grouped by type (#8407, #13865)

marksilcox opened a new pull request, #17618:
URL: https://github.com/apache/pulsar/pull/17618

   Fixes #8407
   Fixes #13865
   
   ### Motivation
   
   The broker's Prometheus metrics are currently not grouped by metric type, which causes issues in systems that read these metrics (e.g. DataDog).
   
   The Prometheus docs state "All lines for a given metric must be provided as one single group" - https://github.com/prometheus/docs/blob/master/content/docs/instrumenting/exposition_formats.md#grouping-and-sorting
   
   ### Modifications
   
   Updated the namespace and topic Prometheus metric generators to group the metrics under the appropriate type header.
   Updated function worker stats to include TYPE headers.
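
The grouping described above can be illustrated with a minimal sketch. This is not the code from the pull request (the broker is written in Java); the function name `render_grouped`, its arguments, and the sample values are hypothetical and exist only to show the idea: collect all samples per metric name first, then emit each group once under a single `# TYPE` line.

```python
from collections import OrderedDict


def render_grouped(samples, types):
    """Render samples in Prometheus text format, one contiguous group per metric.

    samples: iterable of (metric_name, labels_dict, value) in arbitrary order.
    types:   mapping of metric_name -> type string (e.g. "gauge", "counter").
    Both arguments are illustrative assumptions, not a Pulsar API.
    """
    # Bucket every sample under its metric name, keeping first-seen order.
    groups = OrderedDict()
    for name, labels, value in samples:
        groups.setdefault(name, []).append((labels, value))

    lines = []
    for name, entries in groups.items():
        # Exactly one TYPE header per metric, followed by all of its samples.
        lines.append(f"# TYPE {name} {types.get(name, 'untyped')}")
        for labels, value in entries:
            if labels:
                label_str = ",".join(
                    f'{key}="{val}"' for key, val in sorted(labels.items())
                )
                lines.append(f"{name}{{{label_str}}} {value}")
            else:
                lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Samples arrive interleaved per topic; the output is grouped per metric.
    demo = [
        ("pulsar_rate_in", {"topic": "a"}, 1),
        ("pulsar_msg_backlog", {"topic": "a"}, 5),
        ("pulsar_rate_in", {"topic": "b"}, 2),
    ]
    print(render_grouped(demo, {"pulsar_rate_in": "gauge",
                                "pulsar_msg_backlog": "gauge"}), end="")
```

Buffering per metric name is the simplest way to guarantee contiguous groups, at the cost of holding the rendered samples in memory until the whole response is assembled.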
   
   ### Verifying this change
   
   - [x] Make sure that the change passes the CI checks.
   
   This change added tests and can be verified as follows:
   
   Added a unit test to verify that all metrics are grouped under the correct type header.
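
As a rough illustration of what such a check can look like (not the unit test added in this pull request), the sketch below scans Prometheus text output and reports metric names whose lines are not contiguous. The function name `find_ungrouped` is hypothetical, and the check is simplified: it compares literal names, so it does not fold histogram or summary suffixes such as `_bucket`, `_sum`, and `_count` into their base metric.

```python
import re

# A sample line starts with the metric name; labels and value follow.
_METRIC_NAME = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)")


def find_ungrouped(text):
    """Return metric names that reappear after another metric's lines began."""
    closed = set()   # metrics whose group has already ended
    current = None   # metric whose group is currently open
    ungrouped = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            parts = line.split()
            # Only "# TYPE <name> ..." and "# HELP <name> ..." belong to a metric.
            if len(parts) >= 3 and parts[1] in ("TYPE", "HELP"):
                name = parts[2]
            else:
                continue
        else:
            match = _METRIC_NAME.match(line)
            if match is None:
                continue
            name = match.group(1)
        if name != current:
            # Switching metrics: a name seen in an earlier group is a violation.
            if name in closed and name not in ungrouped:
                ungrouped.append(name)
            if current is not None:
                closed.add(current)
            current = name
    return ungrouped
```

A test could fetch the metrics output once and assert that `find_ungrouped(output)` returns an empty list.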
   
   ### Does this pull request potentially affect one of the following parts:
   
   - Dependencies (does it add or upgrade a dependency): (no)
   - The public API: (no)
   - The schema: (no)
   - The default values of configurations: (no)
   - The wire protocol: (no)
   - The rest endpoints: (no)
   - The admin cli options: (no)
   - Anything that affects deployment: (no)
   
   ### Documentation
   
   Need to update docs? 
   
   - [X] `doc-not-needed` 
   Changes to match prometheus spec
    


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] marksilcox commented on pull request #17618: [fix][broker][functions-worker] Ensure prometheus metrics are grouped by type (#8407, #13865)

Posted by GitBox <gi...@apache.org>.
marksilcox commented on PR #17618:
URL: https://github.com/apache/pulsar/pull/17618#issuecomment-1246340653

   /pulsarbot run-failure-checks




[GitHub] [pulsar] marksilcox commented on pull request #17618: [fix][broker][functions-worker] Ensure prometheus metrics are grouped by type (#8407, #13865)

Posted by GitBox <gi...@apache.org>.
marksilcox commented on PR #17618:
URL: https://github.com/apache/pulsar/pull/17618#issuecomment-1249471841

   /pulsarbot run-failure-checks




[GitHub] [pulsar] asafm commented on pull request #17618: [fix][broker][functions-worker] Ensure prometheus metrics are grouped by type (#8407, #13865)

Posted by GitBox <gi...@apache.org>.
asafm commented on PR #17618:
URL: https://github.com/apache/pulsar/pull/17618#issuecomment-1262624649

   @codelipenghui The implementations in 2.9 and master are indeed different, which is why the memory leak was introduced only in 2.9. I checked it.




[GitHub] [pulsar] codelipenghui merged pull request #17618: [fix][broker][functions-worker] Ensure prometheus metrics are grouped by type (#8407, #13865)

Posted by GitBox <gi...@apache.org>.
codelipenghui merged PR #17618:
URL: https://github.com/apache/pulsar/pull/17618




[GitHub] [pulsar] codelipenghui commented on pull request #17618: [fix][broker][functions-worker] Ensure prometheus metrics are grouped by type (#8407, #13865)

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on PR #17618:
URL: https://github.com/apache/pulsar/pull/17618#issuecomment-1257333750

   I can't reproduce the issue on the master branch with PR (https://github.com/apache/pulsar/pull/15558); it looks like the memory leak only happens on branch-2.9.




[GitHub] [pulsar] codelipenghui commented on pull request #17618: [fix][broker][functions-worker] Ensure prometheus metrics are grouped by type (#8407, #13865)

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on PR #17618:
URL: https://github.com/apache/pulsar/pull/17618#issuecomment-1256848175

   @marksilcox Sorry, I have to revert this PR since it introduced a memory leak that was detected by a long-running continuous verification test. 
   
   The reproduction steps:
   
   1. Start a standalone
   2. Create a partitioned topic (100 partitions)
   3. Start producer `bin/pulsar-perf produce test -r 100 -bm 1 -mk random`
   4. Start consumer `bin/pulsar-perf consume -st Key_Shared -r 100 -n 50 test`
   5. Access the metrics endpoint
   
   ```shell
   for i in {1..1000}
   do
   curl -L localhost:8080/metrics/
   done
   ```
   
   <img width="748" alt="image" src="https://user-images.githubusercontent.com/12592133/192078282-bf3e1b50-9f54-485b-a2dc-283f9d6966ff.png">
   
   After reverting this PR, the memory leak no longer occurs.
   
   <img width="822" alt="image" src="https://user-images.githubusercontent.com/12592133/192078735-9d023a85-b822-4a80-aa76-7f94360e926f.png">
   
   I'm not sure what the root cause is yet. Since many people build on the Pulsar release branches, we'd better revert first, then fix the memory issue and open a new PR. @marksilcox If you need help with the memory leak issue, I believe @tjiuming or @asafm can provide some insight here.

