Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/08/14 14:27:36 UTC

[GitHub] [pulsar] 3286360470 opened a new issue, #17090: PIP-XYZ: Support latency quantile metric for pulsar read and write

3286360470 opened a new issue, #17090:
URL: https://github.com/apache/pulsar/issues/17090

   ### Motivation
   
   1. Currently, Pulsar has no quantile metrics for read and write latency, which are essential for troubleshooting and for performance comparison tests.
   2. We also need network IO metrics to gauge the current network processing pressure, so that too many requests do not hang the broker.
   
   
   
   
   ### Goal
   
   1. Our goal is to measure the latency from the point where the readHandler is initialized to the point where the messages are read from the cache or the underlying storage (e.g. BookKeeper, HDFS, ...).
   We measure only the latency between readHandler initialization and the messages being found, for two reasons:
   1.1. When a fetch request arrives at the broker, there may be no new messages for the consumer. The broker then waits and periodically checks for new messages, so a read served from the cache can appear to take longer than a remote read.
   1.2. Measuring the read latency against the cache, BookKeeper, HDFS, and so on is enough for troubleshooting and performance comparison tests.
   
   2. We only measure the number of idle network IO threads and their percentage of the total number of IO threads.
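   
   As a rough illustration of the measurement window in goal 1, the sketch below records one sample per read (from handler initialization to entries found) and derives nearest-rank quantiles from the samples. `ReadLatencySketch`, `record`, and `quantile` are hypothetical names, not Pulsar APIs, and a real implementation would presumably rely on `OpStatsLogger` rather than sorting raw samples in memory.
   
   ```java
   import java.util.ArrayList;
   import java.util.Collections;
   import java.util.List;
   
   // Hypothetical helper for illustration only; not part of Pulsar.
   final class ReadLatencySketch {
       // Raw latency samples in nanoseconds (unbounded; acceptable for a sketch only).
       private final List<Long> samplesNanos = new ArrayList<>();
   
       // Record one read: startNanos is taken at read handler init,
       // endNanos when the entries are found in the cache or storage.
       synchronized void record(long startNanos, long endNanos) {
           samplesNanos.add(endNanos - startNanos);
       }
   
       // Nearest-rank quantile, with q in (0, 1], e.g. 0.99 for p99.
       synchronized long quantile(double q) {
           if (q <= 0.0 || q > 1.0) {
               throw new IllegalArgumentException("q must be in (0, 1]");
           }
           if (samplesNanos.isEmpty()) {
               throw new IllegalStateException("no samples recorded");
           }
           List<Long> sorted = new ArrayList<>(samplesNanos);
           Collections.sort(sorted);
           int rank = (int) Math.ceil(q * sorted.size()); // 1-based rank
           return sorted.get(rank - 1);
       }
   }
   ```
   
   A caller would take `System.nanoTime()` when the handler is created, take it again once entries are found, pass both values to `record`, and expose `quantile(0.5)`, `quantile(0.99)`, and so on as metrics. Time spent waiting for new messages falls outside this window only if the start timestamp is taken after that wait, which is an assumption about where the handler is created.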
   
   
   ### API Changes
   
   1. Add an OpStats getter used to record the write latency:
   ```
   public OpStatsLogger getOpStat();
   ```
   2. Add a method that generates metrics for Netty thread pool usage:
   ```
   private static void generateNetworkIdleMetrics(PulsarService pulsar, SimpleTextOutputStream stream);
   ```
   
   ### Implementation
   
   1. Add an OpStats getter used to record the write latency:
   ```
   public OpStatsLogger getOpStat() {
           return PulsarService.statsProvider.getStatsLogger("").getOpStatsLogger("READ_ENTRY_LATENCY");
   }
   ```
   
   2. Add a method that generates metrics for Netty thread pool usage:
   ```
   private static void generateNetworkIdleMetrics(PulsarService pulsar, SimpleTextOutputStream stream) {
           // generate network idle percent metrics
           try {
               int busyExecutors = 0;
               EventLoopGroup workerGroup = pulsar.getBrokerService().executor();
               Iterator<EventExecutor> iterator = workerGroup.iterator();
               while (iterator.hasNext()) {
                   SingleThreadEventExecutor next = (SingleThreadEventExecutor) iterator.next();
                   if (next.pendingTasks() > 0) {
                       ++busyExecutors;
                   }
               }
               int numIoThreads = pulsar.getConfiguration().getNumIOThreads();
               float ioWaitRatioMetric = (float) busyExecutors / (float) numIoThreads;
               // Metric: netIdlePercentile -> ioWaitRatioMetric
               writeNetIdlePercentileMetrics(stream, "brk_net_idle_percentile", (1 - ioWaitRatioMetric) * 100f,
                                             Collector.Type.GAUGE, clusterName, currentTimeMillis);
   
               // general network io queue metrics
               Iterator<EventExecutor> iterator1 = workerGroup.iterator();
               while (iterator1.hasNext()) {
                   SingleThreadEventExecutor next = (SingleThreadEventExecutor) iterator1.next();
                   int pendingTasks = next.pendingTasks();
                   String name = "brk_pending_tasks_" + StringUtils.replace(next.threadProperties().name(),
                                                                            "-", "_");
                   // Metric: thread-name -> pendingTasks
                   writeNetIdlePercentileMetrics(stream, name, pendingTasks,
                                                 Collector.Type.GAUGE, clusterName, currentTimeMillis);
               }
           } catch (Exception e) {
               log.error("Failed to generate network idle percent metrics", e);
           }
   }
   ```
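   
   The arithmetic behind the method above can be sketched without Netty as a pure function over per-executor pending task counts. `NetworkIdleSketch`, `idlePercent`, `metricName`, and `gaugeLine` are hypothetical names. The output layout follows the Prometheus text exposition format and is an assumption about what `writeNetIdlePercentileMetrics` emits, since that helper is not shown in this proposal.
   
   ```java
   // Hypothetical helper for illustration only; not part of Pulsar.
   final class NetworkIdleSketch {
       // Percentage of executors with no pending tasks.
       // The array holds one pending-task count per IO thread.
       static float idlePercent(int[] pendingTasksPerExecutor) {
           if (pendingTasksPerExecutor.length == 0) {
               return 100f; // no executors: treat the pool as fully idle
           }
           int busy = 0;
           for (int pending : pendingTasksPerExecutor) {
               if (pending > 0) {
                   busy++;
               }
           }
           float busyRatio = (float) busy / pendingTasksPerExecutor.length;
           return (1f - busyRatio) * 100f;
       }
   
       // Build a per-thread metric name; '-' is replaced because it is not
       // a valid character in a Prometheus metric name.
       static String metricName(String threadName) {
           return "brk_pending_tasks_" + threadName.replace('-', '_');
       }
   
       // One gauge sample in Prometheus text exposition format (assumed layout).
       static String gaugeLine(String name, String cluster, float value, long timestampMillis) {
           return "# TYPE " + name + " gauge\n"
                   + name + "{cluster=\"" + cluster + "\"} " + value + " " + timestampMillis + "\n";
       }
   }
   ```
   
   Unlike the proposed method, this sketch divides by the number of executors actually observed rather than by the configured `numIOThreads`. That keeps the result within 0 to 100 even if the two values differ, and is a design choice of the sketch rather than something the proposal specifies.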
   
   ### Alternatives
   
   _No response_
   
   ### Anything else?
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] tisonkun commented on issue #17090: PIP-XYZ: Support latency quantile metric for pulsar read and write

Posted by GitBox <gi...@apache.org>.
tisonkun commented on issue #17090:
URL: https://github.com/apache/pulsar/issues/17090#issuecomment-1217681681

   I think 198 is left for this proposal. @3286360470 please update the title.




[GitHub] [pulsar] tjiuming commented on issue #17090: PIP-202: Support latency quantile metric for pulsar read and write

Posted by GitBox <gi...@apache.org>.
tjiuming commented on issue #17090:
URL: https://github.com/apache/pulsar/issues/17090#issuecomment-1222305858

   Prometheus is highly recommended




[GitHub] [pulsar] AnonHxy commented on issue #17090: PIP-XYZ: Support latency quantile metric for pulsar read and write

Posted by GitBox <gi...@apache.org>.
AnonHxy commented on issue #17090:
URL: https://github.com/apache/pulsar/issues/17090#issuecomment-1216173975

   Please change "PIP-XYZ" to an actual number. To determine the appropriate PIP number, inspect the [mailing list](https://lists.apache.org/list.html?dev@pulsar.apache.org) for the most recent PIP and add 1 to that PIP's number. @3286360470




[GitHub] [pulsar] github-actions[bot] commented on issue #17090: PIP-202: Support latency quantile metric for pulsar read and write

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #17090:
URL: https://github.com/apache/pulsar/issues/17090#issuecomment-1254431793

   The issue had no activity for 30 days, mark with Stale label.




[GitHub] [pulsar] 3286360470 commented on issue #17090: PIP-XYZ: Support latency quantile metric for pulsar read and write

Posted by GitBox <gi...@apache.org>.
3286360470 commented on issue #17090:
URL: https://github.com/apache/pulsar/issues/17090#issuecomment-1219407950

   > I think 198 is left for this proposal. @3286360470 please update the title.
   
   ok, I have updated [XYZ] to 202.

