Posted to notifications@dubbo.apache.org by GitBox <gi...@apache.org> on 2021/11/03 10:23:46 UTC

[GitHub] [dubbo-go] XiaoWeiKIN opened a new issue #1556: observability design

XiaoWeiKIN opened a new issue #1556:
URL: https://github.com/apache/dubbo-go/issues/1556


   This issue describes in detail the design approach for observability in dubbo-go.
   
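   # 1. Metrics collection
   
   The four golden signals of a monitoring system:
   
   - Latency
   - Traffic (QPS)
   - Errors (error rate)
   
     gRPC represents the outcome of each call with a status code, in line with HTTP/2. The dubbo protocol has no return code, however, so no `err_status` label is added here; error status is expected to be collected by the monitoring of the Triple protocol instead.
   
   - Saturation
   
     How "full" the service is. This is usually a measure of whichever resource is currently the most constrained in the system (memory in a memory-bound system, I/O in an I/O-bound system). Note that many systems degrade badly before they reach 100% utilization, so setting a utilization target is also important.
   
     In an RPC-centric service governance framework such as dubbo-go, saturation is mostly reflected in the number of goroutines.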
   
   # 2. The long-tail problem
   
   The following is quoted from *Site Reliability Engineering: How Google Runs Production Systems* (the Google SRE book):
   
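   > When building a monitoring system, it is tempting to rely on the average of some quantity: average latency, average CPU usage of the nodes, average database fullness. The danger in the latter two cases is obvious: CPU and database utilization can be very unevenly distributed. The same holds for latency. If a web service handles 1,000 requests per second with an average latency of 100 ms, 1% of requests might easily take 5 s.[10] If users depend on several such services to render a page, the 99th-percentile latency of one backend can easily become the median latency of the frontend. The simplest way to tell a slow average from a very slow tail is to count requests bucketed by latency (suitable for rendering a histogram): how many requests took between 0 and 10 ms, between 10 and 30 ms, between 30 and 100 ms, between 100 and 300 ms, and so on. Distributing the histogram boundaries roughly exponentially (here by a factor of about 3) is often an easy way to visualize the distribution of requests.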
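The quote's numbers can be checked with simple arithmetic: if the mean is 100 ms and 1% of requests take 5 s, the other 99% must average about 50.5 ms (0.99x + 0.01 × 5000 = 100 gives x ≈ 50.5), so the mean alone says nothing about the tail. The sketch below uses made-up latencies to show the same effect, counting samples into buckets whose bounds grow by roughly 3x as the quote suggests; the bounds and the data are assumptions for illustration.

```go
package main

import "fmt"

// bounds are upper bucket limits in milliseconds, growing roughly 3x per step.
var bounds = []float64{10, 30, 100, 300, 1000, 3000, 10000}

// bucketCounts returns how many samples fall into each (previous, bound] range,
// matching Prometheus "le" semantics; the extra last slot counts samples above
// the largest bound.
func bucketCounts(samplesMs []float64) []int {
	counts := make([]int, len(bounds)+1)
	for _, v := range samplesMs {
		i := 0
		for i < len(bounds) && v > bounds[i] {
			i++
		}
		counts[i]++
	}
	return counts
}

// mean is the plain arithmetic average that hides the tail.
func mean(xs []float64) float64 {
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

func main() {
	// Synthetic data: 990 fast requests at 50 ms and 10 slow ones at 5 s.
	samples := make([]float64, 0, 1000)
	for i := 0; i < 990; i++ {
		samples = append(samples, 50)
	}
	for i := 0; i < 10; i++ {
		samples = append(samples, 5000)
	}
	fmt.Printf("mean=%.1fms buckets=%v\n", mean(samples), bucketCounts(samples))
}
```

It prints `mean=99.5ms buckets=[0 0 990 0 0 0 10 0]`: the mean looks healthy, while the (3000, 10000] ms bucket exposes the 5 s tail.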
   
     
   
   A Prometheus Histogram is therefore a very good fit. Here is a brief explanation of why we do not use a Summary:
   
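   In Prometheus, a Histogram pre-defines a number of **buckets**. A Histogram **does not store the sampled values; each bucket only keeps a counter of the samples that fall into its range**, i.e. the histogram stores per-range sample counts. On the server side (Prometheus), quantiles can then be computed with the [histogram_quantile()](https://prometheus.io/docs/prometheus/latest/querying/functions/#histogram_quantile "histogram_quantile") function, which makes histograms well suited to collecting data under high concurrency. Because bucket counters are plain counts, they can also be summed across instances before the quantile is computed, which is what the `sum(...) by (le)` query in section 5 relies on. A Summary, by contrast, computes quantiles directly on the client. Its results are more precise, but observations are serialized behind a lock, which makes it a poor fit for high-concurrency scenarios.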
   
   
   
   # 3. Best practices for collecting data with Histograms
   
   **The key to using a Histogram is how the buckets are divided**: both too many and too few buckets cause problems, and the bucket layout dubbo-go currently provides is rather simple. Here we follow the bucketing algorithm described in [Micrometer histograms_and_percentiles](https://micrometer.io/docs/concepts#_histograms_and_percentiles) and implemented in [PercentileHistogramBuckets](https://github.com/micrometer-metrics/micrometer/blob/main/micrometer-core/src/main/java/io/micrometer/core/instrument/distribution/PercentileHistogramBuckets.java "PercentileHistogramBuckets"). Netflix derived this algorithm empirically; it pre-generates 72 buckets covering 1 ms to 60 s.
   
   # 4. Reporting mode
   
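   **Reporting via push mode is supported**, because pull mode requires Prometheus to discover service instances, and Prometheus has no built-in service discovery for some mainstream registries such as Nacos. Pull mode would also mean exposing an extra port whose security policy must be managed, and when several applications are deployed on the same IP, each application's port becomes hard to determine, which makes the implementation considerably more complex.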
   
   
   
   # 5. Example metric queries
   
   - Consumer side: TP99 latency of a service over 1 minute
   
   ```
   histogram_quantile(0.99, sum(rate(consumer_service_histogram_bucket{application="$consumer",service=~"$service",instance=~"$consumer_instance"}[1m])) by (le))
   ```
   
   - Provider side: successful QPS of a service over 1 minute
   
   ```
   sum(rate(provider_service_histogram_count{application="$provider",service=~"$service",instance=~"$provider_instance",category="successful"}[1m]))
   ```
   
   - Provider side: failed QPS of a service over 1 minute
   
   ```
   sum(rate(provider_service_histogram_count{application="$provider",service=~"$service",instance=~"$provider_instance",category="failure"}[1m]))
   ```
   
   - Check whether load across provider instances is balanced over 1 minute
   
   ```
   sum by (instance) (
     rate(provider_service_histogram_count{application="$provider",service=~"$service"}[1m])
   )
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@dubbo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@dubbo.apache.org
For additional commands, e-mail: notifications-help@dubbo.apache.org


[GitHub] [dubbo-go] XiaoWeiKIN commented on issue #1556: observability design

Posted by GitBox <gi...@apache.org>.
XiaoWeiKIN commented on issue #1556:
URL: https://github.com/apache/dubbo-go/issues/1556#issuecomment-958916230


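   In addition, the protocol itself needs some monitoring of its own, such as the inbound and outbound traffic at the protocol level. A filter cannot do this: it sits above the transport layer, so it has no access to inbound/outbound traffic information.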


[GitHub] [dubbo-go] XiaoWeiKIN commented on issue #1556: observability design

Posted by GitBox <gi...@apache.org>.
XiaoWeiKIN commented on issue #1556:
URL: https://github.com/apache/dubbo-go/issues/1556#issuecomment-958909114


   Along the way, I also referred to gRPC's [go-grpc-prometheus](https://github.com/grpc-ecosystem/go-grpc-prometheus).

