Posted to commits@pegasus.apache.org by wu...@apache.org on 2021/04/16 07:40:07 UTC

[incubator-pegasus-website] branch master updated: Add doc of Hotspot detection (#3)

This is an automated email from the ASF dual-hosted git repository.

wutao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-pegasus-website.git


The following commit(s) were added to refs/heads/master by this push:
     new e2e289e  Add doc of Hotspot detection (#3)
e2e289e is described below

commit e2e289ed2613142fe2b89c6d4a008bebb8af061b
Author: Smilencer <52...@qq.com>
AuthorDate: Fri Apr 16 15:40:01 2021 +0800

    Add doc of Hotspot detection (#3)
---
 _data/docs_menu.yml                          |  2 +
 _data/en/translate.yml                       |  1 +
 _data/zh/translate.yml                       |  1 +
 _docs/en/administration/hotspot-detection.md |  5 ++
 _docs/zh/administration/hotspot-detection.md | 86 ++++++++++++++++++++++++++++
 5 files changed, 95 insertions(+)

diff --git a/_data/docs_menu.yml b/_data/docs_menu.yml
index eff39c1..405f3d0 100644
--- a/_data/docs_menu.yml
+++ b/_data/docs_menu.yml
@@ -96,3 +96,5 @@
       link: /administration/whitelist
     - name: title_backup-request
       link: /administration/backup-request
+    - name: title_hotspot-detection
+      link: /administration/hotspot-detection
diff --git a/_data/en/translate.yml b/_data/en/translate.yml
index 1394af1..06b74e1 100644
--- a/_data/en/translate.yml
+++ b/_data/en/translate.yml
@@ -66,4 +66,5 @@ title_docs: "The Pegasus documentation"
 title_tools: "Tools"
 title_admin_cli: "Admin CLI"
 title_pegic: "Pegasus data access CLI"
+title_hotspot-detection: "Hotspot Detection"
 global_toc: "Table of contents"
diff --git a/_data/zh/translate.yml b/_data/zh/translate.yml
index f009afe..cb7a822 100644
--- a/_data/zh/translate.yml
+++ b/_data/zh/translate.yml
@@ -66,4 +66,5 @@ title_docs: "Pegasus产品文档"
 title_tools: "生态工具"
 title_admin_cli: "集群管理命令行"
 title_pegic: "数据访问命令行"
+title_hotspot-detection: "热点检测"
 global_toc: "本页导航"
diff --git a/_docs/en/administration/hotspot-detection.md b/_docs/en/administration/hotspot-detection.md
new file mode 100644
index 0000000..8ea58d2
--- /dev/null
+++ b/_docs/en/administration/hotspot-detection.md
@@ -0,0 +1,5 @@
+---
+permalink: administration/partition-split
+---
+
+TRANSLATING
diff --git a/_docs/zh/administration/hotspot-detection.md b/_docs/zh/administration/hotspot-detection.md
new file mode 100644
index 0000000..0eeb0aa
--- /dev/null
+++ b/_docs/zh/administration/hotspot-detection.md
@@ -0,0 +1,86 @@
+---
+permalink: administration/hotspot-detection
+---
+
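+# Overview
+Pegasus is a distributed storage system that shards data by hash. Under normal conditions, traffic is spread evenly across all nodes in the cluster. In extreme cases, however, such as a poorly designed `hashkey`, a hot event or hot user, or a bug in application logic, a single Pegasus node can become overloaded and degrade the availability of the whole service. We therefore designed a hotspot detection scheme that helps operators discover hotspot problems promptly and locate the hotkey traffic.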
+
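+# Hot partition detection
+
+## Design
+Collector periodically pulls the read and write traffic of every partition from the cluster and analyzes it. For each partition it compares the traffic against that partition's own history (vertical comparison) and against the other partitions in the same period (horizontal comparison), and computes a [Z-score](https://en.wikipedia.org/wiki/Standard_score) that describes how hot the partition is. When the `enable_hotkey_auto_detect` option is enabled, Collector automatically sends a [hotkey detection](#hotkey-detection) request to each hot partition to identify the abnormal hotkey traffic.
+
+## Usage
+Add the following options to the configuration file, then restart Collector:
+```shell
+[pegasus.collector]
+# Enable automatic hotkey detection. Once a hot partition is confirmed,
+# Collector sends a hotkey detection request to that partition.
+enable_hotkey_auto_detect = true
+
+# Hot partition threshold (Z-score), set to 3 here. It acts as the sensitivity of the algorithm:
+# a partition whose Z-score exceeds the threshold is judged to be hot.
+# In our tests, 3 proved to be a reasonable threshold.
+hot_partition_threshold = 3
+
+# Automatic hotkey detection is triggered once a single partition has been judged hot more than this many times.
+occurrence_threshold = 100
+```
+
+## Metrics
+```
+app.stat.hotspots@{app_name}.{hotkey_type}.{partition_count}
+```
+`hotkey_type` is either `read` or `write`, for read hotspots and write hotspots respectively.
+
+# Hotkey detection
+## Design
+After a replica receives a hotkey detection request for a partition, it records the traffic for a period of time and analyzes it to find the specific hotkey. If no hotkey is found within the period, collection stops automatically.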
+
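+## Usage
+**Start hotkey detection**
+
+On the command line, specify the app_id of the table to probe, the partition index, the hotkey type, and the address of the node to probe: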
+```
+>>> detect_hotkey -a 3 -p 1 -t write -c start -d 10.231.57.104:34802
+Detect hotkey rpc is starting, use 'detect_hotkey -a 3 -p 1 -t write -c query -d 10.231.57.104:34802' to get the result later
+```
+**Query the hotkey detection result**
+
+If hotkey detection has not finished yet, you will receive the following message:
+```
+>>> detect_hotkey -a 3 -p 1 -t write -c query -d 10.231.57.104:34802
+Hotkey detect rpc performed failed, in 3.1, error_hint:ERR_BUSY Can't get hotkey now, now state: hotkey_collector_state::COARSE_DETECTING
+```
+
+When the hotkey is found (here `hashkey = Thisishotkey1`), the result looks like this:
+```
+>>> detect_hotkey -a 3 -p 2 -t write -c query -d 10.231.57.104:34802
+Find write hotkey in 3.2 result:\"Thisishotkey1\"
+```
+
+If no hotkey can be detected within the period, the result looks like this:
+```
+>>> detect_hotkey -a 3 -p 2 -t write -c query -d 10.231.57.104:34803
+Hotkey detect rpc performed failed, in 3.2, error_hint:ERR_BUSY Can't get hotkey now, now state: hotkey_collector_state::STOPPED
+```
+
+**Stop hotkey detection**
+```
+>>> detect_hotkey -a 3 -p 2 -t write -c stop -d 10.231.57.104:34803
+Detect hotkey rpc is stopped now
+```
+Whether detection succeeded or failed, you must stop the current detection before starting the next one.
+
+## Configuration
+```shell
+[pegasus.server]
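+# Threshold for coarse-grained hotkey screening; a higher value means lower sensitivity
+hot_key_variance_threshold = 5
+# Threshold for fine-grained hotkey screening; a higher value means lower sensitivity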
+hot_bucket_variance_threshold = 7
+# Bucket count; keep it a prime number (such as 37). Changing it is generally not recommended
+hotkey_buckets_num = 37
+# Maximum duration of a single detection, in seconds
+max_seconds_to_detect_hotkey = 150
+```
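
As an illustration of the Z-score check described in the design section of the document added above, here is a minimal sketch in Python. It is not Pegasus code. It assumes a purely horizontal comparison, scoring each partition's QPS against the mean and population standard deviation of all partitions in the same sample; the Collector also uses historical data, which this sketch omits. The function names `z_scores` and `hot_partitions` and the sample numbers are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(qps_by_partition):
    """Return the Z-score of each partition's QPS within one sample."""
    mu = mean(qps_by_partition)
    sigma = pstdev(qps_by_partition)  # population standard deviation
    if sigma == 0:
        # All partitions carry identical traffic, so none stands out.
        return [0.0] * len(qps_by_partition)
    return [(q - mu) / sigma for q in qps_by_partition]


def hot_partitions(qps_by_partition, threshold=3.0):
    """Return indexes of partitions whose Z-score exceeds the threshold."""
    scores = z_scores(qps_by_partition)
    return [i for i, score in enumerate(scores) if score > threshold]


# Hypothetical sample: 15 evenly loaded partitions and one hot partition.
sample = [100] * 15 + [1000]
print(hot_partitions(sample))  # [15]
```

With a population standard deviation, a single outlier among n partitions can reach a Z-score of at most sqrt(n - 1), so under this simplified scheme a threshold of 3 can only fire on tables with more than 10 partitions.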
