Posted to notifications@skywalking.apache.org by wu...@apache.org on 2018/11/30 14:11:50 UTC
[incubator-skywalking] branch endpoint-and-instance-alarm updated:
Fix alarm default settings and document.
This is an automated email from the ASF dual-hosted git repository.
wusheng pushed a commit to branch endpoint-and-instance-alarm
in repository https://gitbox.apache.org/repos/asf/incubator-skywalking.git
The following commit(s) were added to refs/heads/endpoint-and-instance-alarm by this push:
new f6e7de8 Fix alarm default settings and document.
f6e7de8 is described below
commit f6e7de86c4fd19e4717bee0d9cae96803d4523a9
Author: Wu Sheng <wu...@foxmail.com>
AuthorDate: Fri Nov 30 22:11:32 2018 +0800
Fix alarm default settings and document.
---
docs/en/setup/backend/backend-alarm.md | 11 ++++++++++-
.../src/main/assembly/alarm-settings.yml | 18 +++++++++---------
2 files changed, 19 insertions(+), 10 deletions(-)
diff --git a/docs/en/setup/backend/backend-alarm.md b/docs/en/setup/backend/backend-alarm.md
index c4ff6f5..aa22b9f 100644
--- a/docs/en/setup/backend/backend-alarm.md
+++ b/docs/en/setup/backend/backend-alarm.md
@@ -49,9 +49,18 @@ rules:
count: 4
```
+## Default alarm rules
+We provide a default `alarm-settings.yml` in our distribution for convenience only, which includes the following rules:
+1. Service average response time over 1s in the last 3 minutes.
+1. Service success rate lower than 80% in the last 2 minutes.
+1. Service 90% response time is lower than 1000ms in the last 3 minutes.
+1. Service Instance average response time over 1s in the last 2 minutes.
+1. Endpoint average response time over 1s in the last 2 minutes.
+
+
## List of all potential metric name
The metric names are defined in official [OAL scripts](../../guides/backend-oal-scripts.md), right now
-only metric from **Service** scope could be used in Alarm, we will extend in further versions.
+metrics from the **Service**, **Service Instance**, and **Endpoint** scopes can be used in alarms; we will extend this in future versions.
Submit issue or pull request if you want to support any other scope in alarm.
diff --git a/oap-server/server-starter/src/main/assembly/alarm-settings.yml b/oap-server/server-starter/src/main/assembly/alarm-settings.yml
index 5fdaa7b..3cd65a9 100644
--- a/oap-server/server-starter/src/main/assembly/alarm-settings.yml
+++ b/oap-server/server-starter/src/main/assembly/alarm-settings.yml
@@ -36,7 +36,7 @@ rules:
count: 2
# How many times of checks, the alarm keeps silence after alarm triggered, default as same as period.
silence-period: 3
- message: Successful rate of service {name} is lower than 80% in last 2 minuts.
+ message: Successful rate of service {name} is lower than 80% in last 2 minutes.
service_p90_sla_rule:
# Indicator value need to be long, double or int
indicator-name: service_p90
@@ -50,17 +50,17 @@ rules:
indicator-name: service_instance_resp_time
op: ">"
period: 10
- count: 3
+ count: 2
silence-period: 5
- message: Response time of service instance {name} is more than 1000ms in last 3 minutes.
- endpoint_sla_rule:
- indicator-name: endpoint_sla
- op: "<"
- threshold: 8000
+ message: Response time of service instance {name} is more than 1000ms in last 2 minutes.
+ endpoint_avg_rule:
+ indicator-name: endpoint_avg
+ op: ">"
+ threshold: 1000
period: 10
count: 2
- silence-period: 3
- message: Successful rate of endpoint {name} is lower than 80% in last 2 minuts.
+ silence-period: 5
+ message: Response time of endpoint {name} is more than 1000ms in last 2 minutes.
webhooks:
# - http://127.0.0.1/notify/
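
For readers unfamiliar with the rule fields, the following is a minimal sketch of how the `endpoint_avg_rule` values in the hunk above might interact. It is not the SkyWalking implementation. It assumes one indicator value arrives per check, that `period` is the size of a sliding window of checks, that the alarm fires once at least `count` values in that window satisfy `op` against `threshold`, and that `silence-period` suppresses that many subsequent checks. The `RuleSketch` class and its `check` method are hypothetical names used only for illustration.

```python
import operator
from collections import deque

# Only the two operators that appear in the diff above.
OPS = {">": operator.gt, "<": operator.lt}

# Values copied from the endpoint_avg_rule hunk.
ENDPOINT_AVG_RULE = {
    "indicator-name": "endpoint_avg",
    "op": ">",
    "threshold": 1000,
    "period": 10,
    "count": 2,
    "silence-period": 5,
}


class RuleSketch:
    """Hypothetical evaluator for one rule; NOT the SkyWalking code."""

    def __init__(self, rule):
        self.rule = rule
        # Assumption: the window holds one match flag per check.
        self.window = deque(maxlen=rule["period"])
        self.silence_left = 0

    def check(self, value):
        """Record one indicator value; return True if an alarm would fire."""
        compare = OPS[self.rule["op"]]
        self.window.append(compare(value, self.rule["threshold"]))
        if self.silence_left > 0:
            # Assumption: a silenced check never fires, even if it matches.
            self.silence_left -= 1
            return False
        if sum(self.window) >= self.rule["count"]:
            self.silence_left = self.rule["silence-period"]
            return True
        return False


if __name__ == "__main__":
    sketch = RuleSketch(ENDPOINT_AVG_RULE)
    for value in [500, 1500, 1200, 1300, 400]:
        print(value, sketch.check(value))
```

Under these assumptions, lowering `count` makes a rule fire sooner, while raising `silence-period` spaces out repeated notifications for the same condition.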