Posted to notifications@skywalking.apache.org by ke...@apache.org on 2021/05/27 08:12:27 UTC

[skywalking] branch alarm/events created (now ecc2aec)

This is an automated email from the ASF dual-hosted git repository.

kezhenxu94 pushed a change to branch alarm/events
in repository https://gitbox.apache.org/repos/asf/skywalking.git.


      at ecc2aec  Events can be configured as alarm source

This branch includes the following new commits:

     new ecc2aec  Events can be configured as alarm source

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


[skywalking] 01/01: Events can be configured as alarm source

Posted by ke...@apache.org.

kezhenxu94 pushed a commit to branch alarm/events
in repository https://gitbox.apache.org/repos/asf/skywalking.git

commit ecc2aecd4e71fc5272f92e7fefa5f12b70435250
Author: kezhenxu94 <ke...@apache.org>
AuthorDate: Thu May 27 16:11:50 2021 +0800

    Events can be configured as alarm source
---
 CHANGES.md                                         |  1 +
 docs/en/concepts-and-designs/event.md              | 41 +++++++++++++++++++++-
 .../src/main/resources/alarm-settings.yml          |  9 +++++
 .../skywalking/oap/server/core/event/Event.java    | 32 ++++++++++++++++-
 4 files changed, 81 insertions(+), 2 deletions(-)

diff --git a/CHANGES.md b/CHANGES.md
index e22c381..ea93875 100644
--- a/CHANGES.md
+++ b/CHANGES.md
@@ -54,6 +54,7 @@ Release Notes.
 * Include events of the entity(s) in the alarm.
 * Support `native-json` format log in kafka-fetcher-plugin.
 * Fix counter misuse in the alarm core. Alarm can't be triggered in time.
+* Events can be configured as alarm source.
 
 #### UI
 * Add logo for kong plugin.
diff --git a/docs/en/concepts-and-designs/event.md b/docs/en/concepts-and-designs/event.md
index e73d90b..73c257b 100644
--- a/docs/en/concepts-and-designs/event.md
+++ b/docs/en/concepts-and-designs/event.md
@@ -55,10 +55,49 @@ The end time of the event. This field may be empty if the event has not ended ye
 **NOTE:** When reporting an event, you typically call the report function twice, the first time for starting of the event and the second time for ending of the event, both with the same UUID.
 There are also cases where you would already have both the start time and end time. For example, when exporting events from a third-party system, the start time and end time are already known so you may simply call the report function once.
 
+## How to Configure Alarms for Events
+
+Events are derived from metrics, and can be used as a source to trigger alarms. For example, if a specific event
+occurs a certain number of times within a period, alarms can be triggered and sent.
+
+Every event has a default `count = 1`. When `n` events with the same name are reported, they are aggregated
+into `count = n` as follows.
+
+```
+Event{name=Unhealthy, source={service=A,instance=a}, ...}
+Event{name=Unhealthy, source={service=A,instance=a}, ...}
+Event{name=Unhealthy, source={service=A,instance=a}, ...}
+Event{name=Unhealthy, source={service=A,instance=a}, ...}
+Event{name=Unhealthy, source={service=A,instance=a}, ...}
+Event{name=Unhealthy, source={service=A,instance=a}, ...}
+```
+
+will be aggregated into
+
+```
+Event{name=Unhealthy, source={service=A,instance=a}, ...} <count = 6>
+```
+
+so you can configure the following alarm rule to trigger an alarm when the `Unhealthy` event occurs more than 5 times
+within 10 minutes.
+
+```yaml
+rules:
+  unhealthy_event_rule:
+    metrics-name: Unhealthy
+    threshold: 5
+    op: ">"
+    period: 10
+    count: 1
+    message: Service instance has been unhealthy for 10 minutes
+```
+
+For more alarm configuration details, please refer to the [alarm doc](../setup/backend/backend-alarm.md).
+
 ## Known Events
 
 | Name | Type | When |
 | :----: | :----: | :-----|
 | Start | Normal | When your Java Application starts with SkyWalking Agent installed, the `Start` Event will be created. |
 | Shutdown | Normal | When your Java Application stops with SkyWalking Agent installed, the `Shutdown` Event will be created.  |
-| Alarm | Error | When the Alarm is triggered, the corresponding `Alarm` Event will is created. |
\ No newline at end of file
+| Alarm | Error | When the Alarm is triggered, the corresponding `Alarm` Event will be created. |
diff --git a/oap-server/server-bootstrap/src/main/resources/alarm-settings.yml b/oap-server/server-bootstrap/src/main/resources/alarm-settings.yml
index 0efbe26..05cd072 100755
--- a/oap-server/server-bootstrap/src/main/resources/alarm-settings.yml
+++ b/oap-server/server-bootstrap/src/main/resources/alarm-settings.yml
@@ -40,6 +40,15 @@ rules:
     count: 1
     tags:
       level: WARNING
+#  unhealthy_event_rule:
+#    metrics-name: Unhealthy
+#    threshold: 5
+#    op: ">"
+#    period: 10
+#    count: 1
+#    message: Service instance has been unhealthy for 10 minutes
+#    tags:
+#      level: ERROR
 
 webhooks:
 #  - http://127.0.0.1/notify/
diff --git a/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/event/Event.java b/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/event/Event.java
index 7065589..a4471cb 100644
--- a/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/event/Event.java
+++ b/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/event/Event.java
@@ -24,15 +24,21 @@ import lombok.EqualsAndHashCode;
 import lombok.Getter;
 import lombok.Setter;
 import org.apache.skywalking.apm.util.StringUtil;
+import org.apache.skywalking.oap.server.core.analysis.IDManager;
 import org.apache.skywalking.oap.server.core.analysis.MetricsExtension;
 import org.apache.skywalking.oap.server.core.analysis.Stream;
 import org.apache.skywalking.oap.server.core.analysis.TimeBucket;
+import org.apache.skywalking.oap.server.core.analysis.metrics.LongValueHolder;
 import org.apache.skywalking.oap.server.core.analysis.metrics.Metrics;
+import org.apache.skywalking.oap.server.core.analysis.metrics.MetricsMetaInfo;
+import org.apache.skywalking.oap.server.core.analysis.metrics.WithMetadata;
 import org.apache.skywalking.oap.server.core.analysis.worker.MetricsStreamProcessor;
 import org.apache.skywalking.oap.server.core.remote.grpc.proto.RemoteData;
+import org.apache.skywalking.oap.server.core.source.DefaultScopeDefine;
 import org.apache.skywalking.oap.server.core.source.ScopeDeclaration;
 import org.apache.skywalking.oap.server.core.storage.StorageHashMapBuilder;
 import org.apache.skywalking.oap.server.core.storage.annotation.Column;
+import org.elasticsearch.common.Strings;
 
 import static org.apache.skywalking.oap.server.core.source.DefaultScopeDefine.EVENT;
 
@@ -45,7 +51,7 @@ import static org.apache.skywalking.oap.server.core.source.DefaultScopeDefine.EV
     of = "uuid"
 )
 @MetricsExtension(supportDownSampling = false, supportUpdate = true)
-public class Event extends Metrics {
+public class Event extends Metrics implements WithMetadata, LongValueHolder {
 
     public static final String INDEX_NAME = "events";
 
@@ -104,10 +110,14 @@ public class Event extends Metrics {
     @Column(columnName = END_TIME)
     private long endTime;
 
+    private transient long count = 1;
+
     @Override
     public boolean combine(final Metrics metrics) {
         final Event event = (Event) metrics;
 
+        count++;
+
         // Set time bucket only when it's never set.
         if (getTimeBucket() <= 0) {
             if (event.getStartTime() > 0) {
@@ -193,6 +203,26 @@ public class Event extends Metrics {
         return hashCode();
     }
 
+    @Override
+    public MetricsMetaInfo getMeta() {
+        int scope = DefaultScopeDefine.SERVICE;
+        final String serviceId = IDManager.ServiceID.buildId(getService(), true);
+        String id = serviceId;
+        if (!Strings.isNullOrEmpty(getServiceInstance())) {
+            scope = DefaultScopeDefine.SERVICE_INSTANCE;
+            id = IDManager.ServiceInstanceID.buildId(serviceId, getServiceInstance());
+        } else if (!Strings.isNullOrEmpty(getEndpoint())) {
+            scope = DefaultScopeDefine.ENDPOINT;
+            id = IDManager.EndpointID.buildId(serviceId, getEndpoint());
+        }
+        return new MetricsMetaInfo(getName(), scope, id);
+    }
+
+    @Override
+    public long getValue() {
+        return getCount();
+    }
+
     public static class Builder implements StorageHashMapBuilder<Event> {
         @Override
         public Map<String, Object> entity2Storage(Event storageData) {
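
For readers following the documentation change above, here is a minimal, self-contained sketch of how the aggregated count relates to the `unhealthy_event_rule` threshold. It is not the SkyWalking alarm core: `EventAlarmSketch`, `SimpleEvent`, `aggregate`, `matches`, and the grouping key (name plus service plus instance) are all hypothetical names and assumptions made for illustration. Only the `count = 1` default, the `count++` on combine, and the `threshold: 5` / `op: ">"` values come from the commit.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EventAlarmSketch {

    /** Hypothetical, simplified stand-in for an event; not the SkyWalking Event class. */
    static final class SimpleEvent {
        final String name;
        final String service;
        final String instance;
        long count = 1; // every event starts with count = 1, as in the commit

        SimpleEvent(String name, String service, String instance) {
            this.name = name;
            this.service = service;
            this.instance = instance;
        }

        /** Assumed grouping key for this sketch: same name and same source. */
        String key() {
            return name + "|" + service + "|" + instance;
        }

        /** Mirrors the count++ added to Event#combine: one more occurrence was merged in. */
        void combine(SimpleEvent other) {
            count++;
        }
    }

    /** Merges events that share a key, keeping the first one and incrementing its count. */
    static Map<String, SimpleEvent> aggregate(List<SimpleEvent> events) {
        Map<String, SimpleEvent> merged = new LinkedHashMap<>();
        for (SimpleEvent event : events) {
            SimpleEvent existing = merged.get(event.key());
            if (existing == null) {
                merged.put(event.key(), event);
            } else {
                existing.combine(event);
            }
        }
        return merged;
    }

    /** Compares a value with a threshold; the operator set here is illustrative only. */
    static boolean matches(long value, String op, long threshold) {
        switch (op) {
            case ">":
                return value > threshold;
            case ">=":
                return value >= threshold;
            case "<":
                return value < threshold;
            case "<=":
                return value <= threshold;
            default:
                throw new IllegalArgumentException("Unsupported operator: " + op);
        }
    }

    public static void main(String[] args) {
        List<SimpleEvent> reported = new ArrayList<>();
        for (int i = 0; i < 6; i++) {
            reported.add(new SimpleEvent("Unhealthy", "A", "a"));
        }
        SimpleEvent merged = aggregate(reported).get("Unhealthy|A|a");
        System.out.println("count = " + merged.count);                   // prints "count = 6"
        System.out.println("alarm = " + matches(merged.count, ">", 5));  // prints "alarm = true"
    }
}
```

With six `Unhealthy` events from the same source, the merged count is 6, which satisfies `op: ">"` with `threshold: 5`. Five events would not, since the comparison is strict.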