Posted to commits@linkis.apache.org by ca...@apache.org on 2022/07/13 06:55:04 UTC

[incubator-linkis-website] branch dev updated: Simplify the prometheus manual installation process in document (#429)

This is an automated email from the ASF dual-hosted git repository.

casion pushed a commit to branch dev
in repository https://gitbox.apache.org/repos/asf/incubator-linkis-website.git


The following commit(s) were added to refs/heads/dev by this push:
     new aabcebe3c0 Simplify the prometheus manual installation process in document (#429)
aabcebe3c0 is described below

commit aabcebe3c0f00df21ab3dba9e41bb57b43f4a57a
Author: Leomax_Sun <28...@qq.com>
AuthorDate: Wed Jul 13 14:54:59 2022 +0800

    Simplify the prometheus manual installation process in document (#429)
    
     Simplify the prometheus manual installation process in document
---
 docs/deployment/involve_prometheus_into_linkis.md  | 168 ++++++++++++---------
 .../deployment/involve_prometheus_into_linkis.md   |  11 +-
 2 files changed, 104 insertions(+), 75 deletions(-)

diff --git a/docs/deployment/involve_prometheus_into_linkis.md b/docs/deployment/involve_prometheus_into_linkis.md
index 109130481b..af516b675a 100644
--- a/docs/deployment/involve_prometheus_into_linkis.md
+++ b/docs/deployment/involve_prometheus_into_linkis.md
@@ -24,7 +24,7 @@ Prometheus scrapes metrics from instrumented jobs, either directly or via an int
 
 In the context of Linkis, we use Eureka service discovery (SD) in Prometheus to retrieve scrape targets through the Eureka REST API. Prometheus periodically checks the REST endpoint and creates a target for every app instance.
 
-## 2. Enable Prometheus and start Linkis
+## 2. How to Enable Prometheus
 
 ### 2.1 Enable Prometheus when installing Linkis
 
@@ -33,7 +33,7 @@ Modify the configuration item `PROMETHEUS_ENABLE` in linkis-env.sh of Linkis.
 ```bash
 export PROMETHEUS_ENABLE=true
 ````
-After running the `install.sh`, it's expected to see the configuration is appended inside the following files:
+After running `install.sh`, the Prometheus-related configuration should be appended to the following files:
 
 ```yaml
 ## application-linkis.yml  ##
@@ -41,7 +41,7 @@ After running the `install.sh`, it's expected to see the configuration is append
 eureka:
   instance:
     metadata-map:
-      prometheus.path: ${prometheus.path:/actuator/prometheus}
+      prometheus.path: ${prometheus.path:${prometheus.endpoint}}
 ...
 management:
   endpoints:
@@ -80,10 +80,36 @@ wds.linkis.prometheus.enable=true
 wds.linkis.server.user.restful.uri.pass.auth=/api/rest_j/v1/actuator/prometheus,
 ...
 ````
+### 2.2 Enable Prometheus after installation
+Modify `${LINKIS_HOME}/conf/application-linkis.yml` and add `prometheus` to the exposed endpoints:
+```yaml
+## application-linkis.yml  ##
+management:
+  endpoints:
+    web:
+      exposure:
+        #Add prometheus
+        include: refresh,info,health,metrics,prometheus
+```
+Modify `${LINKIS_HOME}/conf/application-eureka.yml` and add `prometheus` to the exposed endpoints:
+```yaml
+## application-eureka.yml  ##
+management:
+  endpoints:
+    web:
+      exposure:
+        #Add prometheus
+        include: refresh,info,health,metrics,prometheus
+```
+Modify `${LINKIS_HOME}/conf/linkis.properties` and uncomment `wds.linkis.prometheus.enable` by removing the leading `#`:
+```properties
+## linkis.properties ##
+...
+wds.linkis.prometheus.enable=true
+...
+```
 
-**Note**: If you don't use `install.sh` script when installing Linkis, it's necessary to add the above configuration on your own.
-
-Then start Linkis.
+### 2.3 Start Linkis
 
 ```bash
 $ bash linkis-start-all.sh
@@ -91,7 +117,11 @@ $ bash linkis-start-all.sh
 
 After starting the services, the prometheus endpoint of each Linkis microservice should be accessible, for example, http://linkishost:9103/api/rest_j/v1/actuator/prometheus.
 
-## 3. Deploy the Prometheus, Alertmanager and Grafana
+:::caution Note
+The Prometheus endpoints of gateway and eureka do not include the `api/rest_j/v1` prefix; the complete endpoint is http://linkishost:9001/actuator/prometheus
+:::
+
+## 3. Demo: Deploying Prometheus, Alertmanager and Grafana
 Usually the monitoring setup for a cloud native application is deployed on Kubernetes with service discovery and high availability (e.g. using a Kubernetes operator like Prometheus Operator). To quickly prototype dashboards and experiment with different metric type options (e.g. histogram vs gauge), you may need a similar setup locally. This section explains how to set up a local Prometheus/Alertmanager and Grafana monitoring stack with Docker Compose.
 
 First, let's define the general components of the stack as follows:
@@ -151,71 +181,71 @@ As an example, the following configration convers the usual metrics used to moni
 - d. High non-heap memory usage for each JVM instance (>80%)
 - e. High number of waiting threads for each JVM instance (>=100)
 
-````yaml
+```yaml
 ## alertrule.yml ##
 groups:
-  - name: alerting_rules
+  - name: LinkisAlert
     rules:
-    - alert: LinkisNodeDown
-      expr: last_over_time(up{job="linkis", application=~"LINKIS.*", application!="LINKIS-CG-ENGINECONN"}[1m])== 0
-      for: 15s
-      labels:
-        severity: critical
-        service: Linkis
-        instance: "{{ $labels.instance }}"
-      annotations:
-        summary: "instance: {{ $labels.instance }} down"
-        description: "Linkis instance(s) is/are down in last 1m"
-        value: "{{ $value }}"
-    
-    - alert: LinkisNodeCpuHigh
-      expr: system_cpu_usage{job="linkis", application=~"LINKIS.*"} >= 0.8
-      for: 1m
-      labels:
-        severity: warning
-        service: Linkis
-        instance: "{{ $labels.instance }}"
-      annotations:
-        summary: "instance: {{ $labels.instance }} cpu overload"
-        description: "CPU usage is over 80% for over 1min"
-        value: "{{ $value }}"
-    
-    - alert: LinkisNodeHeapMemoryHigh
-      expr: sum(jvm_memory_used_bytes{job="linkis", application=~"LINKIS.*", area="heap"}) by(instance) *100/sum(jvm_memory_max_bytes{job="linkis", application=~"LINKIS.*", area="heap"}) by(instance) >= 80
-      for: 1m
-      labels:
-        severity: warning
-        service: Linkis
-        instance: "{{ $labels.instance }}"
-      annotations:
-        summary: "instance: {{ $labels.instance }} memory(heap) overload"
-        description: "Memory usage(heap) is over 80% for over 1min"
-        value: "{{ $value }}"
-    
-    - alert: LinkisNodeNonHeapMemoryHigh
-      expr: sum(jvm_memory_used_bytes{job="linkis", application=~"LINKIS.*", area="nonheap"}) by(instance) *100/sum(jvm_memory_max_bytes{job="linkis", application=~"LINKIS.*", area="nonheap"}) by(instance) >= 80
-      for: 1m
-      labels:
-        severity: warning
-        service: Linkis
-        instance: "{{ $labels.instance }}"
-      annotations:
-        summary: "instance: {{ $labels.instance }} memory(nonheap) overload"
-        description: "Memory usage(nonheap) is over 80% for over 1min"
-        value: "{{ $value }}"
-    
-    - alert: LinkisWaitingThreadHigh
-      expr: jvm_threads_states_threads{job="linkis", application=~"LINKIS.*", state="waiting"} >= 100
-      for: 1m
-      labels:
-        severity: warning
-        service: Linkis
-        instance: "{{ $labels.instance }}"
-      annotations:
-        summary: "instance: {{ $labels.instance }} waiting threads is high"
-        description: "waiting threads is over 100 for over 1min"
-        value: "{{ $value }}"
-````
+      - alert: LinkisNodeDown
+        expr: last_over_time(up{job="linkis", application=~"LINKIS.*", application!="LINKIS-CG-ENGINECONN"}[1m])== 0
+        for: 15s
+        labels:
+          severity: critical
+          service: Linkis
+          instance: "{{ $labels.instance }}"
+        annotations:
+          summary: "instance: {{ $labels.instance }} down"
+          description: "Linkis instance(s) is/are down in last 1m"
+          value: "{{ $value }}"
+
+      - alert: LinkisNodeCpuHigh
+        expr: system_cpu_usage{job="linkis", application=~"LINKIS.*"} >= 0.8
+        for: 1m
+        labels:
+          severity: warning
+          service: Linkis
+          instance: "{{ $labels.instance }}"
+        annotations:
+          summary: "instance: {{ $labels.instance }} cpu overload"
+          description: "CPU usage is over 80% for over 1min"
+          value: "{{ $value }}"
+
+      - alert: LinkisNodeHeapMemoryHigh
+        expr: sum(jvm_memory_used_bytes{job="linkis", application=~"LINKIS.*", area="heap"}) by(instance) *100/sum(jvm_memory_max_bytes{job="linkis", application=~"LINKIS.*", area="heap"}) by(instance) >= 80
+        for: 1m
+        labels:
+          severity: warning
+          service: Linkis
+          instance: "{{ $labels.instance }}"
+        annotations:
+          summary: "instance: {{ $labels.instance }} memory(heap) overload"
+          description: "Memory usage(heap) is over 80% for over 1min"
+          value: "{{ $value }}"
+
+      - alert: LinkisNodeNonHeapMemoryHigh
+        expr: sum(jvm_memory_used_bytes{job="linkis", application=~"LINKIS.*", area="nonheap"}) by(instance) *100/sum(jvm_memory_max_bytes{job="linkis", application=~"LINKIS.*", area="nonheap"}) by(instance) >= 80
+        for: 1m
+        labels:
+          severity: warning
+          service: Linkis
+          instance: "{{ $labels.instance }}"
+        annotations:
+          summary: "instance: {{ $labels.instance }} memory(nonheap) overload"
+          description: "Memory usage(nonheap) is over 80% for over 1min"
+          value: "{{ $value }}"
+
+      - alert: LinkisWaitingThreadHigh
+        expr: jvm_threads_states_threads{job="linkis", application=~"LINKIS.*", state="waiting"} >= 100
+        for: 1m
+        labels:
+          severity: warning
+          service: Linkis
+          instance: "{{ $labels.instance }}"
+        annotations:
+          summary: "instance: {{ $labels.instance }} waiting threads is high"
+          description: "waiting threads is over 100 for over 1min"
+          value: "{{ $value }}"
+```
 **Note**: Once a service instance is shut down, it is no longer a target of Prometheus Eureka SD, and the `up` metric stops returning data for it shortly afterwards. We therefore check whether the most recent `up` sample within the last minute is 0 (`last_over_time(up[1m]) == 0`) to decide whether the service is down.
 
 Third, and most importantly, define the Prometheus configuration in the prometheus.yml file. This defines:
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/deployment/involve_prometheus_into_linkis.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/deployment/involve_prometheus_into_linkis.md
index 3b5fdafc9d..38a1eaa1aa 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/deployment/involve_prometheus_into_linkis.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/deployment/involve_prometheus_into_linkis.md
@@ -45,7 +45,7 @@ export PROMETHEUS_ENABLE=true
 eureka:
   instance:
     metadata-map:
-      prometheus.path: ${prometheus.path:/actuator/prometheus}
+      prometheus.path: ${prometheus.path:${prometheus.endpoint}}
 ...
 management:
   endpoints:
@@ -86,7 +86,7 @@ wds.linkis.server.user.restful.uri.pass.auth=/api/rest_j/v1/actuator/prometheus,
 ...
 ```
 ### 2.2 Enable Prometheus after installation
-Modify `${LINKIS_HOME}/conf/application-linkis.yml` 
+Modify `${LINKIS_HOME}/conf/application-linkis.yml`
 and add `prometheus` to the endpoints configuration
 ```yaml
 ## application-linkis.yml  ##
@@ -98,8 +98,7 @@ management:
         include: refresh,info,health,metrics,prometheus
 ```
 
-Modify `${LINKIS_HOME}/conf/application-eureka.yml` 
-and add `prometheus` to the endpoints configuration
+Modify `${LINKIS_HOME}/conf/application-eureka.yml` and add `prometheus` to the endpoints configuration
 ```yaml
 ## application-eureka.yml  ##
 management:
@@ -109,12 +108,11 @@ management:
         #Add prometheus
         include: refresh,info,health,metrics,prometheus
 ````
-Modify `${LINKIS_HOME}/conf/linkis.properties` 
+Modify `${LINKIS_HOME}/conf/linkis.properties` and remove the `#` comment marker before `prometheus.enable`
 ```yaml
 ## linkis.properties ##
 ...
 wds.linkis.prometheus.enable=true
-wds.linkis.server.user.restful.uri.pass.auth=/api/rest_j/v1/actuator/prometheus,
 ...
 ```
 
@@ -125,6 +123,7 @@ $ bash linkis-start-all.sh
 ````
 
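 After Linkis starts, the prometheus endpoint of each microservice can be accessed directly, for example http://linkishost:9103/api/rest_j/v1/actuator/prometheus
+
 :::caution Note
 The prometheus endpoints of the gateway/eureka services do not have the `api/rest_j/v1` prefix, e.g. http://linkishost:9001/actuator/prometheus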
 :::
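The caution note means the scrape URL differs by service: gateway and eureka serve `/actuator/prometheus` at the root, while the other microservices sit behind `/api/rest_j/v1`. Below is a minimal bash sketch of that rule, assuming only the two URL shapes given in the docs; `prometheus_url` is a hypothetical helper, not part of Linkis, and the host and port values are placeholders.

```shell
#!/usr/bin/env bash
# Sketch only: build the Prometheus scrape URL of a Linkis microservice.
# Assumes the two URL shapes from the docs: gateway/eureka expose
# /actuator/prometheus at the root, every other service uses /api/rest_j/v1.
set -euo pipefail

# prometheus_url HOST PORT SERVICE   (hypothetical helper)
prometheus_url() {
  local host="$1" port="$2" service="$3"
  case "$service" in
    gateway|eureka)
      printf 'http://%s:%s/actuator/prometheus\n' "$host" "$port" ;;
    *)
      printf 'http://%s:%s/api/rest_j/v1/actuator/prometheus\n' "$host" "$port" ;;
  esac
}
```

For example, `curl -fsS "$(prometheus_url linkishost 9001 gateway)" | head` should print the first lines of the gateway's metrics once Prometheus is enabled.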


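The manual steps in section 2.2 (add `prometheus` to the exposed endpoints in `application-linkis.yml` and `application-eureka.yml`, then uncomment `wds.linkis.prometheus.enable` in `linkis.properties`) can be scripted. Below is a minimal bash sketch, not shipped with Linkis: the function names are hypothetical, and it assumes each YAML file has a single `include:` line under `management.endpoints.web.exposure` and that the property line already exists, commented out. `sed -i.bak` keeps a backup of every file it edits.

```shell
#!/usr/bin/env bash
# Sketch only: apply the "Enable Prometheus after installation" steps in place.
# Assumes one `include:` line per YAML file and a commented-out
# `wds.linkis.prometheus.enable` line in linkis.properties.
set -euo pipefail

# Append ",prometheus" to the `include:` line unless it is already listed.
add_prometheus_endpoint() {
  local yml="$1"
  sed -i.bak -E '/^[[:space:]]*include:/{/prometheus/!s/[[:space:]]*$/,prometheus/;}' "$yml"
}

# Strip the leading `#` from the wds.linkis.prometheus.enable line.
uncomment_prometheus_enable() {
  local props="$1"
  sed -i.bak -E 's/^[[:space:]]*#[[:space:]]*(wds\.linkis\.prometheus\.enable=)/\1/' "$props"
}

# enable_prometheus [CONF_DIR]   (defaults to $LINKIS_HOME/conf)
enable_prometheus() {
  local conf="${1:-${LINKIS_HOME:?LINKIS_HOME is not set}/conf}"
  add_prometheus_endpoint "$conf/application-linkis.yml"
  add_prometheus_endpoint "$conf/application-eureka.yml"
  uncomment_prometheus_enable "$conf/linkis.properties"
}
```

Running `enable_prometheus "$LINKIS_HOME/conf"` and then `bash linkis-start-all.sh` should mirror the manual steps. Both edits are idempotent, so running the function twice leaves the files unchanged.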