Posted to commits@camel.apache.org by as...@apache.org on 2020/12/03 10:47:00 UTC

[camel-k] 04/08: chore(doc): Add SOPs for CamelKSuccessBuildDuration5m and CamelKBuildError SLOs

This is an automated email from the ASF dual-hosted git repository.

astefanutti pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/camel-k.git

commit 1a8c9d3a4018dead556344706435f12c8b51dfa1
Author: Antonin Stefanutti <an...@stefanutti.fr>
AuthorDate: Wed Dec 2 17:11:45 2020 +0100

    chore(doc): Add SOPs for CamelKSuccessBuildDuration5m and CamelKBuildError SLOs
---
 .../ROOT/pages/troubleshooting/operating.adoc      | 55 ++++++++++++++++++++--
 1 file changed, 52 insertions(+), 3 deletions(-)

diff --git a/docs/modules/ROOT/pages/troubleshooting/operating.adoc b/docs/modules/ROOT/pages/troubleshooting/operating.adoc
index 80cd352..68ea05f 100644
--- a/docs/modules/ROOT/pages/troubleshooting/operating.adoc
+++ b/docs/modules/ROOT/pages/troubleshooting/operating.adoc
@@ -67,7 +67,7 @@ $ kubectl logs deployment/camel-k-operator --since=1h \
 | "-n \(.namespace) \(.controller | rtrimstr("-controller"))/\(.name)"' \
 | xargs kubectl describe
 ----
-Check the resource events.
+Check the resource specification and events.
 
 * Improve this SOP if there's anything missing, and contact engineering if there are any changes they could make to make this easier in the future.
 
@@ -76,7 +76,7 @@ Check the resource events.
 ==== Description
 
 This alert has severity level of "warning".
-It's firing when more than 1% of the successful builds have their duration above 2 min.
+It's firing when more than 10% of the successful builds have their duration above 2 min.
 
 ==== Troubleshooting
 
@@ -93,6 +93,55 @@ $ kubectl get builds.camel.apache.org -o json \
 | "-n \(.metadata.namespace) builds.camel.apache.org/\(.metadata.name)"' \
 | xargs kubectl describe
 ----
-Check the resource events.
+Check the resource specification and events.
+
+* Improve this SOP if there's anything missing, and contact engineering if there are any changes they could make to make this easier in the future.
+
+=== CamelKSuccessBuildDuration5m
+
+==== Description
+
+This alert has severity level of "critical".
+It's firing when more than 1% of the successful builds have their duration above 5 min.
+
+==== Troubleshooting
+
+* Inspect the successful Builds whose duration is longer than 5 minutes, e.g.:
++
+[source,sh]
+----
+$ kubectl get builds.camel.apache.org -o json \
+| jq -r '.items[]
+| select(.status.phase == "Succeeded")
+| select(.status.duration
+  | "01-Jan-1970 \(sub("(?<time>.*)\\..*"; "\(.time)s"))" | strptime("%d-%b-%Y %Mm%Ss")? // strptime("%d-%b-%Y %Ss")
+  | mktime > 300)
+| "-n \(.metadata.namespace) builds.camel.apache.org/\(.metadata.name)"' \
+| xargs kubectl describe
+----
+Check the resource specification and events.
+
+* Improve this SOP if there's anything missing, and contact engineering if there are any changes they could make to make this easier in the future.
+
+=== CamelKBuildError
+
+==== Description
+
+This alert has severity level of "critical".
+It's firing when more than 1% of the builds have errored over at least 10 min.
+
+==== Troubleshooting
+
+* Inspect the errored Builds, e.g.:
++
+[source,sh]
+----
+$ kubectl get builds.camel.apache.org -o json \
+| jq -r '.items[]
+| select(.status.phase == "Error")
+| "-n \(.metadata.namespace) builds.camel.apache.org/\(.metadata.name)"' \
+| xargs kubectl get -o jsonpath='{.metadata.namespace}{"/"}{.metadata.name}{"\nError: "}{.status.error}{"\n"}'
+----
+Check the resource specification and events.
 
 * Improve this SOP if there's anything missing, and contact engineering if there are any changes they could make to make this easier in the future.
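
Note on the duration filter used in the CamelKSuccessBuildDuration5m SOP: the jq expression strips the fractional part of `.status.duration` and parses the remainder with `strptime`, using either a minutes-and-seconds or a seconds-only pattern. The sketch below shows the same conversion as a standalone POSIX shell function. It may be useful for checking a single value by hand. It assumes the field holds a Go-style duration string such as "6m12.345s", which is inferred from the filter and not confirmed here. The name `duration_to_seconds` is hypothetical and not part of Camel K. Unlike the jq filter, it also accepts an hours component.

```shell
#!/bin/sh
# Hypothetical helper, not part of Camel K: convert a Go-style duration
# string ("1h2m3.456s", "6m12s", "45.9s") into whole seconds.
# The fractional part of the seconds is dropped, as in the jq filter.
duration_to_seconds() {
  printf '%s\n' "$1" | awk '{
    rest = $0
    total = 0
    # Optional hours component, e.g. "1h"
    if (match(rest, /^[0-9]+h/)) {
      total += substr(rest, 1, RLENGTH - 1) * 3600
      rest = substr(rest, RLENGTH + 1)
    }
    # Optional minutes component, e.g. "6m"
    if (match(rest, /^[0-9]+m/)) {
      total += substr(rest, 1, RLENGTH - 1) * 60
      rest = substr(rest, RLENGTH + 1)
    }
    # Optional seconds component with an optional fraction, e.g. "12.345s"
    if (match(rest, /^[0-9]+(\.[0-9]+)?s$/)) {
      total += int(substr(rest, 1, RLENGTH - 1))
      rest = ""
    }
    # Anything left over (or an empty input) is an unsupported format
    if (rest != "" || $0 == "") exit 1
    print total
  }'
}
```

For example, `duration_to_seconds 6m12.345s` prints 372, which is above the 300 second threshold of the alert. Sub-second forms such as "500ms" are rejected with a non-zero exit status.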