Posted to commits@camel.apache.org by as...@apache.org on 2020/12/03 10:47:00 UTC
[camel-k] 04/08: chore(doc): Add SOPs for CamelKSuccessBuildDuration5m and CamelKBuildError SLOs
This is an automated email from the ASF dual-hosted git repository.
astefanutti pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/camel-k.git
commit 1a8c9d3a4018dead556344706435f12c8b51dfa1
Author: Antonin Stefanutti <an...@stefanutti.fr>
AuthorDate: Wed Dec 2 17:11:45 2020 +0100
chore(doc): Add SOPs for CamelKSuccessBuildDuration5m and CamelKBuildError SLOs
---
.../ROOT/pages/troubleshooting/operating.adoc | 55 ++++++++++++++++++++--
1 file changed, 52 insertions(+), 3 deletions(-)
diff --git a/docs/modules/ROOT/pages/troubleshooting/operating.adoc b/docs/modules/ROOT/pages/troubleshooting/operating.adoc
index 80cd352..68ea05f 100644
--- a/docs/modules/ROOT/pages/troubleshooting/operating.adoc
+++ b/docs/modules/ROOT/pages/troubleshooting/operating.adoc
@@ -67,7 +67,7 @@ $ kubectl logs deployment/camel-k-operator --since=1h \
| "-n \(.namespace) \(.controller | rtrimstr("-controller"))/\(.name)"' \
| xargs kubectl describe
----
-Check the resource events.
+Check the resource specification and events.
* Improve this SOP if there's anything missing, and contact engineering if there are any changes they could make to make this easier in the future.
@@ -76,7 +76,7 @@ Check the resource events.
==== Description
This alert has severity level of "warning".
-It's firing when more than 1% of the successful builds have their duration above 2 min.
+It's firing when more than 10% of the successful builds have their duration above 2 min.
==== Troubleshooting
@@ -93,6 +93,55 @@ $ kubectl get builds.camel.apache.org -o json \
| "-n \(.metadata.namespace) builds.camel.apache.org/\(.metadata.name)"' \
| xargs kubectl describe
----
-Check the resource events.
+Check the resource specification and events.
+
+* Improve this SOP if there's anything missing, and contact engineering if there are any changes they could make to make this easier in the future.
+
+=== CamelKSuccessBuildDuration5m
+
+==== Description
+
+This alert has a severity level of "critical".
+It's firing when more than 1% of the successful builds have their duration above 5 min.
+
+==== Troubleshooting
+
+* Inspect the successful Builds whose duration is longer than 5 minutes, e.g.:
++
+[source,sh]
+----
+$ kubectl get builds.camel.apache.org -o json \
+| jq -r '.items[]
+| select(.status.phase == "Succeeded")
+| select(.status.duration
+ | "01-Jan-1970 \(sub("(?<time>.*)\\..*"; "\(.time)s"))" | strptime("%d-%b-%Y %Mm%Ss")? // strptime("%d-%b-%Y %Ss")
+ | mktime > 300)
+| "-n \(.metadata.namespace) builds.camel.apache.org/\(.metadata.name)"' \
+| xargs kubectl describe
+----
+Check the resource specification and events.
+
+* Improve this SOP if there's anything missing, and contact engineering if there are any changes they could make to make this easier in the future.
+
+=== CamelKBuildError
+
+==== Description
+
+This alert has a severity level of "critical".
+It's firing when more than 1% of the builds have errored over at least 10 min.
+
+==== Troubleshooting
+
+* Inspect the errored Builds, e.g.:
++
+[source,sh]
+----
+$ kubectl get builds.camel.apache.org -o json \
+| jq -r '.items[]
+| select(.status.phase == "Error")
+| "-n \(.metadata.namespace) builds.camel.apache.org/\(.metadata.name)"' \
+| xargs kubectl get -o jsonpath='{.metadata.namespace}{"/"}{.metadata.name}{"\nError: "}{.status.error}{"\n"}'
+----
+Check the resource specification and events.
* Improve this SOP if there's anything missing, and contact engineering if there are any changes they could make to make this easier in the future.
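
The jq filter in the CamelKSuccessBuildDuration5m SOP above turns `.status.duration` into a number of seconds before comparing it with 300. The sketch below shows the same conversion as a standalone POSIX shell function, which may be easier to test in isolation. It assumes the duration is a Go-style string such as `5m12.3456s`, as the `%Mm%Ss` pattern in the filter suggests. The function name `duration_to_seconds` is hypothetical, and the handling of an `h` unit is an addition that the original filter does not have.

```shell
# Convert a Go-style duration string to whole seconds.
# Assumption: only the h, m and s units appear; anything else
# (for example "1.5ms") is rejected with a non-zero exit status.
duration_to_seconds() {
  printf '%s\n' "$1" | awk '
    {
      d = $0
      # Drop the fractional part of the seconds: "5m12.3456s" -> "5m12s".
      sub(/\.[0-9]+s$/, "s", d)
      # Reject empty input and any unsupported format.
      if (d == "" || d !~ /^([0-9]+h)?([0-9]+m)?([0-9]+s)?$/) exit 1
      total = 0
      # Consume one "<number><unit>" component at a time.
      while (match(d, /^[0-9]+[hms]/)) {
        n = substr(d, 1, RLENGTH - 1) + 0
        u = substr(d, RLENGTH, 1)
        total += n * (u == "h" ? 3600 : (u == "m" ? 60 : 1))
        d = substr(d, RLENGTH + 1)
      }
      print total
    }'
}
```

For example, `duration_to_seconds 5m12.3456s` prints `312`, which is above the 300-second threshold used by the alert. Like the jq filter, the function truncates fractional seconds rather than rounding them.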