Posted to commits@camel.apache.org by as...@apache.org on 2020/12/03 10:47:03 UTC

[camel-k] 07/08: chore(doc): Add SOPs for CamelKBuildQueueDuration1m and CamelKBuildQueueDuration5m SLOs

This is an automated email from the ASF dual-hosted git repository.

astefanutti pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/camel-k.git

commit 2134663f8058eb431774991aca30034d6c8965f0
Author: Antonin Stefanutti <an...@stefanutti.fr>
AuthorDate: Wed Dec 2 18:05:22 2020 +0100

    chore(doc): Add SOPs for CamelKBuildQueueDuration1m and CamelKBuildQueueDuration5m SLOs
---
 .../ROOT/pages/troubleshooting/operating.adoc      | 56 +++++++++++++++++++++-
 1 file changed, 54 insertions(+), 2 deletions(-)

diff --git a/docs/modules/ROOT/pages/troubleshooting/operating.adoc b/docs/modules/ROOT/pages/troubleshooting/operating.adoc
index efedec1..b3eb8a3 100644
--- a/docs/modules/ROOT/pages/troubleshooting/operating.adoc
+++ b/docs/modules/ROOT/pages/troubleshooting/operating.adoc
@@ -99,7 +99,7 @@ Check the resource specification and events.
 
 === CamelKSuccessBuildDuration5m
 
-=== Description
+==== Description
 
 This alert has severity level of "critical".
 It's firing when more than 1% of the successful builds have their duration above 5 min.
@@ -125,7 +125,7 @@ Check the resource specification and events.
 
 === CamelKBuildError
 
-=== Description
+==== Description
 
 This alert has severity level of "critical".
 It's firing when more than 1% of the builds have errored over at least 10 min.
@@ -142,6 +142,58 @@ $ kubectl get builds.camel.apache.org -o json \
 | "-n \(.metadata.namespace) builds.camel.apache.org/\(.metadata.name)"' \
 | xargs -L1 kubectl get -o jsonpath='{.metadata.namespace}{"/"}{.metadata.name}{"\nError: "}{.status.error}{"\n"}'
 ----
+Check the error message.
+
+* Improve this SOP if there's anything missing, and contact engineering if there are any changes they could make to make this easier in the future.
+
+=== CamelKBuildQueueDuration1m
+
+==== Description
+
+This alert has severity level of "warning".
+It's firing when more than 1% of the builds have been queued for more than 1 min.
+
+==== Troubleshooting
+
+* Inspect the Builds that have been queued for more than 1 minute, e.g.:
++
+[source,sh]
+----
+$ kubectl get builds.camel.apache.org -o json \
+| jq -r '.items[]
+| select(
+  (.status.startedAt | strptime("%Y-%m-%dT%H:%M:%SZ") | mktime) -
+  (.status.failure.recovery.attemptTime? // .metadata.creationTimestamp | strptime("%Y-%m-%dT%H:%M:%SZ")
+  | mktime) > 60)
+| "-n \(.metadata.namespace) builds.camel.apache.org/\(.metadata.name)"' \
+| xargs -L1 kubectl describe
+----
+Check the resource specification and events.
+
+* Improve this SOP if there's anything missing, and contact engineering if there are any changes they could make to make this easier in the future.
+
+=== CamelKBuildQueueDuration5m
+
+==== Description
+
+This alert has severity level of "critical".
+It's firing when more than 1% of the builds have been queued for more than 5 min.
+
+==== Troubleshooting
+
+* Inspect the Builds that have been queued for more than 5 minutes, e.g.:
++
+[source,sh]
+----
+$ kubectl get builds.camel.apache.org -o json \
+| jq -r '.items[]
+| select(
+  (.status.startedAt | strptime("%Y-%m-%dT%H:%M:%SZ") | mktime) -
+  (.status.failure.recovery.attemptTime? // .metadata.creationTimestamp | strptime("%Y-%m-%dT%H:%M:%SZ")
+  | mktime) > 300)
+| "-n \(.metadata.namespace) builds.camel.apache.org/\(.metadata.name)"' \
+| xargs -L1 kubectl describe
+----
 Check the resource specification and events.
 
 * Improve this SOP if there's anything missing, and contact engineering if there are any changes they could make to make this easier in the future.
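
For readers less familiar with jq, the queue-duration filter used in the two new SOPs can be sketched in Python as follows. This is only an illustration of the selection logic, not part of the commit: the function names and the sample Build objects are hypothetical, and skipping Builds that have no `startedAt` yet is a choice made by this sketch, not something the SOP states.

```python
from datetime import datetime, timezone

# Same timestamp layout as the strptime() format string in the jq filter.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_timestamp(value):
    """Parse a Kubernetes-style UTC timestamp such as 2020-12-02T10:00:00Z."""
    return datetime.strptime(value, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)


def queue_duration_seconds(build):
    """Return seconds between the queue start and .status.startedAt.

    The queue start is .status.failure.recovery.attemptTime when present,
    otherwise .metadata.creationTimestamp, mirroring the jq `//` fallback.
    Returns None when the Build has not started (assumption of this sketch).
    """
    status = build.get("status") or {}
    started_at = status.get("startedAt")
    if started_at is None:
        return None
    recovery = (status.get("failure") or {}).get("recovery") or {}
    queued_from = recovery.get("attemptTime") or build["metadata"]["creationTimestamp"]
    delta = parse_timestamp(started_at) - parse_timestamp(queued_from)
    return delta.total_seconds()


def queued_longer_than(builds, threshold_seconds):
    """List namespace/name of Builds queued for more than threshold_seconds."""
    selected = []
    for build in builds:
        duration = queue_duration_seconds(build)
        if duration is not None and duration > threshold_seconds:
            meta = build["metadata"]
            selected.append(f'{meta["namespace"]}/{meta["name"]}')
    return selected


# Hypothetical Build objects, shaped like the items returned by
# `kubectl get builds.camel.apache.org -o json`.
SAMPLE_BUILDS = [
    {
        "metadata": {"namespace": "default", "name": "build-a",
                     "creationTimestamp": "2020-12-02T10:00:00Z"},
        "status": {"startedAt": "2020-12-02T10:01:30Z"},
    },
    {
        "metadata": {"namespace": "default", "name": "build-b",
                     "creationTimestamp": "2020-12-02T10:00:00Z"},
        "status": {
            "startedAt": "2020-12-02T10:05:00Z",
            "failure": {"recovery": {"attemptTime": "2020-12-02T10:04:50Z"}},
        },
    },
    {
        "metadata": {"namespace": "default", "name": "build-c",
                     "creationTimestamp": "2020-12-02T10:00:00Z"},
        "status": {},
    },
]

if __name__ == "__main__":
    # 60 corresponds to CamelKBuildQueueDuration1m, 300 to CamelKBuildQueueDuration5m.
    print(queued_longer_than(SAMPLE_BUILDS, 60))
```

In the sample data, `build-b` was created five minutes before it started, but its last recovery attempt was only ten seconds before the start, so it is measured from the recovery attempt and does not exceed the 60-second threshold.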