Posted to notifications@skywalking.apache.org by wu...@apache.org on 2022/11/29 08:57:28 UTC

[skywalking] 01/01: Add docs for profiling, and adjust menu items.

This is an automated email from the ASF dual-hosted git repository.

wusheng pushed a commit to branch profiling
in repository https://gitbox.apache.org/repos/asf/skywalking.git

commit 2ffc26d6d2f3794d71c0a5d43c7cf6ad0ce3689b
Author: Wu Sheng <wu...@foxmail.com>
AuthorDate: Tue Nov 29 16:57:13 2022 +0800

    Add docs for profiling, and adjust menu items.
---
 docs/en/changes/changes.md                |  1 +
 docs/en/concepts-and-designs/profiling.md | 82 +++++++++++++++++++++++++++++++
 docs/menu.yml                             | 10 ++--
 3 files changed, 89 insertions(+), 4 deletions(-)

diff --git a/docs/en/changes/changes.md b/docs/en/changes/changes.md
index a615c064b7..412de8a128 100644
--- a/docs/en/changes/changes.md
+++ b/docs/en/changes/changes.md
@@ -197,5 +197,6 @@
 * Add new docs for `Report Span Attached Events` data collecting protocol.
 * Add new docs for `Record` query protocol
 * Update `Server Agents` and `Compatibility` for PHP agent.
+* Add docs for profiling.
 
 All issues and pull requests are [here](https://github.com/apache/skywalking/milestone/149?closed=1)
diff --git a/docs/en/concepts-and-designs/profiling.md b/docs/en/concepts-and-designs/profiling.md
new file mode 100644
index 0000000000..d198a372aa
--- /dev/null
+++ b/docs/en/concepts-and-designs/profiling.md
@@ -0,0 +1,82 @@
+# Profiling
+
+Profiling is an on-demand diagnostic method for locating bottlenecks in services.
+The following typical scenarios are well suited to profiling with various profiling tools:
+
+1. Some methods slow down the API performance.
+2. Too many threads and/or high-frequency I/O per OS process reduce the CPU efficiency.
+3. Massive RPC requests block the network and cause slow responses.
+4. Unexpected network requests caused by security issues or code bugs.
+
+In the SkyWalking landscape, we provide two ways to support profiling at a reasonable resource cost.
+
+1. In-process profiling is bundled with auto-instrument agents.
+2. Out-of-process profiling is powered by the eBPF agent.
+
+## In-process profiling
+
+In-process profiling is primarily provided by auto-instrument agents in VM-based runtimes.
+This feature addresses issue <1> by periodically capturing snapshots of the thread stacks.
+The OAP aggregates the thread stacks per RPC request
+and, based on the continuous snapshots, provides a hierarchy graph
+that indicates the slow methods.
+
+The sampling period is usually 10-100 milliseconds. Shorter periods are not recommended, because each capture usually
+causes a classic stop-the-world pause in the VM, which impacts the performance of the whole process.
+
+Learn more technical details in the post [**Use Profiling to Fix the Blind Spot of Distributed
+Tracing**](sdk-profiling.md).
+
+Currently, the Java and Python agents support this feature.
+
+## Out-of-process profiling
+
+Out-of-process profiling leverages [eBPF](https://ebpf.io/), a technology with origins in the Linux kernel.
+It provides a way to extend the capabilities of the kernel safely and efficiently.
+
+### On-CPU Profiling
+
+On-CPU profiling is suitable for analyzing thread stacks when service CPU usage is high.
+The more times a thread stack is dumped, the more CPU resources it occupies.
+
+This is similar to in-process profiling in that it addresses issue <1>, but it runs out-of-process and is based on
+Linux eBPF.
+It is also designed for languages without a VM mechanism, such as C/C++ and Rust, which in-process agents cannot
+support. Golang is a special case: it exposes the metadata of its VM to eBPF, so it can be profiled as well.
+
+### Off-CPU Profiling
+
+Off-CPU profiling is suitable for performance issues that are not caused by high CPU usage, though they may still occur
+under high CPU load. This profiling aims to address issue <2>.
+
+For example:
+
+1. When there are too many threads in one service, off-CPU profiling can reveal which threads spend
+   more time context switching.
+2. Code that relies heavily on disk I/O or on remote service performance can slow down the whole process.
+
+Off-CPU profiling provides two perspectives:
+
+1. Thread switch count: The number of times a thread switches context. When the thread returns to the CPU, it completes
+   one context switch. A thread stack with a higher switch count spends more time context switching.
+2. Thread switch duration: The time it takes a thread to switch context. A thread stack with a higher switch
+   duration spends more time off-CPU.
+
+Learn more technical details about on-CPU and off-CPU profiling in the post [**Pinpoint Service Mesh Critical Performance Impact
+by using eBPF**](ebpf-cpu-profiling.md).
+
+### Network Profiling
+
+Network profiling captures network packets and analyzes traffic at L4 (TCP) and L7 (HTTP) to recognize the network traffic
+of a specific process or k8s pod. This traffic analysis helps locate the root causes of issues <3> and <4>.
+
+Network profiling provides:
+
+1. Map the network topology and identify processes.
+2. Observe TCP traffic metrics with TLS status.
+3. Observe HTTP traffic metrics.
+4. Sample HTTP request/response raw data within the tracing context.
+5. Observe the time costs of local I/O on the OS, such as the time a Linux process takes for an HTTP request/response.
+
+Learn more technical details in the post [**Diagnose Service Mesh Network Performance with
+eBPF**](../academy/diagnose-service-mesh-network-performance-with-ebpf.md).
\ No newline at end of file
diff --git a/docs/menu.yml b/docs/menu.yml
index 2f85c091c4..170d98787d 100644
--- a/docs/menu.yml
+++ b/docs/menu.yml
@@ -34,16 +34,18 @@ catalog:
             path: "/en/concepts-and-designs/service-agent"
           - name: "Manual Instrument SDK"
             path: "/en/concepts-and-designs/manual-sdk"
-      - name: "Backend"
+      - name: "Observability Analysis Platform"
         catalog:
           - name: "Overview"
             path: "/en/concepts-and-designs/backend-overview"
-          - name: "Observability Analysis Language"
+          - name: "Analysis Streaming Traces and Mesh Traffic"
             path: "/en/concepts-and-designs/oal"
-          - name: "Meter Analysis Language"
+          - name: "Analysis Metrics and Meters"
             path: "/en/concepts-and-designs/mal"
-          - name: "Log Analysis Language"
+          - name: "Analysis Logs"
             path: "/en/concepts-and-designs/lal"
+          - name: "Profiling"
+            path: "/en/setup/backend/profiling"
           - name: "Query in OAP"
             path: "/en/protocols/readme#query-protocol"
       - name: "Event"