Posted to notifications@skywalking.apache.org by wu...@apache.org on 2019/01/25 15:10:22 UTC

[incubator-skywalking-website] branch master updated: Mesh performance test blog (#30)

This is an automated email from the ASF dual-hosted git repository.

wusheng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-skywalking-website.git


The following commit(s) were added to refs/heads/master by this push:
     new 96d7dee  Mesh performance test blog (#30)
96d7dee is described below

commit 96d7dee1cdb19d0bae5fb1319f415a9e50b057d7
Author: Gao Hongtao <ha...@gmail.com>
AuthorDate: Fri Jan 25 23:10:18 2019 +0800

    Mesh performance test blog (#30)
    
    * add mesh performance test blog
    
    * fix some issues
---
 .../blog/2019-01-25-mesh-loadtest/image1.png       | Bin 0 -> 39044 bytes
 .../blog/2019-01-25-mesh-loadtest/image2.png       | Bin 0 -> 6095 bytes
 .../blog/2019-01-25-mesh-loadtest/image3.png       | Bin 0 -> 317634 bytes
 .../blog/2019-01-25-mesh-loadtest/image4.png       | Bin 0 -> 308347 bytes
 .../blog/2019-01-25-mesh-loadtest/image5.png       | Bin 0 -> 248460 bytes
 .../blog/2019-01-25-mesh-loadtest/image6.png       | Bin 0 -> 51443 bytes
 docs/blog/2019-01-25-mesh-loadtest.md              |  80 +++++++++++++++++++++
 docs/blog/README.md                                |   5 ++
 8 files changed, 85 insertions(+)

diff --git a/docs/.vuepress/public/static/blog/2019-01-25-mesh-loadtest/image1.png b/docs/.vuepress/public/static/blog/2019-01-25-mesh-loadtest/image1.png
new file mode 100755
index 0000000..2b11cd4
Binary files /dev/null and b/docs/.vuepress/public/static/blog/2019-01-25-mesh-loadtest/image1.png differ
diff --git a/docs/.vuepress/public/static/blog/2019-01-25-mesh-loadtest/image2.png b/docs/.vuepress/public/static/blog/2019-01-25-mesh-loadtest/image2.png
new file mode 100755
index 0000000..f666067
Binary files /dev/null and b/docs/.vuepress/public/static/blog/2019-01-25-mesh-loadtest/image2.png differ
diff --git a/docs/.vuepress/public/static/blog/2019-01-25-mesh-loadtest/image3.png b/docs/.vuepress/public/static/blog/2019-01-25-mesh-loadtest/image3.png
new file mode 100755
index 0000000..dcdde59
Binary files /dev/null and b/docs/.vuepress/public/static/blog/2019-01-25-mesh-loadtest/image3.png differ
diff --git a/docs/.vuepress/public/static/blog/2019-01-25-mesh-loadtest/image4.png b/docs/.vuepress/public/static/blog/2019-01-25-mesh-loadtest/image4.png
new file mode 100755
index 0000000..721bc6e
Binary files /dev/null and b/docs/.vuepress/public/static/blog/2019-01-25-mesh-loadtest/image4.png differ
diff --git a/docs/.vuepress/public/static/blog/2019-01-25-mesh-loadtest/image5.png b/docs/.vuepress/public/static/blog/2019-01-25-mesh-loadtest/image5.png
new file mode 100755
index 0000000..99d0786
Binary files /dev/null and b/docs/.vuepress/public/static/blog/2019-01-25-mesh-loadtest/image5.png differ
diff --git a/docs/.vuepress/public/static/blog/2019-01-25-mesh-loadtest/image6.png b/docs/.vuepress/public/static/blog/2019-01-25-mesh-loadtest/image6.png
new file mode 100755
index 0000000..24a3ea2
Binary files /dev/null and b/docs/.vuepress/public/static/blog/2019-01-25-mesh-loadtest/image6.png differ
diff --git a/docs/blog/2019-01-25-mesh-loadtest.md b/docs/blog/2019-01-25-mesh-loadtest.md
new file mode 100644
index 0000000..cb840b2
--- /dev/null
+++ b/docs/blog/2019-01-25-mesh-loadtest.md
@@ -0,0 +1,80 @@
+# SkyWalking performance in Service Mesh scenario
+
+- Author: Hongtao Gao, Apache SkyWalking & ShardingSphere PMC
+- [GitHub](https://github.com/hanahmily), [Twitter](https://twitter.com/hanahmily), [LinkedIn](https://www.linkedin.com/in/gao-hongtao-47b835168/)
+
+Jan. 25th, 2019
+
+The service mesh receiver was first introduced in Apache SkyWalking 6.0.0-beta. It is designed to provide a common entrance for receiving telemetry data from service mesh frameworks such as Istio, Linkerd, and Envoy. What is a service mesh? According to Istio's explanation:
+
+> The term service mesh is used to describe the network of microservices that make up such applications and the interactions between them.
+
+As a PMC member of Apache SkyWalking, I have tested the trace receiver and understand the collector's performance in the trace scenario well. I would also like to figure out the performance of the service mesh receiver.
+
+## Differences between trace and service mesh
+
+The following chart presents a typical trace map:
+
+![](../.vuepress/public/static/blog/2019-01-25-mesh-loadtest/image5.png)
+
+You can find a variety of elements in it, such as web services, local methods, databases, caches, MQ and so on. The service mesh, however, only collects service network telemetry data for now, containing the entrance and exit data of each service (more elements, such as databases, will be added soon). So a smaller quantity of data is sent to the service mesh receiver than to the trace receiver.
+
+![](../.vuepress/public/static/blog/2019-01-25-mesh-loadtest/image1.png)
+
+Using a sidecar, however, is a little different. When a client requests “A”, “A”’s sidecar sends a segment to the service mesh receiver. If “A” depends on “B”, another segment is sent from “A”’s sidecar. In a trace system, only one segment would be received by the collector. The sidecar model splits one segment into several smaller ones, which increases the network overhead of the service mesh receiver.
+
+## Deployment Architecture
+
+In this test, I pick two different backend deployments. One is called the mini unit and consists of one collector and one Elasticsearch instance. The other is a standard production cluster, which contains three collectors and three Elasticsearch instances.
+
+The mini unit is a suitable architecture for a dev or test environment. It saves time and VM resources and speeds up the deployment process.
+
+The standard cluster provides good performance and HA for a production scenario. Though you pay more and have to take care of the cluster carefully, its reliability is a good reward.
+
+I pick 8-CPU, 16GB VMs to set up the test environment. This test targets the performance of normal usage scenarios, so that choice is reasonable. The cluster is built on Google Kubernetes Engine (GKE), and the nodes are linked to each other through a VPC network. Because running the collector is a CPU-intensive task, the resource request of the collector deployment should be 8 CPUs, which means every collector instance occupies a whole VM node.
+
+## Testing Process
+
+The number of mesh fragments received per second (MPS) depends on the following variables:
+
+ 1. Ingress queries per second (QPS)
+ 1. The topology of the microservice cluster
+ 1. Service mesh mode (proxy or sidecar)
+
+In this test, I use the Bookinfo app as the demo cluster.
+
+![](../.vuepress/public/static/blog/2019-01-25-mesh-loadtest/image6.png)
+
+So every request touches at most 4 nodes. Since we pick the sidecar mode (every request sends two telemetry records per node), the MPS will be QPS * 4 * 2.
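+
+A minimal sketch of this calculation (the helper below is illustrative only and not part of SkyWalking):
+
+```python
+def expected_mps(qps: int, hops: int = 4, records_per_hop: int = 2) -> int:
+    """Estimate mesh fragments received per second (MPS) from ingress QPS.
+
+    hops            -- maximum services a request touches (4 for Bookinfo)
+    records_per_hop -- telemetry records per hop in sidecar mode
+    """
+    return qps * hops * records_per_hop
+```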
+
+There are also some important metrics that should be explained:
+
+ * Client Query Latency: GraphQL API query response time heatmap.
+ * Client Mesh Sender: mesh segments sent per second. The total line represents the total number sent, and the error line is the number of failed sends.
+ * Mesh telemetry latency: heatmap of the time the service mesh receiver takes to handle data.
+ * Mesh telemetry received: mesh telemetry data received per second.
+
+### Mini Unit
+
+![](../.vuepress/public/static/blog/2019-01-25-mesh-loadtest/image3.png)
+
+You can see that the collector can process up to **25k** records per second. The CPU usage is about 4 cores. Most of the query latency is less than 50ms. After logging in to the VM on which the collector instance is running, I found that the system load was reaching the limit (max is 8).
+
+![](../.vuepress/public/static/blog/2019-01-25-mesh-loadtest/image2.png)
+
+According to the previous formula, a single collector instance can handle about **3k** QPS of Bookinfo traffic.
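+
+As a rough check, working backwards from the observed ceiling (same assumptions as the sketch above):
+
+```python
+max_mps = 25_000                     # observed single-collector ceiling
+qps_capacity = max_mps // (4 * 2)    # invert MPS = QPS * 4 * 2
+print(qps_capacity)                  # 3125, i.e. roughly 3k QPS of Bookinfo traffic
+```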
+
+### Standard Cluster
+
+![](../.vuepress/public/static/blog/2019-01-25-mesh-loadtest/image4.png)
+
+Compared to the mini unit, the cluster's throughput increases linearly. Three instances provide a total processing power of 80k records per second. Query latency increases slightly, but it is still very small (less than 500ms). I also checked the system load of every collector instance; all of them reached the limit. The cluster can therefore handle about 10k QPS of Bookinfo traffic (80,000 / 8 = 10,000).
+
+## Conclusion
+
+Let's wrap up. There are some important takeaways from this test:
+ * QPS varies with the three variables listed above. The absolute numbers in this blog are not the point; users should pick proper values according to their own systems.
+ * The collector cluster's processing power can be scaled out.
+ * The collector is a CPU-intensive application, so you should provide sufficient CPU resources for it.
+
+This blog gives people a common method to evaluate the throughput of the service mesh receiver. Users can use it to design their Apache SkyWalking backend deployment architecture.
diff --git a/docs/blog/README.md b/docs/blog/README.md
index 960dae9..d2d6f9e 100755
--- a/docs/blog/README.md
+++ b/docs/blog/README.md
@@ -3,6 +3,11 @@ layout: LayoutBlog
 
 blog:
 
+- title: SkyWalking performance in Service Mesh scenario
+  name: 2019-01-25-mesh-loadtest
+  time: Hongtao Gao. Jan. 25th, 2019
+  short: Service mesh receiver performance test on Google Kubernetes Engine.
+
 - title: Understand distributed trace easier in the incoming 6-GA
   name: 2019-01-01-Understand-Trace
   time: Sheng Wu. Jan. 1st, 2019