Posted to commits@uniffle.apache.org by ro...@apache.org on 2023/05/28 04:43:47 UTC

[incubator-uniffle] branch master updated: [MINOR] docs: Add benchmark results (#904)

This is an automated email from the ASF dual-hosted git repository.

roryqi pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-uniffle.git


The following commit(s) were added to refs/heads/master by this push:
     new 7ba062dc [MINOR] docs: Add benchmark results (#904)
7ba062dc is described below

commit 7ba062dce0802bef8a94d9b89089ef9ad98a5e90
Author: roryqi <ro...@apache.org>
AuthorDate: Sun May 28 12:43:42 2023 +0800

    [MINOR] docs: Add benchmark results (#904)
    
    ### What changes were proposed in this pull request?
    Add the benchmark data.
    
    ### Why are the changes needed?
    To attract more users. With published benchmark data, users don't need to run the tests themselves, which lowers the barrier to adoption.
    
    ### Does this PR introduce _any_ user-facing change?
    No.
    
    ### How was this patch tested?
    Not tested; this is a documentation-only change.
---
 README.md                         |   3 +
 docs/asset/rss_benchmark1.png     | Bin 0 -> 604169 bytes
 docs/asset/rss_benchmark2.png     | Bin 0 -> 234994 bytes
 docs/asset/rss_benchmark3.png     | Bin 0 -> 306675 bytes
 docs/asset/vanilla_benchmark1.png | Bin 0 -> 310075 bytes
 docs/asset/vanilla_benchmark2.png | Bin 0 -> 622176 bytes
 docs/asset/vanilla_benchmark3.png | Bin 0 -> 757806 bytes
 docs/benchmark.md                 | 176 ++++++++++++++++++++++++++++++++++++++
 8 files changed, 179 insertions(+)

diff --git a/README.md b/README.md
index 0b4edecd..308cf1a6 100644
--- a/README.md
+++ b/README.md
@@ -275,6 +275,9 @@ The following security configurations are introduced.
   be as the superuser for HDFS. For more details of related sections, 
   please see [Proxy user - Superusers Acting On Behalf Of Other Users](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/Superusers.html)
 
+## Benchmark
+We provide benchmark results for Uniffle. For details, see [Uniffle Benchmark](docs/benchmark.md).
+
 ## LICENSE
 
 Uniffle is under the Apache License Version 2.0. See the [LICENSE](https://github.com/apache/incubator-uniffle/blob/master/LICENSE) file for details.
diff --git a/docs/asset/rss_benchmark1.png b/docs/asset/rss_benchmark1.png
new file mode 100644
index 00000000..04f116f1
Binary files /dev/null and b/docs/asset/rss_benchmark1.png differ
diff --git a/docs/asset/rss_benchmark2.png b/docs/asset/rss_benchmark2.png
new file mode 100644
index 00000000..c89eb8ad
Binary files /dev/null and b/docs/asset/rss_benchmark2.png differ
diff --git a/docs/asset/rss_benchmark3.png b/docs/asset/rss_benchmark3.png
new file mode 100644
index 00000000..f83f3740
Binary files /dev/null and b/docs/asset/rss_benchmark3.png differ
diff --git a/docs/asset/vanilla_benchmark1.png b/docs/asset/vanilla_benchmark1.png
new file mode 100644
index 00000000..05821608
Binary files /dev/null and b/docs/asset/vanilla_benchmark1.png differ
diff --git a/docs/asset/vanilla_benchmark2.png b/docs/asset/vanilla_benchmark2.png
new file mode 100644
index 00000000..38c21bae
Binary files /dev/null and b/docs/asset/vanilla_benchmark2.png differ
diff --git a/docs/asset/vanilla_benchmark3.png b/docs/asset/vanilla_benchmark3.png
new file mode 100644
index 00000000..a52994b6
Binary files /dev/null and b/docs/asset/vanilla_benchmark3.png differ
diff --git a/docs/benchmark.md b/docs/benchmark.md
new file mode 100644
index 00000000..d5c518bd
--- /dev/null
+++ b/docs/benchmark.md
@@ -0,0 +1,176 @@
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one or more
+  ~ contributor license agreements.  See the NOTICE file distributed with
+  ~ this work for additional information regarding copyright ownership.
+  ~ The ASF licenses this file to You under the Apache License, Version 2.0
+  ~ (the "License"); you may not use this file except in compliance with
+  ~ the License.  You may obtain a copy of the License at
+  ~
+  ~    http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing, software
+  ~ distributed under the License is distributed on an "AS IS" BASIS,
+  ~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  ~ See the License for the specific language governing permissions and
+  ~ limitations under the License.
+  -->
+  ## Environment
+  
+  Software: Uniffle 0.2.0, Hadoop 2.8.5, Spark 2.4.6
+  
+  Hardware: Machine 176 cores, 265G memory, 4T * 12 HDD, network bandwidth 10GB/s
+  
+  Hadoop Yarn Cluster: 1 * ResourceManager + 6 * NodeManager, each machine with 4T * 10 HDD
+  
+  Uniffle Cluster: 1 * Coordinator + 6 * Shuffle Server, each machine with 4T * 10 HDD
+  
+  ## Configuration
+  Spark's configuration
+  ````
+  spark.executor.instances 100
+  spark.executor.cores 4
+  spark.executor.memory 9g
+  spark.executor.memoryOverhead 1024
+  spark.shuffle.manager org.apache.spark.shuffle.RssShuffleManager
+  spark.rss.storage.type MEMORY_LOCALFILE
+  ````
+  Shuffle Server's configuration
+  ````
+  rss.storage.type MEMORY_LOCALFILE
+  rss.server.buffer.capacity 50g
+  ````
+  
+  ## TPC-DS
+  We used [spark-sql-perf](https://github.com/databricks/spark-sql-perf) to generate 1TB of data.
+  
+  |query name|vanilla|uniffle|
+  |---|---|---|
+  |query1|16|18|
+  |query10|30|35|
+  |query11|86|96|
+  |query12|14|17|
+  |query13|102|77|
+  |query14a|239|254|
+  |query14b|226|232|
+  |query15|44|48|
+  |query16|50|59|
+  |query17|83|97|
+  |query18|31|35|
+  |query19|15|17|
+  |query2|21|25|
+  |query20|15|16|
+  |query21|8|8|
+  |query22|21|22|
+  |query23a|288|366|
+  |query23b|366|422|
+  |query24a|181|198|
+  |query24b|167|187|
+  |query25|93|113|
+  |query26|15|15|
+  |query27|16|17|
+  |query28|38|41|
+  |query29|80|102|
+  |query3|9|11|
+  |query30|21|26|
+  |query31|30|40|
+  |query32|14|15|
+  |query33|26|30|
+  |query34|12|16|
+  |query35|34|39|
+  |query36|15|18|
+  |query37|16|20|
+  |query38|27|36|
+  |query39|15|19|
+  |query39a|16|20|
+  |query39b|14|19|
+  |query4|205|227|
+  |query40|38|38|
+  |query41|5|6|
+  |query42|9|10|
+  |query43|13|13|
+  |query44|20|22|
+  |query45|30|36|
+  |query46|16|18|
+  |query47|22|25|
+  |query48|25|24|
+  |query49|58|66|
+  |query5|56|59|
+  |query50|56|61|
+  |query51|23|28|
+  |query52|9|10|
+  |query53|12|13|
+  |query54|52|62|
+  |query55|9|10|
+  |query56|25|27|
+  |query57|20|22|
+  |query58|23|26|
+  |query59|22|22|
+  |query6|33|41|
+  |query60|25|28|
+  |query61|25|28|
+  |query62|10|11|
+  |query63|12|12|
+  |query64|176|185|
+  |query65|32|37|
+  |query66|23|24|
+  |query67|697|775|
+  |query68|17|19|
+  |query69|31|34|
+  |query7|17|17|
+  |query70|24|27|
+  |query71|23|24|
+  |query72|335|350|
+  |query73|12|14|
+  |query74|68|99|
+  |query75|58|67|
+  |query76|21|21|
+  |query77|35|37|
+  |query78|151|169|
+  |query79|16|16|
+  |query8|15|20|
+  |query80|146|163|
+  |query81|18|26|
+  |query82|28|31|
+  |query83|21|24|
+  |query84|16|19|
+  |query85|45|49|
+  |query86|14|17|
+  |query87|29|37|
+  |query88|29|29|
+  |query89|11|13|
+  |query9|37|37|
+  |query90|11|11|
+  |query91|17|21|
+  |query92|12|12|
+  |query93|86|86|
+  |query94|40|42|
+  |query95|94|94|
+  |query96|10|10|
+  |query97|29|34|
+  |query98|17|21|
+  |query99|13|12|
+  |total|5821|6494|
+  
+  Uniffle is about 12% slower than vanilla Spark in total (6494 vs. 5821), because the amount of shuffle data in these queries is small.
+  
+  ## Tera Sort
+  We generated 1TB of data using the code from the [spark-terasort](https://github.com/ehiggs/spark-terasort) repository.
+  #### Uniffle performance
+  Overall Time:
+  ![Overall Time](asset/rss_benchmark3.png)
+  Write Time:
+  ![Write Time](asset/rss_benchmark2.png)
+  Read Time:
+  ![Read Time](asset/rss_benchmark1.png)
+  #### Vanilla Spark performance
+  Overall Time:
+  ![Overall Time](asset/vanilla_benchmark1.png)
+  Write Time:
+  ![Write Time](asset/vanilla_benchmark2.png)
+  Read Time:
+  ![Read Time](asset/vanilla_benchmark3.png)
+  Uniffle is more than 30% faster than vanilla Spark when the shuffle is large.
+  
+  
+  
+  
\ No newline at end of file
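
The TPC-DS summary in docs/benchmark.md can be sanity-checked from the "total" row of its table. The following is a minimal Python sketch, assuming both columns are wall-clock times in the same unit (the document does not state one). The helper name `relative_slowdown` is illustrative and not part of Uniffle or Spark.

```python
def relative_slowdown(baseline: float, candidate: float) -> float:
    """Return how much slower `candidate` is than `baseline`,
    expressed as a fraction of the baseline time."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (candidate - baseline) / baseline


# Totals copied from the "total" row of the TPC-DS table.
vanilla_total = 5821
uniffle_total = 6494

slowdown = relative_slowdown(vanilla_total, uniffle_total)
print(f"Uniffle vs. vanilla Spark: {slowdown:.1%} slower")
```

With the totals from the table this prints a slowdown of 11.6%. The same helper can be applied to a single row to see which queries account for most of the gap.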