Posted to dev@spark.apache.org by "Huang, Jie" <ji...@intel.com> on 2015/06/16 19:27:18 UTC

[SparkScore] Performance portal for Apache Spark

Hi All

We are happy to announce the Performance Portal for Apache Spark: http://01org.github.io/sparkscore/ !
The Performance Portal for Apache Spark provides the community with performance data on upstream Spark to help identify issues, better understand performance differences between versions, and help Spark customers get across the finish line faster. The Performance Portal generates two reports: a regular (weekly) report and a release-based regression test report. We are currently using two benchmark suites, HiBench (http://github.com/intel-bigdata/HiBench) and Spark-perf (https://github.com/databricks/spark-perf). We welcome and look forward to your suggestions and feedback. More information and details are provided below.
About the Performance Portal for Apache Spark
Our goal is to work with the Apache Spark community to further enhance the scalability and reliability of Apache Spark. The data available on this site allows community members and potential Spark customers to closely track the performance trend of Apache Spark. Ultimately, we hope that this project will help the community fix performance issues quickly, thus providing better Apache Spark code to end customers. The current workloads used in the benchmarking include HiBench (a benchmark suite from Intel for evaluating big data frameworks such as Hadoop MapReduce and Spark) and Spark-perf (a performance testing framework for Apache Spark from Databricks). Additional benchmarks will be added as they become available.
Description
________________________________
Each data point represents a workload's runtime change, in percent, compared with the previous week. Each line represents a different workload running in Spark yarn-client mode.
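To make the figures concrete, here is a minimal sketch, in Python, of how such a percentage could be computed, assuming each value is the relative change in elapsed time against a reference run (the previous week's run for the week-over-week view, or the 1.2 release baseline for the normalized scores further below). The function name and the sample numbers are illustrative only, not part of the portal's code.

def runtime_change_percent(elapsed_seconds, reference_seconds):
    """Relative change in elapsed time versus a reference run.

    Positive means slower than the reference, negative means faster;
    lower is better, matching the tables and charts below.
    """
    return (elapsed_seconds / reference_seconds - 1.0) * 100.0

# Illustrative example: a run taking 86 s against a 100 s reference
# is reported as -14.0%.
print("%.1f%%" % runtime_change_percent(86.0, 100.0))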
Hardware
________________________________
CPU type: Intel® Xeon® CPU E5-2697 v2 @ 2.70GHz
Memory: 128GB
NIC: 10GbE
Disk(s): 8 x 1TB SATA HDD
Software
________________________________
Java version: 1.8.0_25
Hadoop version: 2.5.0-CDH5.3.2
HiBench version: 4.0
Spark on yarn-client mode (a short configuration sketch follows the Cluster section below)
Cluster
________________________________
1 master node
10 slave nodes
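
As a rough illustration of the yarn-client setting above, a driver-side configuration in PySpark (Spark 1.x style) might look like the sketch below. The application name is hypothetical, and the actual HiBench and Spark-perf harnesses submit their jobs through their own launcher scripts; this shows only the deployment mode, not the portal's tooling.

from pyspark import SparkConf, SparkContext

# Hypothetical driver setup for the 1-master/10-slave YARN cluster described above.
conf = (SparkConf()
        .setAppName("sparkscore-example")  # illustrative name only
        .setMaster("yarn-client"))         # Spark 1.x master string for yarn-client mode
sc = SparkContext(conf=conf)

# ... a benchmark workload would run here ...

sc.stop()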
Summary
The lower the percentage, the better the performance.
________________________________
Group        ww19     ww20     ww22     ww23     ww24     ww25
HiBench      9.1%     6.6%     6.0%     7.9%     -6.5%    -3.1%
spark-perf   4.1%     4.4%     -1.8%    4.1%     -4.7%    -4.6%


[cid:overall_workloads20150616.png]
Y-Axis: normalized completion time; X-Axis: Work Week.
The commit number can be found in the result table.
The performance score for each workload is normalized against the elapsed time of the 1.2 release. The lower, the better.

HiBench
________________________________
JOB          ww19       ww20       ww22       ww23       ww24       ww25
commit       489700c8   8e3822a0   530efe3e   90c60692   db81b9d8   4eb48ed1
sleep        null       null       -2.1%      -2.9%      -4.1%      12.8%
wordcount    17.6%      11.4%      8.0%       8.3%       -18.6%     -10.9%
kmeans       92.1%      61.5%      72.1%      92.9%      86.9%      95.8%
scan         -4.9%      -7.2%      null       -1.1%      -25.5%     -21.0%
bayes        -24.3%     -20.1%     -18.3%     -11.1%     -29.7%     -31.3%
aggregation  5.6%       10.5%      null       9.2%       -15.3%     -15.0%
join         4.5%       1.2%       null       1.0%       -12.7%     -13.9%
sort         -3.3%      -0.5%      -11.9%     -12.5%     -17.5%     -17.3%
pagerank     2.2%       3.2%       4.0%       2.9%       -11.4%     -13.0%
terasort     -7.1%      -0.2%      -9.5%      -7.3%      -16.7%     -17.0%


Comments: null means the workload did not run or failed in this period.
[cid:HiBench_workloads20150616.png]
Y-Axis: normalized completion time; X-Axis: Work Week.
The commit number can be found in the result table.
The performance score for each workload is normalized against the elapsed time of the 1.2 release. The lower, the better.
spark-perf
________________________________
JOB           ww19       ww20       ww22       ww23       ww24       ww25
commit        489700c8   8e3822a0   530efe3e   90c60692   db81b9d8   4eb48ed1
agg           13.2%      7.0%       null       18.3%      5.2%       2.5%
agg-int       16.4%      21.2%      null       9.6%       4.0%       8.2%
agg-naive     4.3%       -2.4%      null       -0.8%      -6.7%      -6.8%
scheduling    -6.1%      -8.9%      -14.5%     -2.1%      -6.4%      -6.5%
count-filter  4.1%       1.0%       6.6%       6.8%       -10.2%     -10.4%
count         4.8%       4.6%       6.7%       8.0%       -7.3%      -7.0%
sort          -8.1%      -2.5%      -6.2%      -7.0%      -14.6%     -14.4%
sort-int      4.5%       15.3%      -1.6%      -0.1%      -1.5%      -2.2%


Comments: null means the workload did not run or failed in this period.
[cid:sparkperf_workloads20150616.png]
Y-Axis: normalized completion time; X-Axis: Work Week.
The commit number can be found in the result table.
The performance score for each workload is normalized against the elapsed time of the 1.2 release. The lower, the better.
Release
Summary
The lower the percentage, the better the performance.
________________________________
Group        1.2.1    1.3.0    1.3.1    1.4.0
HiBench      -1.0%    10.5%    8.4%     8.6%
spark-perf   3.2%     0.9%     1.9%     1.3%


[cid:overall_release20150616.png]
Y-Axis: normalized completion time; X-Axis: Release.
The performance score for each workload is normalized against the elapsed time of the 1.2 release. The lower, the better.

HiBench
________________________________
JOB          1.2.1     1.3.0     1.3.1     1.4.0
sleep        null      null      null      -0.5%
wordcount    3.5%      5.4%      5.1%      8.7%
kmeans       6.0%      72.6%     82.7%     100.7%
scan         -0.7%     -3.2%     -1.9%     -4.4%
bayes        -19.7%    7.7%      -24.5%    -14.4%
aggregation  4.6%      7.1%      9.9%      9.3%
join         0.7%      4.0%      8.6%      1.3%
sort         -1.0%     2.1%      -1.8%     -10.4%
pagerank     1.5%      2.2%      1.3%      5.4%
terasort     -3.7%     -3.3%     -3.7%     -9.5%


Comments: null means the workload did not run or failed in this period.
[cid:HiBench_release20150616.png]
Y-Axis: normalized completion time; X-Axis: Release.
The commit number can be found in the result table.
The performance score for each workload is normalized against the elapsed time of the 1.2 release. The lower, the better.
spark-perf
________________________________
JOB           1.2.1    1.3.0     1.3.1     1.4.0
agg           1.9%     3.1%      6.2%      5.0%
agg-int       6.4%     17.1%     18.0%     24.2%
agg-naive     -2.6%    -3.2%     -1.8%     -5.2%
scheduling    8.2%     -16.8%    -14.4%    -19.1%
count-filter  -0.4%    0.3%      -0.5%     0.4%
count         0.6%     -0.3%     0.4%      0.9%
sort          1.2%     -3.3%     -5.3%     -1.9%
sort-int      10.1%    10.0%     12.3%     6.0%


Comments: null means the workload did not run or failed in this period.
[cid:sparkperf_release20150616.png]
Y-Axis: normalized completion time; X-Axis: Release.
The commit number can be found in the result table.
The performance score for each workload is normalized against the elapsed time of the 1.2 release. The lower, the better.
________________________________
Copyright (c) 2015 Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
Project Email: sparkscore@lists.01.org. Please subscribe to the list at: https://lists.01.org/mailman/listinfo/sparkscore

RE: [SparkScore] Performance portal for Apache Spark

Posted by "Duan, Jiangang" <ji...@intel.com>.
We are looking for more workloads – if you guys have any suggestions, let us know.

-jiangang

From: Sandy Ryza [mailto:sandy.ryza@cloudera.com]
Sent: Wednesday, June 17, 2015 5:51 PM
To: Huang, Jie
Cc: user@spark.apache.org; dev@spark.apache.org
Subject: Re: [SparkScore] Performance portal for Apache Spark

This looks really awesome.


Re: [SparkScore] Performance portal for Apache Spark

Posted by Sandy Ryza <sa...@cloudera.com>.
This looks really awesome.

