Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/04/03 12:15:58 UTC
[GitHub] [spark] HyukjinKwon opened a new pull request #32044: [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines
HyukjinKwon opened a new pull request #32044:
URL: https://github.com/apache/spark/pull/32044
### What changes were proposed in this pull request?
https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results using that workflow.
**NOTE**: based on my observations, GitHub Actions appears to use two types of CPU:
- Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
- Intel Xeon E5-2673 v4 @ 2.30GHz
From my quick research, they seem to perform roughly the same:
![Screen Shot 2021-04-03 at 9 10 07 PM](https://user-images.githubusercontent.com/6477701/113478080-8079d880-94c1-11eb-831c-e5c8f15cc741.png)
So it shouldn't be a big deal, especially given that this approach is much easier, encourages contributors to run benchmarks more often, and guarantees the same number of cores and the same amount of memory.
### Why are the changes needed?
To establish a baseline of benchmark results generated this way.
### Does this PR introduce _any_ user-facing change?
No, dev-only.
### How was this patch tested?
It was generated from:
- [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465)
- [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] MaxGekk commented on a change in pull request #32044: [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines
Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #32044:
URL: https://github.com/apache/spark/pull/32044#discussion_r606672262
##########
File path: sql/core/benchmarks/WideSchemaBenchmark-results.txt
##########
@@ -2,144 +2,144 @@
parsing large select expressions
================================================================================================
-OpenJDK 64-Bit Server VM 1.8.0_232-8u232-b09-0ubuntu1~18.04.1-b09 on Linux 4.15.0-1044-aws
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+OpenJDK 64-Bit Server VM 1.8.0_282-b08 on Linux 5.4.0-1043-azure
+Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
parsing large select:                     Best Time(ms) Avg Time(ms)    Stdev(ms)    Rate(M/s)  Per Row(ns)     Relative
------------------------------------------------------------------------------------------------------------------------
-1 select expressions                                 5           13            8          0.0    5370143.0         1.0X
-100 select expressions                              12           16            6          0.0   11995425.0         0.4X
-2500 select expressions                            211          214            4          0.0  210927791.0         0.0X
+1 select expressions                                 1            2            0          0.0    1296117.0         1.0X
+100 select expressions                               9           11            1          0.0    8808690.0         0.1X
+2500 select expressions                            422          426            5          0.0  421632363.0         0.0X
Review comment:
A 2x regression?
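The ratio behind this comment can be checked from the `Best Time(ms)` column of the diff above. The following is a minimal sketch in Python; the dictionary layout and the `slowdown` helper are illustrative assumptions and not part of Spark's benchmark tooling. The header lines show that the old and new runs came from different machines, so the ratios mix hardware and software effects.

```python
# Best Time(ms) values copied from the WideSchemaBenchmark diff above,
# stored as (old run, new run). Names and layout are illustrative only.
best_time_ms = {
    "1 select expressions": (5, 1),
    "100 select expressions": (12, 9),
    "2500 select expressions": (211, 422),
}


def slowdown(old_ms, new_ms):
    """Return new/old; values above 1.0 mean the new run is slower."""
    return new_ms / old_ms


ratios = {case: slowdown(old, new) for case, (old, new) in best_time_ms.items()}
for case, ratio in ratios.items():
    print(f"{case}: {ratio:.2f}x")
```

Only the 2500-expression case doubles (211 ms to 422 ms); the two smaller cases are faster in the new run.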
##########
File path: sql/core/benchmarks/CSVBenchmark-results.txt
##########
@@ -2,66 +2,66 @@
Benchmark to measure CSV read/write performance
================================================================================================
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.15.7
-Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
+OpenJDK 64-Bit Server VM 1.8.0_282-b08 on Linux 5.4.0-1043-azure
+Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
Parsing quoted values:                    Best Time(ms) Avg Time(ms)    Stdev(ms)    Rate(M/s)  Per Row(ns)     Relative
------------------------------------------------------------------------------------------------------------------------
-One quoted string                                24185        24195           10          0.0     483694.2         1.0X
+One quoted string                                43757        44446          765          0.0     875148.4         1.0X
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.15.7
-Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
+OpenJDK 64-Bit Server VM 1.8.0_282-b08 on Linux 5.4.0-1043-azure
+Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
Wide rows with 1000 columns:              Best Time(ms) Avg Time(ms)    Stdev(ms)    Rate(M/s)  Per Row(ns)     Relative
------------------------------------------------------------------------------------------------------------------------
-Select 1000 columns                              61793        62388          532          0.0      61793.4         1.0X
-Select 100 columns                               21958        21993           34          0.0      21957.9         2.8X
-Select one column                                18215        18515          505          0.1      18215.0         3.4X
-count()                                           5865         6168          296          0.2       5865.1        10.5X
-Select 100 columns, one bad input field          39638        39739          124          0.0      39637.5         1.6X
-Select 100 columns, corrupt record field         47290        48133          741          0.0      47290.0         1.3X
+Select 1000 columns                              96330        99161          NaN          0.0      96329.7         1.0X
+Select 100 columns                               41414        42672         1556          0.0      41414.1         2.3X
+Select one column                                35365        36113          662          0.0      35365.4         2.7X
+count()                                          18845        18867           26          0.1      18845.0         5.1X
Review comment:
A 2x regression.
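To compare old and new rows of a results diff like the one above without reading them by eye, the `-` and `+` lines can be paired by case name. This is a rough sketch in Python, assuming each row ends with six whitespace-separated columns as in the tables above; the `best_times` helper and the `diff_rows` variable are hypothetical names, not Spark utilities.

```python
# Rows copied from the CSVBenchmark diff above ("-" = old run, "+" = new run).
diff_rows = """\
-Select 1000 columns 61793 62388 532 0.0 61793.4 1.0X
-Select 100 columns 21958 21993 34 0.0 21957.9 2.8X
-Select one column 18215 18515 505 0.1 18215.0 3.4X
-count() 5865 6168 296 0.2 5865.1 10.5X
-Select 100 columns, one bad input field 39638 39739 124 0.0 39637.5 1.6X
-Select 100 columns, corrupt record field 47290 48133 741 0.0 47290.0 1.3X
+Select 1000 columns 96330 99161 NaN 0.0 96329.7 1.0X
+Select 100 columns 41414 42672 1556 0.0 41414.1 2.3X
+Select one column 35365 36113 662 0.0 35365.4 2.7X
+count() 18845 18867 26 0.1 18845.0 5.1X
"""


def best_times(text):
    """Map case name -> Best Time(ms), separately for old and new rows."""
    old, new = {}, {}
    for line in text.splitlines():
        if not line or line[0] not in "+-":
            continue  # skip anything that is not a removed/added row
        # The last six fields are the numeric columns; the rest is the name.
        name, best, *_ = line[1:].rsplit(None, 6)
        (old if line[0] == "-" else new)[name] = int(best)
    return old, new


old, new = best_times(diff_rows)
# Only cases present on both sides of the diff can be compared.
ratios = {name: new[name] / old[name] for name in old if name in new}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x slower")
```

On these rows the slowdown ranges from roughly 1.6x to 3.2x, so "2 times" is a rough summary. The header lines also show that the old numbers came from a different machine (Mac OS X, Core i9) than the new ones (Linux on Azure, Xeon Platinum), which this comparison cannot separate from any code change.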
[GitHub] [spark] LuciferYang commented on pull request #32044: [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines
Posted by GitBox <gi...@apache.org>.
LuciferYang commented on pull request #32044:
URL: https://github.com/apache/spark/pull/32044#issuecomment-952709194
> @HyukjinKwon Can we use this approach to generate the benchmark results with Java 17?
Let me study #32015 first
[GitHub] [spark] MaxGekk closed pull request #32044: [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines
Posted by GitBox <gi...@apache.org>.
MaxGekk closed pull request #32044:
URL: https://github.com/apache/spark/pull/32044
[GitHub] [spark] SparkQA commented on pull request #32044: [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32044:
URL: https://github.com/apache/spark/pull/32044#issuecomment-812870097
Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41459/
[GitHub] [spark] SparkQA commented on pull request #32044: [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32044:
URL: https://github.com/apache/spark/pull/32044#issuecomment-812861905
**[Test build #136883 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136883/testReport)** for PR 32044 at commit [`33f2ebe`](https://github.com/apache/spark/commit/33f2ebe247f8ed4b563552b6e1fdd83df7cec607).
[GitHub] [spark] MaxGekk commented on pull request #32044: [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines
Posted by GitBox <gi...@apache.org>.
MaxGekk commented on pull request #32044:
URL: https://github.com/apache/spark/pull/32044#issuecomment-812918337
+1, LGTM. The PR updates only benchmark results. The failed GitHub Actions checks are not related to this PR. Merging to master.
Thank you @HyukjinKwon, and @wangyum and @dongjoon-hyun for your reviews.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32044: [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32044:
URL: https://github.com/apache/spark/pull/32044#issuecomment-812876081
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41459/
[GitHub] [spark] SparkQA commented on pull request #32044: [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32044:
URL: https://github.com/apache/spark/pull/32044#issuecomment-812878424
**[Test build #136883 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136883/testReport)** for PR 32044 at commit [`33f2ebe`](https://github.com/apache/spark/commit/33f2ebe247f8ed4b563552b6e1fdd83df7cec607).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #32044: [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32044:
URL: https://github.com/apache/spark/pull/32044#issuecomment-812876081
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41459/
[GitHub] [spark] SparkQA removed a comment on pull request #32044: [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32044:
URL: https://github.com/apache/spark/pull/32044#issuecomment-812861905
**[Test build #136883 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136883/testReport)** for PR 32044 at commit [`33f2ebe`](https://github.com/apache/spark/commit/33f2ebe247f8ed4b563552b6e1fdd83df7cec607).
[GitHub] [spark] LuciferYang commented on pull request #32044: [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines
Posted by GitBox <gi...@apache.org>.
LuciferYang commented on pull request #32044:
URL: https://github.com/apache/spark/pull/32044#issuecomment-952813716
Thank you for your explanation
[GitHub] [spark] HyukjinKwon commented on pull request #32044: [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #32044:
URL: https://github.com/apache/spark/pull/32044#issuecomment-952776553
Yes, they should all generate the result files for JDK 11. If they don't, it's a bug.
Yes, we should have a separate set of these benchmark result files for JDK 17.
[GitHub] [spark] LuciferYang commented on pull request #32044: [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines
Posted by GitBox <gi...@apache.org>.
LuciferYang commented on pull request #32044:
URL: https://github.com/apache/spark/pull/32044#issuecomment-952708322
@HyukjinKwon Can we use this approach to generate the benchmark results with Java 17?
Also, I found that some benchmarks, such as `UpdateFieldsBenchmark` and `CharVarcharBenchmark`, do not have corresponding Java 11 result files. Is this expected?
[GitHub] [spark] AmplabJenkins commented on pull request #32044: [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32044:
URL: https://github.com/apache/spark/pull/32044#issuecomment-812884079
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136883/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32044: [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32044:
URL: https://github.com/apache/spark/pull/32044#issuecomment-812884079
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136883/
[GitHub] [spark] LuciferYang edited a comment on pull request #32044: [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines
Posted by GitBox <gi...@apache.org>.
LuciferYang edited a comment on pull request #32044:
URL: https://github.com/apache/spark/pull/32044#issuecomment-952709194
> @HyukjinKwon Can we use this approach to generate the benchmark results with Java 17?
Let me study #32015 first. Do all new benchmark results need to be generated this way?
[GitHub] [spark] SparkQA commented on pull request #32044: [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32044:
URL: https://github.com/apache/spark/pull/32044#issuecomment-812866914
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41459/