Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/04/03 12:15:58 UTC

[GitHub] [spark] HyukjinKwon opened a new pull request #32044: [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines

HyukjinKwon opened a new pull request #32044:
URL: https://github.com/apache/spark/pull/32044


   ### What changes were proposed in this pull request?
   
   https://github.com/apache/spark/pull/32015 added a way to run benchmarks much more easily in the same GitHub Actions build. This PR updates the benchmark results using that workflow.
   
   **NOTE**: based on my observations, GitHub Actions appears to use two types of CPU:
   
   - Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
   - Intel Xeon E5-2673 v4 @ 2.30GHz
   
   From some quick research, they seem to perform roughly the same:
   
   ![Screen Shot 2021-04-03 at 9 10 07 PM](https://user-images.githubusercontent.com/6477701/113478080-8079d880-94c1-11eb-831c-e5c8f15cc741.png)
   
   So this shouldn't be a big deal, especially given that this approach is much easier, encourages contributors to run benchmarks more often, and guarantees the same number of cores and the same amount of memory.
   
   ### Why are the changes needed?
   
   To establish a baseline for the benchmarks on the GitHub Actions machines.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No, dev-only.
   
   ### How was this patch tested?
   
   The results were generated from:
   
   - [Run benchmarks: * (JDK 11)](https://github.com/HyukjinKwon/spark/actions/runs/713575465)
   - [Run benchmarks: * (JDK 8)](https://github.com/HyukjinKwon/spark/actions/runs/713154337)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on a change in pull request #32044: [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #32044:
URL: https://github.com/apache/spark/pull/32044#discussion_r606672262



##########
File path: sql/core/benchmarks/WideSchemaBenchmark-results.txt
##########
@@ -2,144 +2,144 @@
 parsing large select expressions
 ================================================================================================
 
-OpenJDK 64-Bit Server VM 1.8.0_232-8u232-b09-0ubuntu1~18.04.1-b09 on Linux 4.15.0-1044-aws
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+OpenJDK 64-Bit Server VM 1.8.0_282-b08 on Linux 5.4.0-1043-azure
+Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
 parsing large select:                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-1 select expressions                                  5             13           8          0.0     5370143.0       1.0X
-100 select expressions                               12             16           6          0.0    11995425.0       0.4X
-2500 select expressions                             211            214           4          0.0   210927791.0       0.0X
+1 select expressions                                  1              2           0          0.0     1296117.0       1.0X
+100 select expressions                                9             11           1          0.0     8808690.0       0.1X
+2500 select expressions                             422            426           5          0.0   421632363.0       0.0X

Review comment:
       Is this a 2x regression?

##########
File path: sql/core/benchmarks/CSVBenchmark-results.txt
##########
@@ -2,66 +2,66 @@
 Benchmark to measure CSV read/write performance
 ================================================================================================
 
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.15.7
-Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
+OpenJDK 64-Bit Server VM 1.8.0_282-b08 on Linux 5.4.0-1043-azure
+Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
 Parsing quoted values:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-One quoted string                                 24185          24195          10          0.0      483694.2       1.0X
+One quoted string                                 43757          44446         765          0.0      875148.4       1.0X
 
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.15.7
-Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
+OpenJDK 64-Bit Server VM 1.8.0_282-b08 on Linux 5.4.0-1043-azure
+Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
 Wide rows with 1000 columns:              Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Select 1000 columns                               61793          62388         532          0.0       61793.4       1.0X
-Select 100 columns                                21958          21993          34          0.0       21957.9       2.8X
-Select one column                                 18215          18515         505          0.1       18215.0       3.4X
-count()                                            5865           6168         296          0.2        5865.1      10.5X
-Select 100 columns, one bad input field           39638          39739         124          0.0       39637.5       1.6X
-Select 100 columns, corrupt record field          47290          48133         741          0.0       47290.0       1.3X
+Select 1000 columns                               96330          99161         NaN          0.0       96329.7       1.0X
+Select 100 columns                                41414          42672        1556          0.0       41414.1       2.3X
+Select one column                                 35365          36113         662          0.0       35365.4       2.7X
+count()                                           18845          18867          26          0.1       18845.0       5.1X

Review comment:
       This looks like a 2x regression.
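
A minimal sketch of how the slowdowns flagged in these review comments could be computed from the diff hunks. It is not part of the PR: the regular expression and the helper names `best_times` and `slowdowns` are hypothetical, and the sample rows are copied from the hunks quoted above. Note that the old and new results come from different machines (see the changed JVM/CPU header lines), so the ratio mixes hardware differences with any real regression.

```python
import re

# One result row of a unified diff: sign, label, Best(ms), Avg(ms), Stdev(ms),
# Rate(M/s), Per Row(ns), Relative. Stdev may be "NaN", so it is matched as \S+.
ROW = re.compile(
    r"^([+-])(.+?)\s{2,}(\d+)\s+(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)X$"
)

def best_times(diff_text):
    """Return {label: {'-': old_best_ms, '+': new_best_ms}} for result rows."""
    out = {}
    for line in diff_text.splitlines():
        m = ROW.match(line.rstrip())
        if m:
            sign, label, best = m.group(1), m.group(2).strip(), int(m.group(3))
            out.setdefault(label, {})[sign] = best
    return out

def slowdowns(diff_text):
    """Ratio new/old best time for labels that appear on both sides of the diff."""
    return {
        label: sides["+"] / sides["-"]
        for label, sides in best_times(diff_text).items()
        if "+" in sides and "-" in sides
    }

# Rows copied from the WideSchemaBenchmark and CSVBenchmark hunks above.
SAMPLE = """\
-2500 select expressions                             211            214           4          0.0   210927791.0       0.0X
+2500 select expressions                             422            426           5          0.0   421632363.0       0.0X
-One quoted string                                 24185          24195          10          0.0      483694.2       1.0X
+One quoted string                                 43757          44446         765          0.0      875148.4       1.0X
-count()                                            5865           6168         296          0.2        5865.1      10.5X
+count()                                           18845          18867          26          0.1       18845.0       5.1X
"""

for label, ratio in slowdowns(SAMPLE).items():
    print(f"{label}: {ratio:.2f}x")
```

For the rows quoted here, the best-time ratios come out at about 2.0x for `2500 select expressions`, 1.8x for `One quoted string`, and 3.2x for `count()`.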






[GitHub] [spark] LuciferYang commented on pull request #32044: [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on pull request #32044:
URL: https://github.com/apache/spark/pull/32044#issuecomment-952709194


   > @HyukjinKwon Can we use this way to generate the benchmarks results with Java 17?
   
   Let me study #32015 first
   
   




[GitHub] [spark] MaxGekk closed pull request #32044: [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines

Posted by GitBox <gi...@apache.org>.
MaxGekk closed pull request #32044:
URL: https://github.com/apache/spark/pull/32044


   




[GitHub] [spark] SparkQA commented on pull request #32044: [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32044:
URL: https://github.com/apache/spark/pull/32044#issuecomment-812870097


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41459/
   




[GitHub] [spark] SparkQA commented on pull request #32044: [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32044:
URL: https://github.com/apache/spark/pull/32044#issuecomment-812861905


   **[Test build #136883 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136883/testReport)** for PR 32044 at commit [`33f2ebe`](https://github.com/apache/spark/commit/33f2ebe247f8ed4b563552b6e1fdd83df7cec607).




[GitHub] [spark] MaxGekk commented on pull request #32044: [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on pull request #32044:
URL: https://github.com/apache/spark/pull/32044#issuecomment-812918337


   +1, LGTM. This PR only updates benchmark results. The failed GitHub Actions checks are not related to this PR. Merging to master.
   Thank you, @HyukjinKwon, and @wangyum and @dongjoon-hyun for your reviews.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32044: [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32044:
URL: https://github.com/apache/spark/pull/32044#issuecomment-812876081


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41459/
   




[GitHub] [spark] SparkQA commented on pull request #32044: [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32044:
URL: https://github.com/apache/spark/pull/32044#issuecomment-812878424


   **[Test build #136883 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136883/testReport)** for PR 32044 at commit [`33f2ebe`](https://github.com/apache/spark/commit/33f2ebe247f8ed4b563552b6e1fdd83df7cec607).
    * This patch **fails PySpark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #32044: [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32044:
URL: https://github.com/apache/spark/pull/32044#issuecomment-812876081


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41459/
   




[GitHub] [spark] SparkQA removed a comment on pull request #32044: [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32044:
URL: https://github.com/apache/spark/pull/32044#issuecomment-812861905


   **[Test build #136883 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136883/testReport)** for PR 32044 at commit [`33f2ebe`](https://github.com/apache/spark/commit/33f2ebe247f8ed4b563552b6e1fdd83df7cec607).




[GitHub] [spark] LuciferYang commented on pull request #32044: [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on pull request #32044:
URL: https://github.com/apache/spark/pull/32044#issuecomment-952813716


   Thank you for your explanation
   
   




[GitHub] [spark] HyukjinKwon commented on pull request #32044: [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #32044:
URL: https://github.com/apache/spark/pull/32044#issuecomment-952776553


   Yes, they all should generate the files for JDK 11. If they don't, it's a bug.
   
   Yes, we should have a separate set of these benchmark result files for JDK 17.




[GitHub] [spark] LuciferYang commented on pull request #32044: [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on pull request #32044:
URL: https://github.com/apache/spark/pull/32044#issuecomment-952708322


   @HyukjinKwon Can we use this approach to generate the benchmark results with Java 17?
   
   Also, I found that some benchmarks do not have corresponding Java 11 result files, such as `UpdateFieldsBenchmark` and `CharVarcharBenchmark`. Is this expected?
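
A minimal sketch of how such missing result files could be listed. It assumes a naming convention of `<Name>-results.txt` for the default JDK and `<Name>-jdk11-results.txt` for JDK 11; only the first pattern appears in this thread, so the second is an assumption, and `missing_jdk_results` is a hypothetical helper, not something from the Spark repository.

```python
import re
import tempfile
from pathlib import Path

def missing_jdk_results(benchmarks_dir, jdk="jdk11"):
    """List benchmark names that have a default result file but no <jdk> counterpart.

    Assumes default files are named `Foo-results.txt` and JDK-specific ones
    `Foo-<jdk>-results.txt` (the latter is an assumed convention).
    """
    names = {p.name for p in Path(benchmarks_dir).glob("*-results.txt")}
    default_suffix = "-results.txt"
    # Keep only default-JDK files, then strip the suffix to get the benchmark name.
    base = sorted(
        n[: -len(default_suffix)]
        for n in names
        if not re.search(r"-jdk\d+-results\.txt$", n)
    )
    return [b for b in base if f"{b}-{jdk}-results.txt" not in names]

# Demo on a throwaway directory with made-up contents.
with tempfile.TemporaryDirectory() as d:
    for name in [
        "CSVBenchmark-results.txt",
        "CSVBenchmark-jdk11-results.txt",
        "UpdateFieldsBenchmark-results.txt",
    ]:
        (Path(d) / name).touch()
    print(missing_jdk_results(d))
```

Pointing the function at a directory such as `sql/core/benchmarks` and passing `jdk="jdk17"` would, under the same assumed convention, list the benchmarks that still need JDK 17 results.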
   
   




[GitHub] [spark] AmplabJenkins commented on pull request #32044: [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32044:
URL: https://github.com/apache/spark/pull/32044#issuecomment-812884079


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136883/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #32044: [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32044:
URL: https://github.com/apache/spark/pull/32044#issuecomment-812884079


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136883/
   




[GitHub] [spark] LuciferYang edited a comment on pull request #32044: [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines

Posted by GitBox <gi...@apache.org>.
LuciferYang edited a comment on pull request #32044:
URL: https://github.com/apache/spark/pull/32044#issuecomment-952709194


   > @HyukjinKwon Can we use this way to generate the benchmarks results with Java 17?
   
   Let me study #32015 first. Should all new benchmark results be generated this way?
   
   
   
   




[GitHub] [spark] SparkQA commented on pull request #32044: [SPARK-34950][TESTS] Update benchmark results to the ones created by GitHub Actions machines

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32044:
URL: https://github.com/apache/spark/pull/32044#issuecomment-812866914


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41459/
   

