Posted to reviews@spark.apache.org by "dongjoon-hyun (via GitHub)" <gi...@apache.org> on 2023/02/17 22:53:35 UTC
[GitHub] [spark] dongjoon-hyun opened a new pull request, #40072: [SPARK-42483][TESTS] Regenerate benchmark results
dongjoon-hyun opened a new pull request, #40072:
URL: https://github.com/apache/spark/pull/40072
### What changes were proposed in this pull request?
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40072: [SPARK-42483][TESTS] Regenerate benchmark results
Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #40072:
URL: https://github.com/apache/spark/pull/40072#discussion_r1110460334
##########
core/benchmarks/ZStandardBenchmark-jdk11-results.txt:
##########
@@ -2,26 +2,26 @@
Benchmark ZStandardCompressionCodec
================================================================================================
-OpenJDK 64-Bit Server VM 11.0.17+8 on Linux 5.15.0-1023-azure
-Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+OpenJDK 64-Bit Server VM 11.0.18+10 on Linux 5.15.0-1031-azure
+Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
Benchmark ZStandardCompressionCodec: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
--------------------------------------------------------------------------------------------------------------------------------------
-Compression 10000 times at level 1 without buffer pool 859 872 21 0.0 85890.3 1.0X
-Compression 10000 times at level 2 without buffer pool 930 932 2 0.0 92995.6 0.9X
-Compression 10000 times at level 3 without buffer pool 1137 1138 2 0.0 113664.6 0.8X
-Compression 10000 times at level 1 with buffer pool 662 664 1 0.0 66244.7 1.3X
-Compression 10000 times at level 2 with buffer pool 725 726 1 0.0 72541.4 1.2X
-Compression 10000 times at level 3 with buffer pool 929 930 2 0.0 92851.4 0.9X
+Compression 10000 times at level 1 without buffer pool 605 812 220 0.0 60521.0 1.0X
+Compression 10000 times at level 2 without buffer pool 665 678 20 0.0 66512.5 0.9X
+Compression 10000 times at level 3 without buffer pool 890 903 20 0.0 88961.3 0.7X
+Compression 10000 times at level 1 with buffer pool 829 839 11 0.0 82940.2 0.7X
Review Comment:
Java 8 and Java 17 don't have this regression.
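As a rough way to read the hunk above, the removed and added rows can be paired by case name and the best times compared. This is a minimal sketch and not part of the PR: `parse_row` and `HUNK_LINES` are hypothetical names, and the rows are copied from the diff. Note that the CPU model in the header also changed between the two runs, so the numbers are not a like-for-like comparison.

```python
# Rows copied from the ZStandardBenchmark-jdk11 hunk ('-' = old run, '+' = new run).
HUNK_LINES = [
    "-Compression 10000 times at level 1 without buffer pool 859 872 21 0.0 85890.3 1.0X",
    "-Compression 10000 times at level 1 with buffer pool 662 664 1 0.0 66244.7 1.3X",
    "+Compression 10000 times at level 1 without buffer pool 605 812 220 0.0 60521.0 1.0X",
    "+Compression 10000 times at level 1 with buffer pool 829 839 11 0.0 82940.2 0.7X",
]


def parse_row(line):
    """Split a result row into (sign, case name, best time in ms).

    The last six whitespace-separated fields are the numeric columns
    (Best, Avg, Stdev, Rate, Per Row, Relative); everything before them
    is the case name, which may itself contain spaces and digits.
    """
    sign, body = line[0], line[1:]
    name, best, *_ = body.rsplit(None, 6)
    return sign, name, float(best)


old, new = {}, {}
for line in HUNK_LINES:
    sign, name, best = parse_row(line)
    (old if sign == "-" else new)[name] = best

# Percent change in best time per case; positive means the new run is slower.
changes = {
    name: (new[name] / old[name] - 1.0) * 100.0
    for name in old
    if name in new
}

for name, pct in changes.items():
    print(f"{name}: {pct:+.1f}% best time")
```

With these rows, the "with buffer pool" case gets slower while the "without buffer pool" case gets faster, which is what flips the Relative column.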
[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40072: [SPARK-42483][TESTS] Regenerate benchmark results
Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #40072:
URL: https://github.com/apache/spark/pull/40072#discussion_r1110463604
##########
sql/catalyst/benchmarks/EnumTypeSetBenchmark-jdk11-results.txt:
##########
@@ -1,105 +1,105 @@
-OpenJDK 64-Bit Server VM 11.0.17+8 on Linux 5.15.0-1023-azure
-Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+OpenJDK 64-Bit Server VM 11.0.18+10 on Linux 5.15.0-1031-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
Test contains use empty Set: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-Use HashSet 4 4 0 226.9 4.4 1.0X
-Use EnumSet 1 1 0 737.3 1.4 3.2X
+Use HashSet 0 1 0 2440.2 0.4 1.0X
+Use EnumSet 1 1 0 884.8 1.1 0.4X
Review Comment:
`HashSet` seems to get some improvements in this case, `contains use empty Set:`. The other cases look to be within a reasonable range.
[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40072: [SPARK-42483][TESTS] Regenerate benchmark results
Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #40072:
URL: https://github.com/apache/spark/pull/40072#discussion_r1110497257
##########
sql/core/benchmarks/DataSourceReadBenchmark-results.txt:
##########
@@ -2,430 +2,430 @@
SQL Single Numeric Column Scan
================================================================================================
-OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Linux 5.15.0-1023-azure
+OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1031-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
SQL Single BOOLEAN Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-SQL CSV 10433 10554 172 1.5 663.3 1.0X
-SQL Json 7948 7990 60 2.0 505.3 1.3X
-SQL Parquet Vectorized: DataPageV1 126 149 22 125.2 8.0 83.0X
-SQL Parquet Vectorized: DataPageV2 99 113 17 158.6 6.3 105.2X
-SQL Parquet MR: DataPageV1 1777 1784 9 8.8 113.0 5.9X
-SQL Parquet MR: DataPageV2 1579 1583 6 10.0 100.4 6.6X
-SQL ORC Vectorized 158 165 5 99.7 10.0 66.1X
-SQL ORC MR 1654 1661 9 9.5 105.2 6.3X
-
-OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Linux 5.15.0-1023-azure
+SQL CSV 13143 13363 311 1.2 835.6 1.0X
Review Comment:
CSV seems to have become about 30% slower.
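From the two `SQL CSV` rows in the hunk, the slowdown works out to roughly 26% by best time, in the same ballpark as the estimate in the comment. A minimal sketch of the arithmetic follows; `percent_slower` is a hypothetical helper, not something from the Spark code base.

```python
def percent_slower(old_ms, new_ms):
    """Return how much slower new_ms is than old_ms, in percent."""
    return (new_ms / old_ms - 1.0) * 100.0


# Best Time(ms) for "SQL CSV", taken from the removed and added rows of the hunk.
csv_change = percent_slower(10433, 13143)
print(f"SQL CSV best time: {csv_change:.1f}% slower")
```

The same ratio comes out of the Per Row(ns) column (663.3 before, 835.6 after), as expected, since both columns are derived from the same measurement.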
[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40072: [SPARK-42483][TESTS] Regenerate benchmark results
Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #40072:
URL: https://github.com/apache/spark/pull/40072#discussion_r1110470216
##########
sql/core/benchmarks/UpdateFieldsBenchmark-results.txt:
##########
@@ -2,25 +2,25 @@
Add 2 columns and drop 2 columns at 3 different depths of nesting
================================================================================================
-OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Linux 5.15.0-1023-azure
+OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1031-azure
Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
Add 2 columns and drop 2 columns at 3 different depths of nesting: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
-------------------------------------------------------------------------------------------------------------------------------------------------
-To non-nullable StructTypes using performant method 4 6 3 0.0 Infinity 1.0X
-To nullable StructTypes using performant method 3 4 1 0.0 Infinity 1.3X
-To non-nullable StructTypes using non-performant method 54 63 5 0.0 Infinity 0.1X
-To nullable StructTypes using non-performant method 2002 2091 127 0.0 Infinity 0.0X
+To non-nullable StructTypes using performant method 6 8 3 0.0 Infinity 1.0X
+To nullable StructTypes using performant method 4 5 2 0.0 Infinity 1.4X
+To non-nullable StructTypes using non-performant method 68 73 5 0.0 Infinity 0.1X
+To nullable StructTypes using non-performant method 2223 2452 324 0.0 Infinity 0.0X
================================================================================================
Add 50 columns and drop 50 columns at 100 different depths of nesting
================================================================================================
-OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Linux 5.15.0-1023-azure
+OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1031-azure
Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
Add 50 columns and drop 50 columns at 100 different depths of nesting: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
-----------------------------------------------------------------------------------------------------------------------------------------------------
-To non-nullable StructTypes using performant method 5520 5639 168 0.0 Infinity 1.0X
-To nullable StructTypes using performant method 2657 2708 72 0.0 Infinity 2.1X
+To non-nullable StructTypes using performant method 3126 3150 34 0.0 Infinity 1.0X
+To nullable StructTypes using performant method 3136 4768 2309 0.0 Infinity 1.0X
Review Comment:
This looks like a regression in Java 8. We need to take a look at this later.
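Before treating the new "nullable" row as a regression, it may help to check how noisy it is. The sketch below computes the ratio of Stdev(ms) to Avg Time(ms) for the two added rows; the helper name `coefficient_of_variation` and the 10% threshold are assumptions for illustration, not a rule used by the Spark benchmarks.

```python
def coefficient_of_variation(avg_ms, stdev_ms):
    """Return stdev as a fraction of the average time."""
    return stdev_ms / avg_ms


# (Avg Time(ms), Stdev(ms)) from the added rows of the 50-column hunk.
rows = {
    "To non-nullable StructTypes using performant method": (3150, 34),
    "To nullable StructTypes using performant method": (4768, 2309),
}

NOISE_THRESHOLD = 0.10  # assumed cut-off, chosen only for this example

for name, (avg_ms, stdev_ms) in rows.items():
    cv = coefficient_of_variation(avg_ms, stdev_ms)
    label = "noisy" if cv > NOISE_THRESHOLD else "stable"
    print(f"{name}: stdev/avg = {cv:.2f} ({label})")
```

The nullable case has a best time close to the non-nullable one (3136 ms against 3126 ms) but a much larger spread, so a rerun would be a reasonable first step in the follow-up investigation.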
[GitHub] [spark] viirya commented on a diff in pull request #40072: [SPARK-42483][TESTS] Regenerate benchmark results
Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #40072:
URL: https://github.com/apache/spark/pull/40072#discussion_r1110703022
##########
sql/core/benchmarks/DataSourceReadBenchmark-results.txt:
##########
@@ -2,430 +2,430 @@
SQL Single Numeric Column Scan
================================================================================================
-OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Linux 5.15.0-1023-azure
+OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1031-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
SQL Single BOOLEAN Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-SQL CSV 10433 10554 172 1.5 663.3 1.0X
-SQL Json 7948 7990 60 2.0 505.3 1.3X
-SQL Parquet Vectorized: DataPageV1 126 149 22 125.2 8.0 83.0X
-SQL Parquet Vectorized: DataPageV2 99 113 17 158.6 6.3 105.2X
-SQL Parquet MR: DataPageV1 1777 1784 9 8.8 113.0 5.9X
-SQL Parquet MR: DataPageV2 1579 1583 6 10.0 100.4 6.6X
-SQL ORC Vectorized 158 165 5 99.7 10.0 66.1X
-SQL ORC MR 1654 1661 9 9.5 105.2 6.3X
-
-OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Linux 5.15.0-1023-azure
+SQL CSV 13143 13363 311 1.2 835.6 1.0X
Review Comment:
Hmm, it's significant.
[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40072: [SPARK-42483][TESTS] Regenerate benchmark results
Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #40072:
URL: https://github.com/apache/spark/pull/40072#discussion_r1110471680
##########
sql/core/benchmarks/TPCDSQueryBenchmark-jdk11-results.txt:
##########
@@ -1,810 +1,810 @@
-OpenJDK 64-Bit Server VM 11.0.17+8 on Linux 5.15.0-1023-azure
+OpenJDK 64-Bit Server VM 11.0.18+10 on Linux 5.15.0-1031-azure
Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
TPCDS Snappy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-q1 1772 1905 188 0.3 3841.1 1.0X
+q1 1888 2074 263 0.2 4092.0 1.0X
-OpenJDK 64-Bit Server VM 11.0.17+8 on Linux 5.15.0-1023-azure
+OpenJDK 64-Bit Server VM 11.0.18+10 on Linux 5.15.0-1031-azure
Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
TPCDS Snappy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-q2 1686 1696 15 1.3 755.2 1.0X
+q2 1585 1899 444 1.4 710.1 1.0X
-OpenJDK 64-Bit Server VM 11.0.17+8 on Linux 5.15.0-1023-azure
+OpenJDK 64-Bit Server VM 11.0.18+10 on Linux 5.15.0-1031-azure
Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
TPCDS Snappy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-q3 718 759 41 4.1 241.8 1.0X
+q3 996 1035 55 3.0 335.3 1.0X
Review Comment:
Maybe, slower?
[GitHub] [spark] dongjoon-hyun commented on pull request #40072: [SPARK-42483][TESTS] Regenerate benchmark results
Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #40072:
URL: https://github.com/apache/spark/pull/40072#issuecomment-1435501963
Thank you, as always, for your help, @viirya!
Merged to master for Apache Spark 3.5.
[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40072: [SPARK-42483][TESTS] Regenerate benchmark results
Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #40072:
URL: https://github.com/apache/spark/pull/40072#discussion_r1110464359
##########
sql/catalyst/benchmarks/HashBenchmark-jdk11-results.txt:
##########
@@ -2,69 +2,69 @@
single ints
================================================================================================
-OpenJDK 64-Bit Server VM 11.0.17+8 on Linux 5.15.0-1023-azure
+OpenJDK 64-Bit Server VM 11.0.18+10 on Linux 5.15.0-1031-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Hash For single ints: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-interpreted version 3763 3769 8 142.7 7.0 1.0X
-codegen version 4658 4662 5 115.3 8.7 0.8X
-codegen version 64-bit 4706 4710 6 114.1 8.8 0.8X
-codegen HiveHash version 3998 3998 0 134.3 7.4 0.9X
+interpreted version 4933 4935 2 108.8 9.2 1.0X
+codegen version 5135 5141 9 104.6 9.6 1.0X
+codegen version 64-bit 5071 5079 10 105.9 9.4 1.0X
+codegen HiveHash version 4326 4326 0 124.1 8.1 1.1X
Review Comment:
Now, this is the fastest.
[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40072: [SPARK-42483][TESTS] Regenerate benchmark results
Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #40072:
URL: https://github.com/apache/spark/pull/40072#discussion_r1110461118
##########
sql/catalyst/benchmarks/EnumTypeSetBenchmark-jdk11-results.txt:
##########
@@ -1,105 +1,105 @@
-OpenJDK 64-Bit Server VM 11.0.17+8 on Linux 5.15.0-1023-azure
-Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+OpenJDK 64-Bit Server VM 11.0.18+10 on Linux 5.15.0-1031-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
Test contains use empty Set: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-Use HashSet 4 4 0 226.9 4.4 1.0X
-Use EnumSet 1 1 0 737.3 1.4 3.2X
+Use HashSet 0 1 0 2440.2 0.4 1.0X
+Use EnumSet 1 1 0 884.8 1.1 0.4X
Review Comment:
We need to investigate this reversed ratio.
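One way to see where the reversed ratio comes from is to recompute the Relative column from Rate(M/s). The sketch assumes Relative is the baseline row's time divided by each row's time (equivalently, the row's rate divided by the baseline's rate), which is consistent with the numbers in this hunk; `relative_from_rates` is a hypothetical helper.

```python
def relative_from_rates(baseline_rate, case_rate):
    """Format case_rate / baseline_rate like the Relative column (one decimal, 'X')."""
    return f"{case_rate / baseline_rate:.1f}X"


# Rate(M/s) values from the hunk; HashSet is the baseline (first) row.
old_hashset, old_enumset = 226.9, 737.3
new_hashset, new_enumset = 2440.2, 884.8

print("old EnumSet relative:", relative_from_rates(old_hashset, old_enumset))
print("new EnumSet relative:", relative_from_rates(new_hashset, new_enumset))

# How much each implementation's own rate moved between the two runs.
print(f"HashSet rate change: {new_hashset / old_hashset:.1f}x")
print(f"EnumSet rate change: {new_enumset / old_enumset:.1f}x")
```

Read this way, the `EnumSet` rate moved only modestly between runs, while the `HashSet` baseline rate rose by roughly an order of magnitude, so the flip in Relative is driven by the baseline row.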
[GitHub] [spark] dongjoon-hyun commented on pull request #40072: [SPARK-42483][TESTS] Regenerate benchmark results
Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #40072:
URL: https://github.com/apache/spark/pull/40072#issuecomment-1435486887
When you have some time, could you review this, @viirya? I want to merge this so that we can proceed with further investigation.
[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40072: [SPARK-42483][TESTS] Regenerate benchmark results
Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #40072:
URL: https://github.com/apache/spark/pull/40072#discussion_r1110473324
##########
sql/core/benchmarks/SortBenchmark-jdk17-results.txt:
##########
@@ -2,15 +2,15 @@
radix sort
================================================================================================
-OpenJDK 64-Bit Server VM 17.0.5+8 on Linux 5.15.0-1023-azure
+OpenJDK 64-Bit Server VM 17.0.6+10 on Linux 5.15.0-1031-azure
Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
radix sort 25000000: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-reference TimSort key prefix array 12059 12071 16 2.1 482.4 1.0X
-reference Arrays.sort 2864 2887 33 8.7 114.5 4.2X
-radix sort one byte 197 203 8 126.8 7.9 61.1X
-radix sort two bytes 373 375 2 66.9 14.9 32.3X
-radix sort eight bytes 1415 1417 4 17.7 56.6 8.5X
-radix sort key prefix array 1930 1966 51 13.0 77.2 6.2X
+reference TimSort key prefix array 12111 12128 23 2.1 484.4 1.0X
+reference Arrays.sort 2861 2885 35 8.7 114.4 4.2X
+radix sort one byte 197 197 0 127.0 7.9 61.5X
+radix sort two bytes 371 372 0 67.4 14.8 32.6X
+radix sort eight bytes 1391 1397 8 18.0 55.7 8.7X
+radix sort key prefix array 1914 1951 52 13.1 76.6 6.3X
Review Comment:
In this benchmark, all Java 17 results are faster than Java 8.
[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40072: [SPARK-42483][TESTS] Regenerate benchmark results
Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #40072:
URL: https://github.com/apache/spark/pull/40072#discussion_r1110460125
##########
core/benchmarks/ZStandardBenchmark-jdk11-results.txt:
##########
@@ -2,26 +2,26 @@
Benchmark ZStandardCompressionCodec
================================================================================================
-OpenJDK 64-Bit Server VM 11.0.17+8 on Linux 5.15.0-1023-azure
-Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+OpenJDK 64-Bit Server VM 11.0.18+10 on Linux 5.15.0-1031-azure
+Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
Benchmark ZStandardCompressionCodec: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
--------------------------------------------------------------------------------------------------------------------------------------
-Compression 10000 times at level 1 without buffer pool 859 872 21 0.0 85890.3 1.0X
-Compression 10000 times at level 2 without buffer pool 930 932 2 0.0 92995.6 0.9X
-Compression 10000 times at level 3 without buffer pool 1137 1138 2 0.0 113664.6 0.8X
-Compression 10000 times at level 1 with buffer pool 662 664 1 0.0 66244.7 1.3X
-Compression 10000 times at level 2 with buffer pool 725 726 1 0.0 72541.4 1.2X
-Compression 10000 times at level 3 with buffer pool 929 930 2 0.0 92851.4 0.9X
+Compression 10000 times at level 1 without buffer pool 605 812 220 0.0 60521.0 1.0X
+Compression 10000 times at level 2 without buffer pool 665 678 20 0.0 66512.5 0.9X
+Compression 10000 times at level 3 without buffer pool 890 903 20 0.0 88961.3 0.7X
+Compression 10000 times at level 1 with buffer pool 829 839 11 0.0 82940.2 0.7X
Review Comment:
I'll take a look at this after this PR.
[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40072: [SPARK-42483][TESTS] Regenerate benchmark results
Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #40072:
URL: https://github.com/apache/spark/pull/40072#discussion_r1110463981
##########
sql/catalyst/benchmarks/EnumTypeSetBenchmark-results.txt:
##########
@@ -1,105 +1,105 @@
-OpenJDK 64-Bit Server VM 1.8.0_352-b08 on Linux 5.15.0-1023-azure
+OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1031-azure
Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
Test contains use empty Set: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-Use HashSet 5 5 0 209.4 4.8 1.0X
-Use EnumSet 2 2 0 459.8 2.2 2.2X
+Use HashSet 1 1 1 1972.0 0.5 1.0X
+Use EnumSet 2 2 0 444.0 2.3 0.2X
Review Comment:
ditto.
[GitHub] [spark] dongjoon-hyun closed pull request #40072: [SPARK-42483][TESTS] Regenerate benchmark results
Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun closed pull request #40072: [SPARK-42483][TESTS] Regenerate benchmark results
URL: https://github.com/apache/spark/pull/40072