Posted to reviews@spark.apache.org by wangyum <gi...@git.apache.org> on 2018/10/07 08:38:31 UTC
[GitHub] spark pull request #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark t...
GitHub user wangyum opened a pull request:
https://github.com/apache/spark/pull/22661
[SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use main method
## What changes were proposed in this pull request?
Refactor `JoinBenchmark` to use a main method.
1. Use `spark-submit`:
```console
bin/spark-submit --class org.apache.spark.sql.execution.benchmark.JoinBenchmark --jars ./core/target/spark-core_2.11-3.0.0-SNAPSHOT-tests.jar ./sql/core/target/spark-sql_2.11-3.0.0-SNAPSHOT-tests.jar
```
2. Generate the benchmark result file:
```console
SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.execution.benchmark.JoinBenchmark"
```
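For readers unfamiliar with the pattern, here is a minimal sketch of what "use main method" can look like: a plain `object` with a `main` entry point that writes to a results file only when `SPARK_GENERATE_BENCHMARK_FILES` is set to `1`. The names `MainMethodBenchmarkSketch`, `timeMs`, `openOutput`, and the path `benchmarks/Sketch-results.txt` are hypothetical and are not Spark's actual base-class code.

```scala
import java.io.{File, FileOutputStream, OutputStream, PrintStream}

// Hypothetical stand-in for a benchmark base class; not Spark's actual code.
object MainMethodBenchmarkSketch {

  // Runs `body` once and returns the elapsed wall-clock time in milliseconds.
  def timeMs(body: => Unit): Long = {
    val start = System.nanoTime()
    body
    (System.nanoTime() - start) / 1000000L
  }

  // Picks the destination: a results file when requested, stdout otherwise.
  def openOutput(generateFiles: Boolean, resultsPath: String): OutputStream =
    if (generateFiles) {
      val file = new File(resultsPath)
      Option(file.getParentFile).foreach(_.mkdirs())
      new FileOutputStream(file)
    } else {
      System.out
    }

  def main(args: Array[String]): Unit = {
    // Assumption: the flag is read from the environment and compared to "1".
    val generateFiles = sys.env.get("SPARK_GENERATE_BENCHMARK_FILES").contains("1")
    val out = new PrintStream(openOutput(generateFiles, "benchmarks/Sketch-results.txt"))
    try {
      val elapsed = timeMs {
        val total = (1L to 1000000L).sum
        require(total > 0)
      }
      out.println(s"sum of 1..1000000 took $elapsed ms")
    } finally {
      out.flush()
      // Only close the stream if it is a file; stdout stays open.
      if (generateFiles) out.close()
    }
  }
}
```

Because the entry point is an ordinary `main`, the same class can be launched through `spark-submit --class ...` or `build/sbt "sql/test:runMain ..."` without going through the test runner.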
## How was this patch tested?
Manual tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/wangyum/spark SPARK-25664
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22661.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22661
----
commit 4339b1cbc5de7e54a7cd5be818fcf3dab249a351
Author: Yuming Wang <yu...@...>
Date: 2018-10-07T08:34:54Z
Refactor JoinBenchmark
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22661
**[Test build #97299 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97299/testReport)** for PR 22661 at commit [`28f9b9a`](https://github.com/apache/spark/commit/28f9b9a8a26caf8750aa2e8c8e2bc793b3773d98).
---
[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22661
**[Test build #97279 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97279/testReport)** for PR 22661 at commit [`3be13b1`](https://github.com/apache/spark/commit/3be13b16f1a59ffbd158265f54ad4f8d511d2018).
---
[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22661
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97279/
Test FAILed.
---
[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22661
Merged build finished. Test PASSed.
---
[GitHub] spark pull request #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark t...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22661#discussion_r224375578
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/JoinBenchmark.scala ---
@@ -48,13 +48,11 @@ object JoinBenchmark extends SqlBasedBenchmark {
}
}
-
def broadcastHashJoinLongKeyWithDuplicates(): Unit = {
val N = 20 << 20
val M = 1 << 16
-
+ val dim = broadcast(spark.range(M).selectExpr("cast(id/10 as long) as k"))
--- End diff --
For this change, we need to rerun the benchmark to get a new result.
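As a generic illustration of why moving setup code across the boundary of a timed block can change the numbers, the sketch below times the same probe twice: once with the lookup table built inside the measured closure and once with it hoisted out. All names here (`TimedBlockSketch`, `timed`, `buildLookup`, `probe`) are hypothetical, and the sketch uses plain Scala collections. It does not model how Spark evaluates `broadcast(...)`, so how large the effect is for the diff above is not something this example establishes.

```scala
object TimedBlockSketch {
  // Runs `body` once and returns (result, elapsed nanoseconds).
  def timed[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    (result, System.nanoTime() - start)
  }

  // Stand-in for preparing the dimension side of a join.
  def buildLookup(m: Int): Map[Long, String] =
    (0L until m.toLong).map(k => k -> k.toString).toMap

  // Counts how many of the first `n` ids find a match after `id % m`.
  def probe(lookup: Map[Long, String], n: Int, m: Int): Int =
    (0L until n.toLong).count(id => lookup.contains(id % m))

  def main(args: Array[String]): Unit = {
    val n = 1 << 16
    val m = 1 << 8

    // Setup inside the timed block: its cost is part of the measurement.
    val (hitsInside, nsInside) = timed { probe(buildLookup(m), n, m) }

    // Setup hoisted out: only the probe itself is measured.
    val lookup = buildLookup(m)
    val (hitsOutside, nsOutside) = timed { probe(lookup, n, m) }

    println(s"setup inside:  $hitsInside hits in $nsInside ns")
    println(s"setup outside: $hitsOutside hits in $nsOutside ns")
  }
}
```

Both variants return the same hit count; only what is included in the elapsed time differs, which is why previously recorded results no longer describe the new code.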
---
[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22661
**[Test build #97301 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97301/testReport)** for PR 22661 at commit [`cd8b664`](https://github.com/apache/spark/commit/cd8b664e17ce613061cf046ee2b5c3f223c1afa7).
---
[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22661
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3918/
Test PASSed.
---
[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...
Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on the issue:
https://github.com/apache/spark/pull/22661
cc @dongjoon-hyun
---
[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22661
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22661
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3920/
Test PASSed.
---
[GitHub] spark pull request #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark t...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22661#discussion_r223220438
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/JoinBenchmark.scala ---
@@ -19,229 +19,164 @@ package org.apache.spark.sql.execution.benchmark
import org.apache.spark.sql.execution.joins._
import org.apache.spark.sql.functions._
+import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types.IntegerType
/**
* Benchmark to measure performance for aggregate primitives.
- * To run this:
- * build/sbt "sql/test-only *benchmark.JoinBenchmark"
- *
- * Benchmarks in this file are skipped in normal builds.
+ * To run this benchmark:
+ * {{{
+ * 1. without sbt:
+ * bin/spark-submit --class <this class> --jars <spark core test jar> <spark sql test jar>
+ * 2. build/sbt "sql/test:runMain <this class>"
+ * 3. generate result:
+ * SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
+ * Results will be written to "benchmarks/JoinBenchmark-results.txt".
+ * }}}
*/
-class JoinBenchmark extends BenchmarkWithCodegen {
+object JoinBenchmark extends SqlBasedBenchmark {
- ignore("broadcast hash join, long key") {
+ def broadcastHashJoinLongKey(): Unit = {
val N = 20 << 20
val M = 1 << 16
- val dim = broadcast(sparkSession.range(M).selectExpr("id as k", "cast(id as string) as v"))
- runBenchmark("Join w long", N) {
- val df = sparkSession.range(N).join(dim, (col("id") % M) === col("k"))
+ val dim = broadcast(spark.range(M).selectExpr("id as k", "cast(id as string) as v"))
+ codegenBenchmark("Join w long", N) {
+ val df = spark.range(N).join(dim, (col("id") % M) === col("k"))
assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[BroadcastHashJoinExec]).isDefined)
df.count()
}
-
- /*
- Java HotSpot(TM) 64-Bit Server VM 1.7.0_60-b19 on Mac OS X 10.9.5
- Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
- Join w long: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
- -------------------------------------------------------------------------------------------
- Join w long codegen=false 3002 / 3262 7.0 143.2 1.0X
- Join w long codegen=true 321 / 371 65.3 15.3 9.3X
- */
}
- ignore("broadcast hash join, long key with duplicates") {
+
+ def broadcastHashJoinLongKeyWithDuplicates(): Unit = {
val N = 20 << 20
val M = 1 << 16
- val dim = broadcast(sparkSession.range(M).selectExpr("id as k", "cast(id as string) as v"))
- runBenchmark("Join w long duplicated", N) {
- val dim = broadcast(sparkSession.range(M).selectExpr("cast(id/10 as long) as k"))
- val df = sparkSession.range(N).join(dim, (col("id") % M) === col("k"))
+ codegenBenchmark("Join w long duplicated", N) {
+ val dim = broadcast(spark.range(M).selectExpr("cast(id/10 as long) as k"))
+ val df = spark.range(N).join(dim, (col("id") % M) === col("k"))
assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[BroadcastHashJoinExec]).isDefined)
df.count()
}
-
- /*
- *Java HotSpot(TM) 64-Bit Server VM 1.7.0_60-b19 on Mac OS X 10.9.5
- *Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
- *Join w long duplicated: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
- *-------------------------------------------------------------------------------------------
- *Join w long duplicated codegen=false 3446 / 3478 6.1 164.3 1.0X
- *Join w long duplicated codegen=true 322 / 351 65.2 15.3 10.7X
- */
}
- ignore("broadcast hash join, two int key") {
+ def broadcastHashJoinTwoIntKey(): Unit = {
val N = 20 << 20
val M = 1 << 16
- val dim2 = broadcast(sparkSession.range(M)
+ val dim2 = broadcast(spark.range(M)
.selectExpr("cast(id as int) as k1", "cast(id as int) as k2", "cast(id as string) as v"))
- runBenchmark("Join w 2 ints", N) {
- val df = sparkSession.range(N).join(dim2,
+ codegenBenchmark("Join w 2 ints", N) {
+ val df = spark.range(N).join(dim2,
(col("id") % M).cast(IntegerType) === col("k1")
&& (col("id") % M).cast(IntegerType) === col("k2"))
assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[BroadcastHashJoinExec]).isDefined)
df.count()
}
-
- /*
- *Java HotSpot(TM) 64-Bit Server VM 1.7.0_60-b19 on Mac OS X 10.9.5
- *Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
- *Join w 2 ints: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
- *-------------------------------------------------------------------------------------------
- *Join w 2 ints codegen=false 4426 / 4501 4.7 211.1 1.0X
- *Join w 2 ints codegen=true 791 / 818 26.5 37.7 5.6X
- */
}
- ignore("broadcast hash join, two long key") {
+ def broadcastHashJoinTwoLongKey(): Unit = {
val N = 20 << 20
val M = 1 << 16
- val dim3 = broadcast(sparkSession.range(M)
+ val dim3 = broadcast(spark.range(M)
.selectExpr("id as k1", "id as k2", "cast(id as string) as v"))
- runBenchmark("Join w 2 longs", N) {
- val df = sparkSession.range(N).join(dim3,
+ codegenBenchmark("Join w 2 longs", N) {
+ val df = spark.range(N).join(dim3,
(col("id") % M) === col("k1") && (col("id") % M) === col("k2"))
assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[BroadcastHashJoinExec]).isDefined)
df.count()
}
-
- /*
- *Java HotSpot(TM) 64-Bit Server VM 1.7.0_60-b19 on Mac OS X 10.9.5
- *Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
- *Join w 2 longs: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
- *-------------------------------------------------------------------------------------------
- *Join w 2 longs codegen=false 5905 / 6123 3.6 281.6 1.0X
- *Join w 2 longs codegen=true 2230 / 2529 9.4 106.3 2.6X
- */
}
- ignore("broadcast hash join, two long key with duplicates") {
+ def broadcastHashJoinTwoLongKeyWithDuplicates(): Unit = {
val N = 20 << 20
val M = 1 << 16
- val dim4 = broadcast(sparkSession.range(M)
+ val dim4 = broadcast(spark.range(M)
.selectExpr("cast(id/10 as long) as k1", "cast(id/10 as long) as k2"))
- runBenchmark("Join w 2 longs duplicated", N) {
- val df = sparkSession.range(N).join(dim4,
+ codegenBenchmark("Join w 2 longs duplicated", N) {
+ val df = spark.range(N).join(dim4,
(col("id") bitwiseAND M) === col("k1") && (col("id") bitwiseAND M) === col("k2"))
assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[BroadcastHashJoinExec]).isDefined)
df.count()
}
-
- /*
- *Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
- *Join w 2 longs duplicated: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
- *-------------------------------------------------------------------------------------------
- *Join w 2 longs duplicated codegen=false 6420 / 6587 3.3 306.1 1.0X
- *Join w 2 longs duplicated codegen=true 2080 / 2139 10.1 99.2 3.1X
- */
}
- ignore("broadcast hash join, outer join long key") {
+
+ def broadcastHashJoinOuterJoinLongKey(): Unit = {
val N = 20 << 20
val M = 1 << 16
- val dim = broadcast(sparkSession.range(M).selectExpr("id as k", "cast(id as string) as v"))
- runBenchmark("outer join w long", N) {
- val df = sparkSession.range(N).join(dim, (col("id") % M) === col("k"), "left")
+ val dim = broadcast(spark.range(M).selectExpr("id as k", "cast(id as string) as v"))
+ codegenBenchmark("outer join w long", N) {
+ val df = spark.range(N).join(dim, (col("id") % M) === col("k"), "left")
assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[BroadcastHashJoinExec]).isDefined)
df.count()
}
-
- /*
- *Java HotSpot(TM) 64-Bit Server VM 1.7.0_60-b19 on Mac OS X 10.9.5
- *Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
- *outer join w long: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
- *-------------------------------------------------------------------------------------------
- *outer join w long codegen=false 3055 / 3189 6.9 145.7 1.0X
- *outer join w long codegen=true 261 / 276 80.5 12.4 11.7X
- */
}
- ignore("broadcast hash join, semi join long key") {
+
+ def broadcastHashJoinSemiJoinLongKey(): Unit = {
val N = 20 << 20
val M = 1 << 16
- val dim = broadcast(sparkSession.range(M).selectExpr("id as k", "cast(id as string) as v"))
- runBenchmark("semi join w long", N) {
- val df = sparkSession.range(N).join(dim, (col("id") % M) === col("k"), "leftsemi")
+ val dim = broadcast(spark.range(M).selectExpr("id as k", "cast(id as string) as v"))
+ codegenBenchmark("semi join w long", N) {
+ val df = spark.range(N).join(dim, (col("id") % M) === col("k"), "leftsemi")
assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[BroadcastHashJoinExec]).isDefined)
df.count()
}
-
- /*
- *Java HotSpot(TM) 64-Bit Server VM 1.7.0_60-b19 on Mac OS X 10.9.5
- *Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
- *semi join w long: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
- *-------------------------------------------------------------------------------------------
- *semi join w long codegen=false 1912 / 1990 11.0 91.2 1.0X
- *semi join w long codegen=true 237 / 244 88.3 11.3 8.1X
- */
}
- ignore("sort merge join") {
+ def sortMergeJoin(): Unit = {
val N = 2 << 20
- runBenchmark("merge join", N) {
- val df1 = sparkSession.range(N).selectExpr(s"id * 2 as k1")
- val df2 = sparkSession.range(N).selectExpr(s"id * 3 as k2")
+ codegenBenchmark("merge join", N) {
+ val df1 = spark.range(N).selectExpr(s"id * 2 as k1")
+ val df2 = spark.range(N).selectExpr(s"id * 3 as k2")
val df = df1.join(df2, col("k1") === col("k2"))
assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[SortMergeJoinExec]).isDefined)
df.count()
}
-
- /*
- *Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
- *merge join: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
- *-------------------------------------------------------------------------------------------
- *merge join codegen=false 1588 / 1880 1.3 757.1 1.0X
- *merge join codegen=true 1477 / 1531 1.4 704.2 1.1X
- */
}
- ignore("sort merge join with duplicates") {
+ def sortMergeJoinWithDuplicates(): Unit = {
val N = 2 << 20
- runBenchmark("sort merge join", N) {
- val df1 = sparkSession.range(N)
+ codegenBenchmark("sort merge join with duplicates", N) {
+ val df1 = spark.range(N)
.selectExpr(s"(id * 15485863) % ${N*10} as k1")
- val df2 = sparkSession.range(N)
+ val df2 = spark.range(N)
.selectExpr(s"(id * 15485867) % ${N*10} as k2")
val df = df1.join(df2, col("k1") === col("k2"))
assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[SortMergeJoinExec]).isDefined)
df.count()
}
-
- /*
- *Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
- *sort merge join: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
- *-------------------------------------------------------------------------------------------
- *sort merge join codegen=false 3626 / 3667 0.6 1728.9 1.0X
- *sort merge join codegen=true 3405 / 3438 0.6 1623.8 1.1X
- */
}
- ignore("shuffle hash join") {
- val N = 4 << 20
- sparkSession.conf.set("spark.sql.shuffle.partitions", "2")
- sparkSession.conf.set("spark.sql.autoBroadcastJoinThreshold", "10000000")
- sparkSession.conf.set("spark.sql.join.preferSortMergeJoin", "false")
- runBenchmark("shuffle hash join", N) {
- val df1 = sparkSession.range(N).selectExpr(s"id as k1")
- val df2 = sparkSession.range(N / 3).selectExpr(s"id * 3 as k2")
- val df = df1.join(df2, col("k1") === col("k2"))
- assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[ShuffledHashJoinExec]).isDefined)
- df.count()
+ def shuffleHashJoin(): Unit = {
+ val N: Long = 4 << 20
+ withSQLConf(SQLConf.SHUFFLE_PARTITIONS.key -> "2",
--- End diff --
nit. Could you put `SQLConf.SHUFFLE_PARTITIONS.key` on the next line?
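The layout requested here, with the first key on its own line, might look like the call in `main` below. The sketch also illustrates the general set-then-restore idea behind a scoped configuration helper, as opposed to the earlier `sparkSession.conf.set(...)` calls, which leave the values in place. `ScopedConfSketch`, `conf`, and `withConf` are hypothetical stand-ins built on a mutable map; they are not Spark's `withSQLConf` or `SQLConf`. The three keys are the string forms that appear in the removed lines of the diff.

```scala
import scala.collection.mutable

object ScopedConfSketch {
  // Stand-in for a session configuration; not Spark's SQLConf.
  val conf: mutable.Map[String, String] = mutable.Map.empty

  // Sets the given pairs, runs `body`, then restores the previous values
  // (removing keys that were unset before), even if `body` throws.
  def withConf[A](pairs: (String, String)*)(body: => A): A = {
    val previous = pairs.map { case (key, _) => key -> conf.get(key) }
    pairs.foreach { case (key, value) => conf(key) = value }
    try body
    finally previous.foreach {
      case (key, Some(old)) => conf(key) = old
      case (key, None) => conf.remove(key)
    }
  }

  def main(args: Array[String]): Unit = {
    withConf(
      "spark.sql.shuffle.partitions" -> "2",
      "spark.sql.autoBroadcastJoinThreshold" -> "10000000",
      "spark.sql.join.preferSortMergeJoin" -> "false") {
      // Inside the block the overrides are visible.
      println(conf("spark.sql.shuffle.partitions"))
    }
    // After the block the key is gone again, since it was unset before.
    println(conf.contains("spark.sql.shuffle.partitions"))
  }
}
```

Scoping the overrides this way keeps one benchmark's settings from leaking into the next one that runs in the same session.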
---
[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22661
**[Test build #97301 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97301/testReport)** for PR 22661 at commit [`cd8b664`](https://github.com/apache/spark/commit/cd8b664e17ce613061cf046ee2b5c3f223c1afa7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22661
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3906/
Test PASSed.
---
[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22661
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97287/
Test PASSed.
---
[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22661
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97301/
Test PASSed.
---
[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22661
**[Test build #97090 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97090/testReport)** for PR 22661 at commit [`4859a9f`](https://github.com/apache/spark/commit/4859a9f5e78edf81c211c304a57e2603e60b2cc7).
---
[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22661
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3899/
Test PASSed.
---
[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22661
Merged build finished. Test FAILed.
---
[GitHub] spark pull request #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark t...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22661#discussion_r224523944
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/JoinBenchmark.scala ---
@@ -19,229 +19,161 @@ package org.apache.spark.sql.execution.benchmark
import org.apache.spark.sql.execution.joins._
import org.apache.spark.sql.functions._
+import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types.IntegerType
/**
* Benchmark to measure performance for aggregate primitives.
--- End diff --
`aggregate primitives` -> `joins`
---
[GitHub] spark pull request #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark t...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22661#discussion_r224934704
--- Diff: sql/core/benchmarks/JoinBenchmark-results.txt ---
@@ -0,0 +1,75 @@
+================================================================================================
+Join Benchmark
+================================================================================================
+
+OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+Join w long: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------
+Join w long wholestage off 4464 / 4483 4.7 212.9 1.0X
+Join w long wholestage on 289 / 339 72.6 13.8 15.5X
+
+OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+Join w long duplicated: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------
+Join w long duplicated wholestage off 5662 / 5678 3.7 270.0 1.0X
+Join w long duplicated wholestage on 332 / 345 63.1 15.8 17.0X
+
+OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+Join w 2 ints: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------
+Join w 2 ints wholestage off 173174 / 173183 0.1 8257.6 1.0X
+Join w 2 ints wholestage on 166350 / 198362 0.1 7932.2 1.0X
--- End diff --
+1.
---
[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22661
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97090/
Test PASSed.
---
[GitHub] spark pull request #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark t...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22661#discussion_r224374650
--- Diff: sql/core/benchmarks/JoinBenchmark-results.txt ---
@@ -0,0 +1,80 @@
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
+
+Join w long: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------
+Join w long wholestage off 4062 / 4709 5.2 193.7 1.0X
+Join w long wholestage on 152 / 163 138.4 7.2 26.8X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
+
+Join w long duplicated: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------
+Join w long duplicated wholestage off 3793 / 3801 5.5 180.9 1.0X
+Join w long duplicated wholestage on 207 / 219 101.1 9.9 18.3X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
+
+Join w 2 ints: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------
+Join w 2 ints wholestage off 138514 / 139178 0.2 6604.9 1.0X
+Join w 2 ints wholestage on 129908 / 140869 0.2 6194.5 1.1X
--- End diff --
Ur, is this correct? Previously, we had the following.
```
*Join w 2 ints codegen=false 4426 / 4501 4.7 211.1 1.0X
*Join w 2 ints codegen=true 791 / 818 26.5 37.7 5.6X
```
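To put the gap in numbers, the sketch below recomputes the derived columns from the best time and the row count `N = 20 << 20` used in the diff, and compares the old and new "Join w 2 ints" rows. It assumes that `Per Row(ns)` is the best time divided by the row count and that `Rate(M/s)` is its reciprocal in millions of rows per second. The helper names are hypothetical. Under that assumption the new figures are roughly 31 times slower with codegen off (138514 ms vs 4426 ms) and roughly 164 times slower with it on (129908 ms vs 791 ms). The two runs come from different machines, so the ratio is only indicative.

```scala
object BenchmarkColumnsSketch {
  // Row count used by the broadcast hash join benchmarks: 20 * 2^20 = 20971520.
  val N: Long = 20L << 20

  // Nanoseconds per row, given the best time in milliseconds.
  def perRowNs(bestMs: Double, rows: Long): Double = bestMs * 1e6 / rows

  // Millions of rows per second, given the best time in milliseconds.
  def rateMPerSec(bestMs: Double, rows: Long): Double = rows / (bestMs * 1e3)

  // How many times slower `newMs` is than `oldMs`.
  def slowdown(oldMs: Double, newMs: Double): Double = newMs / oldMs

  def main(args: Array[String]): Unit = {
    // Best times (ms) quoted in the review comment and the new results file.
    val oldOff = 4426.0
    val oldOn = 791.0
    val newOff = 138514.0
    val newOn = 129908.0

    println(f"old, codegen off: ${perRowNs(oldOff, N)}%.1f ns/row, ${rateMPerSec(oldOff, N)}%.1f M/s")
    println(f"new, codegen off: ${perRowNs(newOff, N)}%.1f ns/row, ${rateMPerSec(newOff, N)}%.1f M/s")
    println(f"slowdown, codegen off: ${slowdown(oldOff, newOff)}%.1fx")
    println(f"slowdown, codegen on:  ${slowdown(oldOn, newOn)}%.1fx")
  }
}
```

A change of this size in a single row, while the "Join w long" rows stay in the same range as before, is what prompts the question of whether the result is correct.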
---
[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22661
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97243/
Test PASSed.
---
[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22661
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97299/
Test PASSed.
---
[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22661
Merged build finished. Test PASSed.
---
[GitHub] spark pull request #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark t...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22661#discussion_r224526143
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/JoinBenchmark.scala ---
@@ -19,229 +19,161 @@ package org.apache.spark.sql.execution.benchmark
import org.apache.spark.sql.execution.joins._
import org.apache.spark.sql.functions._
+import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types.IntegerType
/**
* Benchmark to measure performance for aggregate primitives.
- * To run this:
- * build/sbt "sql/test-only *benchmark.JoinBenchmark"
- *
- * Benchmarks in this file are skipped in normal builds.
+ * To run this benchmark:
+ * {{{
+ * 1. without sbt:
+ * bin/spark-submit --class <this class> --jars <spark core test jar> <spark sql test jar>
+ * 2. build/sbt "sql/test:runMain <this class>"
+ * 3. generate result:
+ * SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
+ * Results will be written to "benchmarks/JoinBenchmark-results.txt".
+ * }}}
*/
-class JoinBenchmark extends BenchmarkWithCodegen {
+object JoinBenchmark extends SqlBasedBenchmark {
- ignore("broadcast hash join, long key") {
+ def broadcastHashJoinLongKey(): Unit = {
val N = 20 << 20
val M = 1 << 16
- val dim = broadcast(sparkSession.range(M).selectExpr("id as k", "cast(id as string) as v"))
- runBenchmark("Join w long", N) {
- val df = sparkSession.range(N).join(dim, (col("id") % M) === col("k"))
+ val dim = broadcast(spark.range(M).selectExpr("id as k", "cast(id as string) as v"))
+ codegenBenchmark("Join w long", N) {
+ val df = spark.range(N).join(dim, (col("id") % M) === col("k"))
assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[BroadcastHashJoinExec]).isDefined)
df.count()
}
-
- /*
- Java HotSpot(TM) 64-Bit Server VM 1.7.0_60-b19 on Mac OS X 10.9.5
- Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
- Join w long: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
- -------------------------------------------------------------------------------------------
- Join w long codegen=false 3002 / 3262 7.0 143.2 1.0X
- Join w long codegen=true 321 / 371 65.3 15.3 9.3X
- */
}
- ignore("broadcast hash join, long key with duplicates") {
+ def broadcastHashJoinLongKeyWithDuplicates(): Unit = {
val N = 20 << 20
val M = 1 << 16
-
- val dim = broadcast(sparkSession.range(M).selectExpr("id as k", "cast(id as string) as v"))
- runBenchmark("Join w long duplicated", N) {
- val dim = broadcast(sparkSession.range(M).selectExpr("cast(id/10 as long) as k"))
- val df = sparkSession.range(N).join(dim, (col("id") % M) === col("k"))
+ val dim = broadcast(spark.range(M).selectExpr("cast(id/10 as long) as k"))
+ codegenBenchmark("Join w long duplicated", N) {
+ val df = spark.range(N).join(dim, (col("id") % M) === col("k"))
assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[BroadcastHashJoinExec]).isDefined)
df.count()
}
-
- /*
- *Java HotSpot(TM) 64-Bit Server VM 1.7.0_60-b19 on Mac OS X 10.9.5
- *Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
- *Join w long duplicated: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
- *-------------------------------------------------------------------------------------------
- *Join w long duplicated codegen=false 3446 / 3478 6.1 164.3 1.0X
- *Join w long duplicated codegen=true 322 / 351 65.2 15.3 10.7X
- */
}
- ignore("broadcast hash join, two int key") {
+ def broadcastHashJoinTwoIntKey(): Unit = {
val N = 20 << 20
val M = 1 << 16
- val dim2 = broadcast(sparkSession.range(M)
+ val dim2 = broadcast(spark.range(M)
.selectExpr("cast(id as int) as k1", "cast(id as int) as k2", "cast(id as string) as v"))
- runBenchmark("Join w 2 ints", N) {
- val df = sparkSession.range(N).join(dim2,
+ codegenBenchmark("Join w 2 ints", N) {
+ val df = spark.range(N).join(dim2,
(col("id") % M).cast(IntegerType) === col("k1")
&& (col("id") % M).cast(IntegerType) === col("k2"))
assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[BroadcastHashJoinExec]).isDefined)
df.count()
}
-
- /*
- *Java HotSpot(TM) 64-Bit Server VM 1.7.0_60-b19 on Mac OS X 10.9.5
- *Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
- *Join w 2 ints: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
- *-------------------------------------------------------------------------------------------
- *Join w 2 ints codegen=false 4426 / 4501 4.7 211.1 1.0X
- *Join w 2 ints codegen=true 791 / 818 26.5 37.7 5.6X
- */
}
- ignore("broadcast hash join, two long key") {
+ def broadcastHashJoinTwoLongKey(): Unit = {
val N = 20 << 20
val M = 1 << 16
- val dim3 = broadcast(sparkSession.range(M)
+ val dim3 = broadcast(spark.range(M)
.selectExpr("id as k1", "id as k2", "cast(id as string) as v"))
- runBenchmark("Join w 2 longs", N) {
- val df = sparkSession.range(N).join(dim3,
+ codegenBenchmark("Join w 2 longs", N) {
+ val df = spark.range(N).join(dim3,
(col("id") % M) === col("k1") && (col("id") % M) === col("k2"))
assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[BroadcastHashJoinExec]).isDefined)
df.count()
}
-
- /*
- *Java HotSpot(TM) 64-Bit Server VM 1.7.0_60-b19 on Mac OS X 10.9.5
- *Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
- *Join w 2 longs: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
- *-------------------------------------------------------------------------------------------
- *Join w 2 longs codegen=false 5905 / 6123 3.6 281.6 1.0X
- *Join w 2 longs codegen=true 2230 / 2529 9.4 106.3 2.6X
- */
}
- ignore("broadcast hash join, two long key with duplicates") {
+ def broadcastHashJoinTwoLongKeyWithDuplicates(): Unit = {
val N = 20 << 20
val M = 1 << 16
- val dim4 = broadcast(sparkSession.range(M)
+ val dim4 = broadcast(spark.range(M)
.selectExpr("cast(id/10 as long) as k1", "cast(id/10 as long) as k2"))
- runBenchmark("Join w 2 longs duplicated", N) {
- val df = sparkSession.range(N).join(dim4,
+ codegenBenchmark("Join w 2 longs duplicated", N) {
+ val df = spark.range(N).join(dim4,
(col("id") bitwiseAND M) === col("k1") && (col("id") bitwiseAND M) === col("k2"))
assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[BroadcastHashJoinExec]).isDefined)
df.count()
}
-
- /*
- *Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
- *Join w 2 longs duplicated: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
- *-------------------------------------------------------------------------------------------
- *Join w 2 longs duplicated codegen=false 6420 / 6587 3.3 306.1 1.0X
- *Join w 2 longs duplicated codegen=true 2080 / 2139 10.1 99.2 3.1X
- */
}
- ignore("broadcast hash join, outer join long key") {
+ def broadcastHashJoinOuterJoinLongKey(): Unit = {
val N = 20 << 20
val M = 1 << 16
- val dim = broadcast(sparkSession.range(M).selectExpr("id as k", "cast(id as string) as v"))
- runBenchmark("outer join w long", N) {
- val df = sparkSession.range(N).join(dim, (col("id") % M) === col("k"), "left")
+ val dim = broadcast(spark.range(M).selectExpr("id as k", "cast(id as string) as v"))
+ codegenBenchmark("outer join w long", N) {
+ val df = spark.range(N).join(dim, (col("id") % M) === col("k"), "left")
assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[BroadcastHashJoinExec]).isDefined)
df.count()
}
-
- /*
- *Java HotSpot(TM) 64-Bit Server VM 1.7.0_60-b19 on Mac OS X 10.9.5
- *Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
- *outer join w long: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
- *-------------------------------------------------------------------------------------------
- *outer join w long codegen=false 3055 / 3189 6.9 145.7 1.0X
- *outer join w long codegen=true 261 / 276 80.5 12.4 11.7X
- */
}
- ignore("broadcast hash join, semi join long key") {
+ def broadcastHashJoinSemiJoinLongKey(): Unit = {
val N = 20 << 20
val M = 1 << 16
- val dim = broadcast(sparkSession.range(M).selectExpr("id as k", "cast(id as string) as v"))
- runBenchmark("semi join w long", N) {
- val df = sparkSession.range(N).join(dim, (col("id") % M) === col("k"), "leftsemi")
+ val dim = broadcast(spark.range(M).selectExpr("id as k", "cast(id as string) as v"))
+ codegenBenchmark("semi join w long", N) {
+ val df = spark.range(N).join(dim, (col("id") % M) === col("k"), "leftsemi")
assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[BroadcastHashJoinExec]).isDefined)
df.count()
}
-
- /*
- *Java HotSpot(TM) 64-Bit Server VM 1.7.0_60-b19 on Mac OS X 10.9.5
- *Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
- *semi join w long: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
- *-------------------------------------------------------------------------------------------
- *semi join w long codegen=false 1912 / 1990 11.0 91.2 1.0X
- *semi join w long codegen=true 237 / 244 88.3 11.3 8.1X
- */
}
- ignore("sort merge join") {
+ def sortMergeJoin(): Unit = {
val N = 2 << 20
- runBenchmark("merge join", N) {
- val df1 = sparkSession.range(N).selectExpr(s"id * 2 as k1")
- val df2 = sparkSession.range(N).selectExpr(s"id * 3 as k2")
+ codegenBenchmark("merge join", N) {
+ val df1 = spark.range(N).selectExpr(s"id * 2 as k1")
+ val df2 = spark.range(N).selectExpr(s"id * 3 as k2")
val df = df1.join(df2, col("k1") === col("k2"))
assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[SortMergeJoinExec]).isDefined)
df.count()
}
-
- /*
- *Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
- *merge join: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
- *-------------------------------------------------------------------------------------------
- *merge join codegen=false 1588 / 1880 1.3 757.1 1.0X
- *merge join codegen=true 1477 / 1531 1.4 704.2 1.1X
- */
}
- ignore("sort merge join with duplicates") {
+ def sortMergeJoinWithDuplicates(): Unit = {
val N = 2 << 20
- runBenchmark("sort merge join", N) {
- val df1 = sparkSession.range(N)
+ codegenBenchmark("sort merge join with duplicates", N) {
+ val df1 = spark.range(N)
.selectExpr(s"(id * 15485863) % ${N*10} as k1")
- val df2 = sparkSession.range(N)
+ val df2 = spark.range(N)
.selectExpr(s"(id * 15485867) % ${N*10} as k2")
val df = df1.join(df2, col("k1") === col("k2"))
assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[SortMergeJoinExec]).isDefined)
df.count()
}
-
- /*
- *Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
- *sort merge join: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
- *-------------------------------------------------------------------------------------------
- *sort merge join codegen=false 3626 / 3667 0.6 1728.9 1.0X
- *sort merge join codegen=true 3405 / 3438 0.6 1623.8 1.1X
- */
}
- ignore("shuffle hash join") {
- val N = 4 << 20
- sparkSession.conf.set("spark.sql.shuffle.partitions", "2")
- sparkSession.conf.set("spark.sql.autoBroadcastJoinThreshold", "10000000")
- sparkSession.conf.set("spark.sql.join.preferSortMergeJoin", "false")
- runBenchmark("shuffle hash join", N) {
- val df1 = sparkSession.range(N).selectExpr(s"id as k1")
- val df2 = sparkSession.range(N / 3).selectExpr(s"id * 3 as k2")
- val df = df1.join(df2, col("k1") === col("k2"))
- assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[ShuffledHashJoinExec]).isDefined)
- df.count()
+ def shuffleHashJoin(): Unit = {
+ val N: Long = 4 << 20
+ withSQLConf(
+ SQLConf.SHUFFLE_PARTITIONS.key -> "2",
+ SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "10000000",
+ SQLConf.PREFER_SORTMERGEJOIN.key -> "false") {
+ codegenBenchmark("shuffle hash join", N) {
+ val df1 = spark.range(N).selectExpr(s"id as k1")
+ val df2 = spark.range(N / 3).selectExpr(s"id * 3 as k2")
+ val df = df1.join(df2, col("k1") === col("k2"))
+ assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[ShuffledHashJoinExec]).isDefined)
+ df.count()
+ }
}
+ }
- /*
- *Java HotSpot(TM) 64-Bit Server VM 1.8.0_60-b27 on Windows 7 6.1
- *Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
- *shuffle hash join: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
- *-------------------------------------------------------------------------------------------
- *shuffle hash join codegen=false 2005 / 2010 2.1 478.0 1.0X
- *shuffle hash join codegen=true 1773 / 1792 2.4 422.7 1.1X
- */
+ override def runBenchmarkSuite(): Unit = {
--- End diff --
Could you wrap with something like `runBenchmark("Join Benchmark")`?
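For illustration, a minimal self-contained sketch of the wrapping suggested here. The `runBenchmark` helper below is a hypothetical stand-in that only prints a banner and then runs the block; it is an assumption for this sketch, not the actual helper in the Spark code base, and the two benchmark methods are placeholders for the ones in the diff.

```scala
import scala.collection.mutable.ArrayBuffer

object JoinBenchmarkSketch {
  // Everything printed is also recorded here so the call order can be inspected.
  val log: ArrayBuffer[String] = ArrayBuffer.empty[String]

  private def emit(line: String): Unit = {
    println(line)
    log += line
  }

  // Hypothetical stand-in for the suggested helper: banner first, then the block.
  def runBenchmark(name: String)(body: => Any): Unit = {
    val banner = "=" * 80
    emit(banner)
    emit(name)
    emit(banner)
    body
  }

  // Placeholders for the benchmark methods shown in the diff.
  def broadcastHashJoinLongKey(): Unit = emit("running: Join w long")
  def sortMergeJoin(): Unit = emit("running: merge join")

  def runBenchmarkSuite(): Unit = {
    runBenchmark("Join Benchmark") {
      broadcastHashJoinLongKey()
      sortMergeJoin()
    }
  }

  def main(args: Array[String]): Unit = runBenchmarkSuite()
}
```

With this shape, every case in the suite appears under a single named header in the output.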
---
[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22661
**[Test build #97249 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97249/testReport)** for PR 22661 at commit [`00c4950`](https://github.com/apache/spark/commit/00c495091dfdfb9f647c0e66307b4cc8ef2a19a3).
---
[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/22661
@wangyum, could you review and merge https://github.com/wangyum/spark/pull/18 ?
---
[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22661
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22661
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22661
**[Test build #97279 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97279/testReport)** for PR 22661 at commit [`3be13b1`](https://github.com/apache/spark/commit/3be13b16f1a59ffbd158265f54ad4f8d511d2018).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark t...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22661#discussion_r224270597
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/JoinBenchmark.scala ---
@@ -19,229 +19,165 @@ package org.apache.spark.sql.execution.benchmark
import org.apache.spark.sql.execution.joins._
import org.apache.spark.sql.functions._
+import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types.IntegerType
/**
* Benchmark to measure performance for aggregate primitives.
- * To run this:
- * build/sbt "sql/test-only *benchmark.JoinBenchmark"
- *
- * Benchmarks in this file are skipped in normal builds.
+ * To run this benchmark:
+ * {{{
+ * 1. without sbt:
+ * bin/spark-submit --class <this class> --jars <spark core test jar> <spark sql test jar>
+ * 2. build/sbt "sql/test:runMain <this class>"
+ * 3. generate result:
+ * SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
+ * Results will be written to "benchmarks/JoinBenchmark-results.txt".
+ * }}}
*/
-class JoinBenchmark extends BenchmarkWithCodegen {
+object JoinBenchmark extends SqlBasedBenchmark {
- ignore("broadcast hash join, long key") {
+ def broadcastHashJoinLongKey(): Unit = {
val N = 20 << 20
val M = 1 << 16
- val dim = broadcast(sparkSession.range(M).selectExpr("id as k", "cast(id as string) as v"))
- runBenchmark("Join w long", N) {
- val df = sparkSession.range(N).join(dim, (col("id") % M) === col("k"))
+ val dim = broadcast(spark.range(M).selectExpr("id as k", "cast(id as string) as v"))
+ codegenBenchmark("Join w long", N) {
+ val df = spark.range(N).join(dim, (col("id") % M) === col("k"))
assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[BroadcastHashJoinExec]).isDefined)
df.count()
}
-
- /*
- Java HotSpot(TM) 64-Bit Server VM 1.7.0_60-b19 on Mac OS X 10.9.5
- Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
- Join w long: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
- -------------------------------------------------------------------------------------------
- Join w long codegen=false 3002 / 3262 7.0 143.2 1.0X
- Join w long codegen=true 321 / 371 65.3 15.3 9.3X
- */
}
- ignore("broadcast hash join, long key with duplicates") {
+
+ def broadcastHashJoinLongKeyWithDuplicates(): Unit = {
val N = 20 << 20
val M = 1 << 16
- val dim = broadcast(sparkSession.range(M).selectExpr("id as k", "cast(id as string) as v"))
--- End diff --
So, this removes the redundant definition, right?
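As a small illustration of why the outer definition was redundant: in the removed code a second `val dim` inside the benchmark block shadowed the outer one, so the outer value was never read there. The sketch below uses plain strings as hypothetical stand-ins for the two DataFrames.

```scala
object ShadowingSketch {
  // Returns (value visible outside the block, value visible inside the block).
  def run(): (String, String) = {
    val dim = "outer: id as k, cast(id as string) as v"
    val seenInside = {
      // This inner `val dim` shadows the outer one for the rest of the block.
      val dim = "inner: cast(id/10 as long) as k"
      dim
    }
    (dim, seenInside)
  }

  def main(args: Array[String]): Unit = {
    val (outer, inner) = run()
    println(s"outside the block: $outer")
    println(s"inside the block:  $inner")
  }
}
```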
---
[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22661
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3779/
Test PASSed.
---
[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22661
**[Test build #97287 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97287/testReport)** for PR 22661 at commit [`3be13b1`](https://github.com/apache/spark/commit/3be13b16f1a59ffbd158265f54ad4f8d511d2018).
---
[GitHub] spark pull request #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark t...
Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on a diff in the pull request:
https://github.com/apache/spark/pull/22661#discussion_r224714936
--- Diff: core/src/test/scala/org/apache/spark/benchmark/Benchmark.scala ---
@@ -200,11 +200,12 @@ private[spark] object Benchmark {
def getProcessorName(): String = {
val cpu = if (SystemUtils.IS_OS_MAC_OSX) {
Utils.executeAndGetOutput(Seq("/usr/sbin/sysctl", "-n", "machdep.cpu.brand_string"))
+ .stripLineEnd
--- End diff --
Because the output on Mac has one more line than on Linux:
https://github.com/apache/spark/pull/22661/commits/28f9b9a8a26caf8750aa2e8c8e2bc793b3773d98#diff-45c96c65f7c46bc2d84843a7cb92f22fL7
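A minimal sketch of what `.stripLineEnd` does to such a string. The sample value with a trailing `\n` is an assumption standing in for the captured command output; it is not taken from an actual run.

```scala
object StripLineEndSketch {
  // Simulated command output; the trailing newline is assumed for illustration.
  val raw: String = "Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz\n"

  // stripLineEnd drops a single trailing line terminator, if there is one.
  val cleaned: String = raw.stripLineEnd

  def main(args: Array[String]): Unit = {
    println(s"[$raw]")     // the closing bracket lands on the following line
    println(s"[$cleaned]") // the closing bracket stays on the same line
  }
}
```

A string without a trailing line terminator is returned unchanged, so the call is harmless on platforms where the extra line does not appear.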
---
[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22661
**[Test build #97080 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97080/testReport)** for PR 22661 at commit [`4339b1c`](https://github.com/apache/spark/commit/4339b1cbc5de7e54a7cd5be818fcf3dab249a351).
---
[GitHub] spark pull request #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark t...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22661#discussion_r224270755
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/JoinBenchmark.scala ---
@@ -19,229 +19,165 @@ package org.apache.spark.sql.execution.benchmark
import org.apache.spark.sql.execution.joins._
import org.apache.spark.sql.functions._
+import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types.IntegerType
/**
* Benchmark to measure performance for aggregate primitives.
- * To run this:
- * build/sbt "sql/test-only *benchmark.JoinBenchmark"
- *
- * Benchmarks in this file are skipped in normal builds.
+ * To run this benchmark:
+ * {{{
+ * 1. without sbt:
+ * bin/spark-submit --class <this class> --jars <spark core test jar> <spark sql test jar>
+ * 2. build/sbt "sql/test:runMain <this class>"
+ * 3. generate result:
+ * SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
+ * Results will be written to "benchmarks/JoinBenchmark-results.txt".
+ * }}}
*/
-class JoinBenchmark extends BenchmarkWithCodegen {
+object JoinBenchmark extends SqlBasedBenchmark {
- ignore("broadcast hash join, long key") {
+ def broadcastHashJoinLongKey(): Unit = {
val N = 20 << 20
val M = 1 << 16
- val dim = broadcast(sparkSession.range(M).selectExpr("id as k", "cast(id as string) as v"))
- runBenchmark("Join w long", N) {
- val df = sparkSession.range(N).join(dim, (col("id") % M) === col("k"))
+ val dim = broadcast(spark.range(M).selectExpr("id as k", "cast(id as string) as v"))
+ codegenBenchmark("Join w long", N) {
+ val df = spark.range(N).join(dim, (col("id") % M) === col("k"))
assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[BroadcastHashJoinExec]).isDefined)
df.count()
}
-
- /*
- Java HotSpot(TM) 64-Bit Server VM 1.7.0_60-b19 on Mac OS X 10.9.5
- Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
- Join w long: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
- -------------------------------------------------------------------------------------------
- Join w long codegen=false 3002 / 3262 7.0 143.2 1.0X
- Join w long codegen=true 321 / 371 65.3 15.3 9.3X
- */
}
- ignore("broadcast hash join, long key with duplicates") {
+
+ def broadcastHashJoinLongKeyWithDuplicates(): Unit = {
val N = 20 << 20
val M = 1 << 16
- val dim = broadcast(sparkSession.range(M).selectExpr("id as k", "cast(id as string) as v"))
- runBenchmark("Join w long duplicated", N) {
- val dim = broadcast(sparkSession.range(M).selectExpr("cast(id/10 as long) as k"))
- val df = sparkSession.range(N).join(dim, (col("id") % M) === col("k"))
+ codegenBenchmark("Join w long duplicated", N) {
+ val dim = broadcast(spark.range(M).selectExpr("cast(id/10 as long) as k"))
--- End diff --
According to another benchmark case in this file, `broadcast` seems to be placed outside of `codegenBenchmark`. What do you think about this?
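To make the general concern concrete, here is a self-contained sketch of setup placed inside versus outside a timed block. `timeNs`, `buildDim`, and `probe` are hypothetical stand-ins, not Spark APIs, and the sketch makes no claim about the actual cost of `broadcast` itself; it only shows that setup inside the measured block is repeated on every iteration.

```scala
object HoistSetupSketch {
  // Hypothetical timing helper: runs the block once and returns elapsed nanoseconds.
  def timeNs(body: => Any): Long = {
    val start = System.nanoTime()
    body
    System.nanoTime() - start
  }

  // Counts how many times the stand-in dimension table is built.
  var builds = 0

  def buildDim(m: Int): Map[Long, String] = {
    builds += 1
    (0L until m.toLong).map(k => k -> k.toString).toMap
  }

  // Counts how many ids in [0, n) find a match in the dimension table.
  def probe(dim: Map[Long, String], n: Int, m: Int): Int =
    (0L until n.toLong).count(id => dim.contains(id % m))

  // Returns (builds with setup inside the timed block, builds with setup hoisted).
  def run(iterations: Int): (Int, Int) = {
    val m = 1 << 10
    val n = 1 << 16

    builds = 0
    val inside = (1 to iterations).map(_ => timeNs { probe(buildDim(m), n, m) })
    val buildsInside = builds

    builds = 0
    val dim = buildDim(m) // built once, outside the measured block
    val hoisted = (1 to iterations).map(_ => timeNs { probe(dim, n, m) })
    val buildsHoisted = builds

    println(s"setup inside:  best ${inside.min} ns, builds = $buildsInside")
    println(s"setup hoisted: best ${hoisted.min} ns, builds = $buildsHoisted")
    (buildsInside, buildsHoisted)
  }

  def main(args: Array[String]): Unit = {
    run(3)
    ()
  }
}
```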
---
[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22661
Merged build finished. Test PASSed.
---
[GitHub] spark pull request #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark t...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22661#discussion_r224520773
--- Diff: sql/core/benchmarks/JoinBenchmark-results.txt ---
@@ -0,0 +1,80 @@
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
+
+Join w long: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------
+Join w long wholestage off 4062 / 4709 5.2 193.7 1.0X
+Join w long wholestage on 152 / 163 138.4 7.2 26.8X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
+
+Join w long duplicated: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------
+Join w long duplicated wholestage off 3793 / 3801 5.5 180.9 1.0X
+Join w long duplicated wholestage on 207 / 219 101.1 9.9 18.3X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
+
+Join w 2 ints: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------
+Join w 2 ints wholestage off 138514 / 139178 0.2 6604.9 1.0X
+Join w 2 ints wholestage on 129908 / 140869 0.2 6194.5 1.1X
--- End diff --
Oh, interesting. Although it's beyond the scope of this PR, could you please run it on `branch-2.4` and `branch-2.3`, too?
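For reference when reading the table, a small sketch of how the Rate and Per Row columns relate to the best time, assuming both are derived from the best time and the row count `N = 20 << 20` used in the diff. The formulas are an assumption checked only against the figures quoted above, not taken from the benchmark framework's source.

```scala
import java.util.Locale

object BenchmarkColumnsSketch {
  // Row count used by the broadcast hash join cases in the diff.
  val N: Long = 20L << 20

  // Nanoseconds spent per row, given the best wall-clock time in milliseconds.
  def perRowNs(bestMs: Long): Double = bestMs * 1e6 / N

  // Millions of rows processed per second.
  def rateMPerSec(bestMs: Long): Double = N / (bestMs / 1000.0) / 1e6

  // One decimal place, with a locale-independent decimal point.
  def fmt(x: Double): String = String.format(Locale.ROOT, "%.1f", Double.box(x))

  def main(args: Array[String]): Unit = {
    val cases = Seq(
      "Join w long wholestage off" -> 4062L,
      "Join w 2 ints wholestage off" -> 138514L)
    for ((name, bestMs) <- cases) {
      println(s"$name: ${fmt(rateMPerSec(bestMs))} M/s, ${fmt(perRowNs(bestMs))} ns/row")
    }
  }
}
```

Under that assumption the two rows reproduce the quoted 5.2 M/s at 193.7 ns per row and 0.2 M/s at 6604.9 ns per row.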
---
[GitHub] spark pull request #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark t...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/22661
---
[GitHub] spark pull request #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark t...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22661#discussion_r224767594
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/JoinBenchmark.scala ---
@@ -19,229 +19,163 @@ package org.apache.spark.sql.execution.benchmark
import org.apache.spark.sql.execution.joins._
import org.apache.spark.sql.functions._
+import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types.IntegerType
/**
- * Benchmark to measure performance for aggregate primitives.
- * To run this:
- * build/sbt "sql/test-only *benchmark.JoinBenchmark"
- *
- * Benchmarks in this file are skipped in normal builds.
+ * Benchmark to measure performance for joins.
+ * To run this benchmark:
+ * {{{
+ * 1. without sbt:
+ * bin/spark-submit --class <this class> --jars <spark core test jar> <spark sql test jar>
+ * 2. build/sbt "sql/test:runMain <this class>"
+ * 3. generate result:
+ * SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
+ * Results will be written to "benchmarks/JoinBenchmark-results.txt".
+ * }}}
*/
-class JoinBenchmark extends BenchmarkWithCodegen {
+object JoinBenchmark extends SqlBasedBenchmark {
- ignore("broadcast hash join, long key") {
+ def broadcastHashJoinLongKey(): Unit = {
val N = 20 << 20
val M = 1 << 16
- val dim = broadcast(sparkSession.range(M).selectExpr("id as k", "cast(id as string) as v"))
- runBenchmark("Join w long", N) {
- val df = sparkSession.range(N).join(dim, (col("id") % M) === col("k"))
+ val dim = broadcast(spark.range(M).selectExpr("id as k", "cast(id as string) as v"))
+ codegenBenchmark("Join w long", N) {
+ val df = spark.range(N).join(dim, (col("id") % M) === col("k"))
assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[BroadcastHashJoinExec]).isDefined)
df.count()
}
-
- /*
- Java HotSpot(TM) 64-Bit Server VM 1.7.0_60-b19 on Mac OS X 10.9.5
- Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
- Join w long: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
- -------------------------------------------------------------------------------------------
- Join w long codegen=false 3002 / 3262 7.0 143.2 1.0X
- Join w long codegen=true 321 / 371 65.3 15.3 9.3X
- */
}
- ignore("broadcast hash join, long key with duplicates") {
+ def broadcastHashJoinLongKeyWithDuplicates(): Unit = {
val N = 20 << 20
val M = 1 << 16
-
- val dim = broadcast(sparkSession.range(M).selectExpr("id as k", "cast(id as string) as v"))
- runBenchmark("Join w long duplicated", N) {
- val dim = broadcast(sparkSession.range(M).selectExpr("cast(id/10 as long) as k"))
- val df = sparkSession.range(N).join(dim, (col("id") % M) === col("k"))
+ val dim = broadcast(spark.range(M).selectExpr("cast(id/10 as long) as k"))
+ codegenBenchmark("Join w long duplicated", N) {
+ val df = spark.range(N).join(dim, (col("id") % M) === col("k"))
assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[BroadcastHashJoinExec]).isDefined)
df.count()
}
-
- /*
- *Java HotSpot(TM) 64-Bit Server VM 1.7.0_60-b19 on Mac OS X 10.9.5
- *Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
- *Join w long duplicated: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
- *-------------------------------------------------------------------------------------------
- *Join w long duplicated codegen=false 3446 / 3478 6.1 164.3 1.0X
- *Join w long duplicated codegen=true 322 / 351 65.2 15.3 10.7X
- */
}
- ignore("broadcast hash join, two int key") {
+ def broadcastHashJoinTwoIntKey(): Unit = {
val N = 20 << 20
val M = 1 << 16
- val dim2 = broadcast(sparkSession.range(M)
+ val dim2 = broadcast(spark.range(M)
.selectExpr("cast(id as int) as k1", "cast(id as int) as k2", "cast(id as string) as v"))
- runBenchmark("Join w 2 ints", N) {
- val df = sparkSession.range(N).join(dim2,
+ codegenBenchmark("Join w 2 ints", N) {
+ val df = spark.range(N).join(dim2,
(col("id") % M).cast(IntegerType) === col("k1")
&& (col("id") % M).cast(IntegerType) === col("k2"))
assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[BroadcastHashJoinExec]).isDefined)
df.count()
}
-
- /*
- *Java HotSpot(TM) 64-Bit Server VM 1.7.0_60-b19 on Mac OS X 10.9.5
- *Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
- *Join w 2 ints: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
- *-------------------------------------------------------------------------------------------
- *Join w 2 ints codegen=false 4426 / 4501 4.7 211.1 1.0X
- *Join w 2 ints codegen=true 791 / 818 26.5 37.7 5.6X
- */
--- End diff --
This seems to be caused by the bug fix: https://github.com/apache/spark/pull/15390
So the performance is reasonable.
---
[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22661
**[Test build #97299 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97299/testReport)** for PR 22661 at commit [`28f9b9a`](https://github.com/apache/spark/commit/28f9b9a8a26caf8750aa2e8c8e2bc793b3773d98).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22661
Merged build finished. Test PASSed.
---
[GitHub] spark pull request #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark t...
Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on a diff in the pull request:
https://github.com/apache/spark/pull/22661#discussion_r224396901
--- Diff: sql/core/benchmarks/JoinBenchmark-results.txt ---
@@ -0,0 +1,80 @@
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
+
+Join w long: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------
+Join w long wholestage off 4062 / 4709 5.2 193.7 1.0X
+Join w long wholestage on 152 / 163 138.4 7.2 26.8X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
+
+Join w long duplicated: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------
+Join w long duplicated wholestage off 3793 / 3801 5.5 180.9 1.0X
+Join w long duplicated wholestage on 207 / 219 101.1 9.9 18.3X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
+
+Join w 2 ints: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------
+Join w 2 ints wholestage off 138514 / 139178 0.2 6604.9 1.0X
+Join w 2 ints wholestage on 129908 / 140869 0.2 6194.5 1.1X
--- End diff --
I think it's correct; I ran it on master:
```
build/sbt "sql/test-only *benchmark.JoinBenchmark"
......
[info] JoinBenchmark:
[info] - broadcast hash join, long key !!! IGNORED !!!
[info] - broadcast hash join, long key with duplicates !!! IGNORED !!!
Running benchmark: Join w 2 ints
Running case: Join w 2 ints wholestage off
Stopped after 2 iterations, 307335 ms
Running case: Join w 2 ints wholestage on
Stopped after 5 iterations, 687107 ms
Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
Join w 2 ints: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
Join w 2 ints wholestage off 153532 / 153668 0.1 7321.0 1.0X
Join w 2 ints wholestage on 132075 / 137422 0.2 6297.8 1.2X
```
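For reference, the derived columns in the table above can be recomputed from the best time and the row count. The sketch below is only an illustration: the object and method names are hypothetical, and the formulas are an assumption that happens to reproduce the quoted figures, not a copy of Spark's `Benchmark` code. It uses `N = 20 << 20` rows, as in the benchmark source.

```scala
import java.util.Locale

// Hypothetical helper object, not part of Spark.
object BenchmarkColumns {
  // Format with one decimal place, independent of the default locale.
  def fmt1(x: Double): String = "%.1f".formatLocal(Locale.ROOT, x)

  // Millions of rows processed per second.
  def rateMPerSec(rows: Long, bestMs: Double): Double = rows / (bestMs * 1000.0)

  // Nanoseconds spent per row.
  def perRowNs(rows: Long, bestMs: Double): Double = bestMs * 1e6 / rows

  // Speedup of a case relative to the baseline case.
  def relative(baselineBestMs: Double, bestMs: Double): Double = baselineBestMs / bestMs

  def main(args: Array[String]): Unit = {
    val n = 20L << 20          // 20971520 rows
    val offBestMs = 153532.0   // "wholestage off" best time from the run above
    val onBestMs = 132075.0    // "wholestage on" best time from the run above

    // wholestage off: Rate 0.1, Per Row 7321.0, Relative 1.0X
    println(s"off: ${fmt1(rateMPerSec(n, offBestMs))} M/s, " +
      s"${fmt1(perRowNs(n, offBestMs))} ns/row, " +
      s"${fmt1(relative(offBestMs, offBestMs))}X")

    // wholestage on: Rate 0.2, Per Row 6297.8, Relative 1.2X
    println(s"on:  ${fmt1(rateMPerSec(n, onBestMs))} M/s, " +
      s"${fmt1(perRowNs(n, onBestMs))} ns/row, " +
      s"${fmt1(relative(offBestMs, onBestMs))}X")
  }
}
```

Because the table prints best times rounded to whole milliseconds, a recomputed ratio can differ from the printed one in the last digit for the much faster cases.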
---
[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22661
**[Test build #97243 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97243/testReport)** for PR 22661 at commit [`2baaf35`](https://github.com/apache/spark/commit/2baaf35a89d2cd5f70a0c21c05c392af7affb403).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22661
Merged build finished. Test PASSed.
---
[GitHub] spark pull request #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark t...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22661#discussion_r224768758
--- Diff: sql/core/benchmarks/JoinBenchmark-results.txt ---
@@ -0,0 +1,75 @@
+================================================================================================
+Join Benchmark
+================================================================================================
+
+OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+Join w long: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------
+Join w long wholestage off 4464 / 4483 4.7 212.9 1.0X
+Join w long wholestage on 289 / 339 72.6 13.8 15.5X
+
+OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+Join w long duplicated: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------
+Join w long duplicated wholestage off 5662 / 5678 3.7 270.0 1.0X
+Join w long duplicated wholestage on 332 / 345 63.1 15.8 17.0X
+
+OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+Join w 2 ints: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------
+Join w 2 ints wholestage off 173174 / 173183 0.1 8257.6 1.0X
+Join w 2 ints wholestage on 166350 / 198362 0.1 7932.2 1.0X
--- End diff --
It surprises me that whole-stage codegen doesn't help here. We should investigate it later.
---
[GitHub] spark pull request #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark t...
Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on a diff in the pull request:
https://github.com/apache/spark/pull/22661#discussion_r224300031
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/JoinBenchmark.scala ---
@@ -19,229 +19,165 @@ package org.apache.spark.sql.execution.benchmark
import org.apache.spark.sql.execution.joins._
import org.apache.spark.sql.functions._
+import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types.IntegerType
/**
* Benchmark to measure performance for aggregate primitives.
- * To run this:
- * build/sbt "sql/test-only *benchmark.JoinBenchmark"
- *
- * Benchmarks in this file are skipped in normal builds.
+ * To run this benchmark:
+ * {{{
+ * 1. without sbt:
+ * bin/spark-submit --class <this class> --jars <spark core test jar> <spark sql test jar>
+ * 2. build/sbt "sql/test:runMain <this class>"
+ * 3. generate result:
+ * SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
+ * Results will be written to "benchmarks/JoinBenchmark-results.txt".
+ * }}}
*/
-class JoinBenchmark extends BenchmarkWithCodegen {
+object JoinBenchmark extends SqlBasedBenchmark {
- ignore("broadcast hash join, long key") {
+ def broadcastHashJoinLongKey(): Unit = {
val N = 20 << 20
val M = 1 << 16
- val dim = broadcast(sparkSession.range(M).selectExpr("id as k", "cast(id as string) as v"))
- runBenchmark("Join w long", N) {
- val df = sparkSession.range(N).join(dim, (col("id") % M) === col("k"))
+ val dim = broadcast(spark.range(M).selectExpr("id as k", "cast(id as string) as v"))
+ codegenBenchmark("Join w long", N) {
+ val df = spark.range(N).join(dim, (col("id") % M) === col("k"))
assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[BroadcastHashJoinExec]).isDefined)
df.count()
}
-
- /*
- Java HotSpot(TM) 64-Bit Server VM 1.7.0_60-b19 on Mac OS X 10.9.5
- Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
- Join w long: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
- -------------------------------------------------------------------------------------------
- Join w long codegen=false 3002 / 3262 7.0 143.2 1.0X
- Join w long codegen=true 321 / 371 65.3 15.3 9.3X
- */
}
- ignore("broadcast hash join, long key with duplicates") {
+
+ def broadcastHashJoinLongKeyWithDuplicates(): Unit = {
val N = 20 << 20
val M = 1 << 16
- val dim = broadcast(sparkSession.range(M).selectExpr("id as k", "cast(id as string) as v"))
--- End diff --
Yes
---
[GitHub] spark pull request #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark t...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22661#discussion_r224934912
--- Diff: core/src/test/scala/org/apache/spark/benchmark/Benchmark.scala ---
@@ -200,11 +200,12 @@ private[spark] object Benchmark {
def getProcessorName(): String = {
val cpu = if (SystemUtils.IS_OS_MAC_OSX) {
Utils.executeAndGetOutput(Seq("/usr/sbin/sysctl", "-n", "machdep.cpu.brand_string"))
+ .stripLineEnd
--- End diff --
Ur.. I'm not a fan of piggy-backing. Okay.
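For context on the change under discussion, here is a minimal sketch of what `stripLineEnd` does to command output. The `cpuName` helper and the sample string are hypothetical stand-ins for the `sysctl` output; presumably the trailing newline is what the diff is meant to drop.

```scala
// Hypothetical illustration; the real code calls Utils.executeAndGetOutput.
object StripLineEndSketch {
  // Drop a single trailing line terminator, if present.
  def cpuName(rawCommandOutput: String): String = rawCommandOutput.stripLineEnd

  def main(args: Array[String]): Unit = {
    // Simulated command output that ends with a newline.
    val raw = "Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz\n"
    val cleaned = cpuName(raw)

    assert(cleaned == "Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz")
    // A string without a trailing newline is returned unchanged.
    assert(cpuName("no newline") == "no newline")
    // Only one trailing line end is removed.
    assert(cpuName("a\n\n") == "a\n")

    println(s"[$cleaned]")
  }
}
```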
---
[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22661
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3875/
Test PASSed.
---
[GitHub] spark pull request #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark t...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22661#discussion_r224676911
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/JoinBenchmark.scala ---
@@ -19,229 +19,163 @@ package org.apache.spark.sql.execution.benchmark
import org.apache.spark.sql.execution.joins._
import org.apache.spark.sql.functions._
+import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types.IntegerType
/**
- * Benchmark to measure performance for aggregate primitives.
- * To run this:
- * build/sbt "sql/test-only *benchmark.JoinBenchmark"
- *
- * Benchmarks in this file are skipped in normal builds.
+ * Benchmark to measure performance for joins.
+ * To run this benchmark:
+ * {{{
+ * 1. without sbt:
+ * bin/spark-submit --class <this class> --jars <spark core test jar> <spark sql test jar>
+ * 2. build/sbt "sql/test:runMain <this class>"
+ * 3. generate result:
+ * SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
+ * Results will be written to "benchmarks/JoinBenchmark-results.txt".
+ * }}}
*/
-class JoinBenchmark extends BenchmarkWithCodegen {
+object JoinBenchmark extends SqlBasedBenchmark {
- ignore("broadcast hash join, long key") {
+ def broadcastHashJoinLongKey(): Unit = {
val N = 20 << 20
val M = 1 << 16
- val dim = broadcast(sparkSession.range(M).selectExpr("id as k", "cast(id as string) as v"))
- runBenchmark("Join w long", N) {
- val df = sparkSession.range(N).join(dim, (col("id") % M) === col("k"))
+ val dim = broadcast(spark.range(M).selectExpr("id as k", "cast(id as string) as v"))
+ codegenBenchmark("Join w long", N) {
+ val df = spark.range(N).join(dim, (col("id") % M) === col("k"))
assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[BroadcastHashJoinExec]).isDefined)
df.count()
}
-
- /*
- Java HotSpot(TM) 64-Bit Server VM 1.7.0_60-b19 on Mac OS X 10.9.5
- Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
- Join w long: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
- -------------------------------------------------------------------------------------------
- Join w long codegen=false 3002 / 3262 7.0 143.2 1.0X
- Join w long codegen=true 321 / 371 65.3 15.3 9.3X
- */
}
- ignore("broadcast hash join, long key with duplicates") {
+ def broadcastHashJoinLongKeyWithDuplicates(): Unit = {
val N = 20 << 20
val M = 1 << 16
-
- val dim = broadcast(sparkSession.range(M).selectExpr("id as k", "cast(id as string) as v"))
- runBenchmark("Join w long duplicated", N) {
- val dim = broadcast(sparkSession.range(M).selectExpr("cast(id/10 as long) as k"))
- val df = sparkSession.range(N).join(dim, (col("id") % M) === col("k"))
+ val dim = broadcast(spark.range(M).selectExpr("cast(id/10 as long) as k"))
+ codegenBenchmark("Join w long duplicated", N) {
+ val df = spark.range(N).join(dim, (col("id") % M) === col("k"))
assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[BroadcastHashJoinExec]).isDefined)
df.count()
}
-
- /*
- *Java HotSpot(TM) 64-Bit Server VM 1.7.0_60-b19 on Mac OS X 10.9.5
- *Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
- *Join w long duplicated: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
- *-------------------------------------------------------------------------------------------
- *Join w long duplicated codegen=false 3446 / 3478 6.1 164.3 1.0X
- *Join w long duplicated codegen=true 322 / 351 65.2 15.3 10.7X
- */
}
- ignore("broadcast hash join, two int key") {
+ def broadcastHashJoinTwoIntKey(): Unit = {
val N = 20 << 20
val M = 1 << 16
- val dim2 = broadcast(sparkSession.range(M)
+ val dim2 = broadcast(spark.range(M)
.selectExpr("cast(id as int) as k1", "cast(id as int) as k2", "cast(id as string) as v"))
- runBenchmark("Join w 2 ints", N) {
- val df = sparkSession.range(N).join(dim2,
+ codegenBenchmark("Join w 2 ints", N) {
+ val df = spark.range(N).join(dim2,
(col("id") % M).cast(IntegerType) === col("k1")
&& (col("id") % M).cast(IntegerType) === col("k2"))
assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[BroadcastHashJoinExec]).isDefined)
df.count()
}
-
- /*
- *Java HotSpot(TM) 64-Bit Server VM 1.7.0_60-b19 on Mac OS X 10.9.5
- *Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
- *Join w 2 ints: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
- *-------------------------------------------------------------------------------------------
- *Join w 2 ints codegen=false 4426 / 4501 4.7 211.1 1.0X
- *Join w 2 ints codegen=true 791 / 818 26.5 37.7 5.6X
- */
}
- ignore("broadcast hash join, two long key") {
+ def broadcastHashJoinTwoLongKey(): Unit = {
val N = 20 << 20
val M = 1 << 16
- val dim3 = broadcast(sparkSession.range(M)
+ val dim3 = broadcast(spark.range(M)
.selectExpr("id as k1", "id as k2", "cast(id as string) as v"))
- runBenchmark("Join w 2 longs", N) {
- val df = sparkSession.range(N).join(dim3,
+ codegenBenchmark("Join w 2 longs", N) {
+ val df = spark.range(N).join(dim3,
(col("id") % M) === col("k1") && (col("id") % M) === col("k2"))
assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[BroadcastHashJoinExec]).isDefined)
df.count()
}
-
- /*
- *Java HotSpot(TM) 64-Bit Server VM 1.7.0_60-b19 on Mac OS X 10.9.5
- *Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
- *Join w 2 longs: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
- *-------------------------------------------------------------------------------------------
- *Join w 2 longs codegen=false 5905 / 6123 3.6 281.6 1.0X
- *Join w 2 longs codegen=true 2230 / 2529 9.4 106.3 2.6X
- */
}
- ignore("broadcast hash join, two long key with duplicates") {
+ def broadcastHashJoinTwoLongKeyWithDuplicates(): Unit = {
val N = 20 << 20
val M = 1 << 16
- val dim4 = broadcast(sparkSession.range(M)
+ val dim4 = broadcast(spark.range(M)
.selectExpr("cast(id/10 as long) as k1", "cast(id/10 as long) as k2"))
- runBenchmark("Join w 2 longs duplicated", N) {
- val df = sparkSession.range(N).join(dim4,
+ codegenBenchmark("Join w 2 longs duplicated", N) {
+ val df = spark.range(N).join(dim4,
(col("id") bitwiseAND M) === col("k1") && (col("id") bitwiseAND M) === col("k2"))
assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[BroadcastHashJoinExec]).isDefined)
df.count()
}
-
- /*
- *Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
- *Join w 2 longs duplicated: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
- *-------------------------------------------------------------------------------------------
- *Join w 2 longs duplicated codegen=false 6420 / 6587 3.3 306.1 1.0X
- *Join w 2 longs duplicated codegen=true 2080 / 2139 10.1 99.2 3.1X
- */
}
- ignore("broadcast hash join, outer join long key") {
+ def broadcastHashJoinOuterJoinLongKey(): Unit = {
val N = 20 << 20
val M = 1 << 16
- val dim = broadcast(sparkSession.range(M).selectExpr("id as k", "cast(id as string) as v"))
- runBenchmark("outer join w long", N) {
- val df = sparkSession.range(N).join(dim, (col("id") % M) === col("k"), "left")
+ val dim = broadcast(spark.range(M).selectExpr("id as k", "cast(id as string) as v"))
+ codegenBenchmark("outer join w long", N) {
+ val df = spark.range(N).join(dim, (col("id") % M) === col("k"), "left")
assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[BroadcastHashJoinExec]).isDefined)
df.count()
}
-
- /*
- *Java HotSpot(TM) 64-Bit Server VM 1.7.0_60-b19 on Mac OS X 10.9.5
- *Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
- *outer join w long: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
- *-------------------------------------------------------------------------------------------
- *outer join w long codegen=false 3055 / 3189 6.9 145.7 1.0X
- *outer join w long codegen=true 261 / 276 80.5 12.4 11.7X
- */
}
- ignore("broadcast hash join, semi join long key") {
+ def broadcastHashJoinSemiJoinLongKey(): Unit = {
val N = 20 << 20
val M = 1 << 16
- val dim = broadcast(sparkSession.range(M).selectExpr("id as k", "cast(id as string) as v"))
- runBenchmark("semi join w long", N) {
- val df = sparkSession.range(N).join(dim, (col("id") % M) === col("k"), "leftsemi")
+ val dim = broadcast(spark.range(M).selectExpr("id as k", "cast(id as string) as v"))
+ codegenBenchmark("semi join w long", N) {
+ val df = spark.range(N).join(dim, (col("id") % M) === col("k"), "leftsemi")
assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[BroadcastHashJoinExec]).isDefined)
df.count()
}
-
- /*
- *Java HotSpot(TM) 64-Bit Server VM 1.7.0_60-b19 on Mac OS X 10.9.5
- *Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
- *semi join w long: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
- *-------------------------------------------------------------------------------------------
- *semi join w long codegen=false 1912 / 1990 11.0 91.2 1.0X
- *semi join w long codegen=true 237 / 244 88.3 11.3 8.1X
- */
}
- ignore("sort merge join") {
+ def sortMergeJoin(): Unit = {
val N = 2 << 20
- runBenchmark("merge join", N) {
- val df1 = sparkSession.range(N).selectExpr(s"id * 2 as k1")
- val df2 = sparkSession.range(N).selectExpr(s"id * 3 as k2")
+ codegenBenchmark("merge join", N) {
--- End diff --
`merge join` -> `sort merge join`
---
[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...
Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on the issue:
https://github.com/apache/spark/pull/22661
retest this please
---
[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22661
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22661
**[Test build #97090 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97090/testReport)** for PR 22661 at commit [`4859a9f`](https://github.com/apache/spark/commit/4859a9f5e78edf81c211c304a57e2603e60b2cc7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22661
**[Test build #97243 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97243/testReport)** for PR 22661 at commit [`2baaf35`](https://github.com/apache/spark/commit/2baaf35a89d2cd5f70a0c21c05c392af7affb403).
---
[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22661
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22661
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22661
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97080/
Test PASSed.
---
[GitHub] spark pull request #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark t...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22661#discussion_r224685493
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/JoinBenchmark.scala ---
@@ -19,229 +19,163 @@ package org.apache.spark.sql.execution.benchmark
import org.apache.spark.sql.execution.joins._
import org.apache.spark.sql.functions._
+import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types.IntegerType
/**
- * Benchmark to measure performance for aggregate primitives.
- * To run this:
- * build/sbt "sql/test-only *benchmark.JoinBenchmark"
- *
- * Benchmarks in this file are skipped in normal builds.
+ * Benchmark to measure performance for joins.
+ * To run this benchmark:
+ * {{{
+ * 1. without sbt:
+ * bin/spark-submit --class <this class> --jars <spark core test jar> <spark sql test jar>
+ * 2. build/sbt "sql/test:runMain <this class>"
+ * 3. generate result:
+ * SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
+ * Results will be written to "benchmarks/JoinBenchmark-results.txt".
+ * }}}
*/
-class JoinBenchmark extends BenchmarkWithCodegen {
+object JoinBenchmark extends SqlBasedBenchmark {
- ignore("broadcast hash join, long key") {
+ def broadcastHashJoinLongKey(): Unit = {
val N = 20 << 20
val M = 1 << 16
- val dim = broadcast(sparkSession.range(M).selectExpr("id as k", "cast(id as string) as v"))
- runBenchmark("Join w long", N) {
- val df = sparkSession.range(N).join(dim, (col("id") % M) === col("k"))
+ val dim = broadcast(spark.range(M).selectExpr("id as k", "cast(id as string) as v"))
+ codegenBenchmark("Join w long", N) {
+ val df = spark.range(N).join(dim, (col("id") % M) === col("k"))
assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[BroadcastHashJoinExec]).isDefined)
df.count()
}
-
- /*
- Java HotSpot(TM) 64-Bit Server VM 1.7.0_60-b19 on Mac OS X 10.9.5
- Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
- Join w long: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
- -------------------------------------------------------------------------------------------
- Join w long codegen=false 3002 / 3262 7.0 143.2 1.0X
- Join w long codegen=true 321 / 371 65.3 15.3 9.3X
- */
}
- ignore("broadcast hash join, long key with duplicates") {
+ def broadcastHashJoinLongKeyWithDuplicates(): Unit = {
val N = 20 << 20
val M = 1 << 16
-
- val dim = broadcast(sparkSession.range(M).selectExpr("id as k", "cast(id as string) as v"))
- runBenchmark("Join w long duplicated", N) {
- val dim = broadcast(sparkSession.range(M).selectExpr("cast(id/10 as long) as k"))
- val df = sparkSession.range(N).join(dim, (col("id") % M) === col("k"))
+ val dim = broadcast(spark.range(M).selectExpr("cast(id/10 as long) as k"))
+ codegenBenchmark("Join w long duplicated", N) {
+ val df = spark.range(N).join(dim, (col("id") % M) === col("k"))
assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[BroadcastHashJoinExec]).isDefined)
df.count()
}
-
- /*
- *Java HotSpot(TM) 64-Bit Server VM 1.7.0_60-b19 on Mac OS X 10.9.5
- *Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
- *Join w long duplicated: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
- *-------------------------------------------------------------------------------------------
- *Join w long duplicated codegen=false 3446 / 3478 6.1 164.3 1.0X
- *Join w long duplicated codegen=true 322 / 351 65.2 15.3 10.7X
- */
}
- ignore("broadcast hash join, two int key") {
+ def broadcastHashJoinTwoIntKey(): Unit = {
val N = 20 << 20
val M = 1 << 16
- val dim2 = broadcast(sparkSession.range(M)
+ val dim2 = broadcast(spark.range(M)
.selectExpr("cast(id as int) as k1", "cast(id as int) as k2", "cast(id as string) as v"))
- runBenchmark("Join w 2 ints", N) {
- val df = sparkSession.range(N).join(dim2,
+ codegenBenchmark("Join w 2 ints", N) {
+ val df = spark.range(N).join(dim2,
(col("id") % M).cast(IntegerType) === col("k1")
&& (col("id") % M).cast(IntegerType) === col("k2"))
assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[BroadcastHashJoinExec]).isDefined)
df.count()
}
-
- /*
- *Java HotSpot(TM) 64-Bit Server VM 1.7.0_60-b19 on Mac OS X 10.9.5
- *Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
- *Join w 2 ints: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
- *-------------------------------------------------------------------------------------------
- *Join w 2 ints codegen=false 4426 / 4501 4.7 211.1 1.0X
- *Join w 2 ints codegen=true 791 / 818 26.5 37.7 5.6X
- */
--- End diff --
Any advice is welcome, and thank you in advance, @cloud-fan, @gatorsmile, @davies, @rxin.
---
[GitHub] spark pull request #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark t...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22661#discussion_r224678241
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/JoinBenchmark.scala ---
@@ -19,229 +19,163 @@ package org.apache.spark.sql.execution.benchmark
import org.apache.spark.sql.execution.joins._
import org.apache.spark.sql.functions._
+import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types.IntegerType
/**
- * Benchmark to measure performance for aggregate primitives.
- * To run this:
- * build/sbt "sql/test-only *benchmark.JoinBenchmark"
- *
- * Benchmarks in this file are skipped in normal builds.
+ * Benchmark to measure performance for joins.
+ * To run this benchmark:
+ * {{{
+ * 1. without sbt:
+ * bin/spark-submit --class <this class> --jars <spark core test jar> <spark sql test jar>
+ * 2. build/sbt "sql/test:runMain <this class>"
+ * 3. generate result:
+ * SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
+ * Results will be written to "benchmarks/JoinBenchmark-results.txt".
+ * }}}
*/
-class JoinBenchmark extends BenchmarkWithCodegen {
+object JoinBenchmark extends SqlBasedBenchmark {
- ignore("broadcast hash join, long key") {
+ def broadcastHashJoinLongKey(): Unit = {
val N = 20 << 20
val M = 1 << 16
- val dim = broadcast(sparkSession.range(M).selectExpr("id as k", "cast(id as string) as v"))
- runBenchmark("Join w long", N) {
- val df = sparkSession.range(N).join(dim, (col("id") % M) === col("k"))
+ val dim = broadcast(spark.range(M).selectExpr("id as k", "cast(id as string) as v"))
+ codegenBenchmark("Join w long", N) {
+ val df = spark.range(N).join(dim, (col("id") % M) === col("k"))
assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[BroadcastHashJoinExec]).isDefined)
df.count()
}
-
- /*
- Java HotSpot(TM) 64-Bit Server VM 1.7.0_60-b19 on Mac OS X 10.9.5
- Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
- Join w long: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
- -------------------------------------------------------------------------------------------
- Join w long codegen=false 3002 / 3262 7.0 143.2 1.0X
- Join w long codegen=true 321 / 371 65.3 15.3 9.3X
- */
}
- ignore("broadcast hash join, long key with duplicates") {
+ def broadcastHashJoinLongKeyWithDuplicates(): Unit = {
val N = 20 << 20
val M = 1 << 16
-
- val dim = broadcast(sparkSession.range(M).selectExpr("id as k", "cast(id as string) as v"))
- runBenchmark("Join w long duplicated", N) {
- val dim = broadcast(sparkSession.range(M).selectExpr("cast(id/10 as long) as k"))
- val df = sparkSession.range(N).join(dim, (col("id") % M) === col("k"))
+ val dim = broadcast(spark.range(M).selectExpr("cast(id/10 as long) as k"))
+ codegenBenchmark("Join w long duplicated", N) {
+ val df = spark.range(N).join(dim, (col("id") % M) === col("k"))
assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[BroadcastHashJoinExec]).isDefined)
df.count()
}
-
- /*
- *Java HotSpot(TM) 64-Bit Server VM 1.7.0_60-b19 on Mac OS X 10.9.5
- *Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
- *Join w long duplicated: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
- *-------------------------------------------------------------------------------------------
- *Join w long duplicated codegen=false 3446 / 3478 6.1 164.3 1.0X
- *Join w long duplicated codegen=true 322 / 351 65.2 15.3 10.7X
- */
}
- ignore("broadcast hash join, two int key") {
+ def broadcastHashJoinTwoIntKey(): Unit = {
val N = 20 << 20
val M = 1 << 16
- val dim2 = broadcast(sparkSession.range(M)
+ val dim2 = broadcast(spark.range(M)
.selectExpr("cast(id as int) as k1", "cast(id as int) as k2", "cast(id as string) as v"))
- runBenchmark("Join w 2 ints", N) {
- val df = sparkSession.range(N).join(dim2,
+ codegenBenchmark("Join w 2 ints", N) {
+ val df = spark.range(N).join(dim2,
(col("id") % M).cast(IntegerType) === col("k1")
&& (col("id") % M).cast(IntegerType) === col("k2"))
assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[BroadcastHashJoinExec]).isDefined)
df.count()
}
-
- /*
- *Java HotSpot(TM) 64-Bit Server VM 1.7.0_60-b19 on Mac OS X 10.9.5
- *Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
- *Join w 2 ints: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
- *-------------------------------------------------------------------------------------------
- *Join w 2 ints codegen=false 4426 / 4501 4.7 211.1 1.0X
- *Join w 2 ints codegen=true 791 / 818 26.5 37.7 5.6X
- */
--- End diff --
For now, I also cannot get a consistent result like the one above; I got the same weird result as you. Let me take a closer look at this.
---
[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22661
**[Test build #97287 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97287/testReport)** for PR 22661 at commit [`3be13b1`](https://github.com/apache/spark/commit/3be13b16f1a59ffbd158265f54ad4f8d511d2018).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22661
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97249/
Test PASSed.
---
[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22661
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3878/
Test PASSed.
---
[GitHub] spark pull request #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark t...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22661#discussion_r224934660
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/JoinBenchmark.scala ---
@@ -19,229 +19,163 @@ package org.apache.spark.sql.execution.benchmark
import org.apache.spark.sql.execution.joins._
import org.apache.spark.sql.functions._
+import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types.IntegerType
/**
- * Benchmark to measure performance for aggregate primitives.
- * To run this:
- * build/sbt "sql/test-only *benchmark.JoinBenchmark"
- *
- * Benchmarks in this file are skipped in normal builds.
+ * Benchmark to measure performance for joins.
+ * To run this benchmark:
+ * {{{
+ * 1. without sbt:
+ * bin/spark-submit --class <this class> --jars <spark core test jar> <spark sql test jar>
+ * 2. build/sbt "sql/test:runMain <this class>"
+ * 3. generate result:
+ * SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
+ * Results will be written to "benchmarks/JoinBenchmark-results.txt".
+ * }}}
*/
-class JoinBenchmark extends BenchmarkWithCodegen {
+object JoinBenchmark extends SqlBasedBenchmark {
- ignore("broadcast hash join, long key") {
+ def broadcastHashJoinLongKey(): Unit = {
val N = 20 << 20
val M = 1 << 16
- val dim = broadcast(sparkSession.range(M).selectExpr("id as k", "cast(id as string) as v"))
- runBenchmark("Join w long", N) {
- val df = sparkSession.range(N).join(dim, (col("id") % M) === col("k"))
+ val dim = broadcast(spark.range(M).selectExpr("id as k", "cast(id as string) as v"))
+ codegenBenchmark("Join w long", N) {
+ val df = spark.range(N).join(dim, (col("id") % M) === col("k"))
assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[BroadcastHashJoinExec]).isDefined)
df.count()
}
-
- /*
- Java HotSpot(TM) 64-Bit Server VM 1.7.0_60-b19 on Mac OS X 10.9.5
- Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
- Join w long: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
- -------------------------------------------------------------------------------------------
- Join w long codegen=false 3002 / 3262 7.0 143.2 1.0X
- Join w long codegen=true 321 / 371 65.3 15.3 9.3X
- */
}
- ignore("broadcast hash join, long key with duplicates") {
+ def broadcastHashJoinLongKeyWithDuplicates(): Unit = {
val N = 20 << 20
val M = 1 << 16
-
- val dim = broadcast(sparkSession.range(M).selectExpr("id as k", "cast(id as string) as v"))
- runBenchmark("Join w long duplicated", N) {
- val dim = broadcast(sparkSession.range(M).selectExpr("cast(id/10 as long) as k"))
- val df = sparkSession.range(N).join(dim, (col("id") % M) === col("k"))
+ val dim = broadcast(spark.range(M).selectExpr("cast(id/10 as long) as k"))
+ codegenBenchmark("Join w long duplicated", N) {
+ val df = spark.range(N).join(dim, (col("id") % M) === col("k"))
assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[BroadcastHashJoinExec]).isDefined)
df.count()
}
-
- /*
- *Java HotSpot(TM) 64-Bit Server VM 1.7.0_60-b19 on Mac OS X 10.9.5
- *Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
- *Join w long duplicated: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
- *-------------------------------------------------------------------------------------------
- *Join w long duplicated codegen=false 3446 / 3478 6.1 164.3 1.0X
- *Join w long duplicated codegen=true 322 / 351 65.2 15.3 10.7X
- */
}
- ignore("broadcast hash join, two int key") {
+ def broadcastHashJoinTwoIntKey(): Unit = {
val N = 20 << 20
val M = 1 << 16
- val dim2 = broadcast(sparkSession.range(M)
+ val dim2 = broadcast(spark.range(M)
.selectExpr("cast(id as int) as k1", "cast(id as int) as k2", "cast(id as string) as v"))
- runBenchmark("Join w 2 ints", N) {
- val df = sparkSession.range(N).join(dim2,
+ codegenBenchmark("Join w 2 ints", N) {
+ val df = spark.range(N).join(dim2,
(col("id") % M).cast(IntegerType) === col("k1")
&& (col("id") % M).cast(IntegerType) === col("k2"))
assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[BroadcastHashJoinExec]).isDefined)
df.count()
}
-
- /*
- *Java HotSpot(TM) 64-Bit Server VM 1.7.0_60-b19 on Mac OS X 10.9.5
- *Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
- *Join w 2 ints: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
- *-------------------------------------------------------------------------------------------
- *Join w 2 ints codegen=false 4426 / 4501 4.7 211.1 1.0X
- *Join w 2 ints codegen=true 791 / 818 26.5 37.7 5.6X
- */
--- End diff --
Thank you for the confirmation, @cloud-fan!
---
[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22661
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3771/
Test PASSed.
---
[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22661
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22661
**[Test build #97249 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97249/testReport)** for PR 22661 at commit [`00c4950`](https://github.com/apache/spark/commit/00c495091dfdfb9f647c0e66307b4cc8ef2a19a3).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22661
**[Test build #97080 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97080/testReport)** for PR 22661 at commit [`4339b1c`](https://github.com/apache/spark/commit/4339b1cbc5de7e54a7cd5be818fcf3dab249a351).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22661
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22661
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22661: [SPARK-25664][SQL][TEST] Refactor JoinBenchmark to use m...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22661
Merged build finished. Test PASSed.
---