You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by wangyum <gi...@git.apache.org> on 2018/09/20 07:34:10 UTC

[GitHub] spark pull request #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark t...

GitHub user wangyum opened a pull request:

    https://github.com/apache/spark/pull/22484

    [SPARK-25476][TEST] Refactor AggregateBenchmark to use main method

    ## What changes were proposed in this pull request?
    
    Refactor `AggregateBenchmark` to use main method.
    To gererate benchmark result:
    ```
    SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.execution.benchmark.AggregateBenchmark"
    ```
    
    
    ## How was this patch tested?
    
    manual tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wangyum/spark SPARK-25476

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22484.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22484
    
----
commit 649f2965188efcfa0b1d2b5acb4c0f057ecd3788
Author: Yuming Wang <yu...@...>
Date:   2018-09-20T07:23:46Z

    Refactor AggregateBenchmark

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Merged to master.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22484#discussion_r220028846
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala ---
    @@ -73,23 +73,26 @@ object AggregateBenchmark extends SqlBasedBenchmark {
             spark.range(N).selectExpr("(id & 65535) as k").groupBy("k").sum().collect()
           }
     
    -      benchmark.addCase(s"codegen = F", numIters = 2) { iter =>
    -        spark.conf.set("spark.sql.codegen.wholeStage", "false")
    -        f()
    +      benchmark.addCase(s"codegen = F", numIters = 2) { _ =>
    +        withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> false.toString) {
    +          f()
    +        }
           }
     
    -      benchmark.addCase(s"codegen = T hashmap = F", numIters = 3) { iter =>
    -        spark.conf.set("spark.sql.codegen.wholeStage", "true")
    -        spark.conf.set("spark.sql.codegen.aggregate.map.twolevel.enabled", "false")
    -        spark.conf.set("spark.sql.codegen.aggregate.map.vectorized.enable", "false")
    -        f()
    +      benchmark.addCase(s"codegen = T hashmap = F", numIters = 3) { _ =>
    +        withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> true.toString,
    +          SQLConf.ENABLE_TWOLEVEL_AGG_MAP.key -> false.toString,
    +          "spark.sql.codegen.aggregate.map.vectorized.enable" -> false.toString) {
    --- End diff --
    
    Do you mean change
    ```scala
    withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "true",
      SQLConf.ENABLE_TWOLEVEL_AGG_MAP.key -> "false",
      "spark.sql.codegen.aggregate.map.vectorized.enable" -> "false") {
      f()
    }
    ```
    to 
    ```scala
    withSQLConf(
      SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "true",
      SQLConf.ENABLE_TWOLEVEL_AGG_MAP.key -> "false",
      "spark.sql.codegen.aggregate.map.vectorized.enable" -> "false") {
      f()
    }
    ```
    ?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    **[Test build #96624 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96624/testReport)** for PR 22484 at commit [`0fae54d`](https://github.com/apache/spark/commit/0fae54d9e1a40c021d0ebb5af3d9d969431807f7).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    **[Test build #96530 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96530/testReport)** for PR 22484 at commit [`f783099`](https://github.com/apache/spark/commit/f783099f818d3ed509ab95133ff46e2f183444a8).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark to use m...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    **[Test build #96500 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96500/testReport)** for PR 22484 at commit [`2d778a4`](https://github.com/apache/spark/commit/2d778a4c8fb5d3b373856837496882c05ff1d42d).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `trait SqlBasedBenchmark extends BenchmarkBase `


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22484#discussion_r220800365
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala ---
    @@ -34,621 +34,539 @@ import org.apache.spark.unsafe.map.BytesToBytesMap
     
     /**
      * Benchmark to measure performance for aggregate primitives.
    - * To run this:
    - *  build/sbt "sql/test-only *benchmark.AggregateBenchmark"
    - *
    - * Benchmarks in this file are skipped in normal builds.
    + * To run this benchmark:
    + * {{{
    + *   1. without sbt: bin/spark-submit --class <this class> <spark sql test jar>
    + *   2. build/sbt "sql/test:runMain <this class>"
    + *   3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
    + *      Results will be written to "benchmarks/AggregateBenchmark-results.txt".
    + * }}}
      */
    -class AggregateBenchmark extends BenchmarkWithCodegen {
    +object AggregateBenchmark extends SqlBasedBenchmark {
     
    -  ignore("aggregate without grouping") {
    -    val N = 500L << 22
    -    val benchmark = new Benchmark("agg without grouping", N)
    -    runBenchmark("agg w/o group", N) {
    -      sparkSession.range(N).selectExpr("sum(id)").collect()
    +  override def benchmark(): Unit = {
    +    runBenchmark("aggregate without grouping") {
    +      val N = 500L << 22
    +      runBenchmarkWithCodegen("agg w/o group", N) {
    +        spark.range(N).selectExpr("sum(id)").collect()
    +      }
         }
    -    /*
    -    agg w/o group:                           Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    ------------------------------------------------------------------------------------------------
    -    agg w/o group wholestage off                30136 / 31885         69.6          14.4       1.0X
    -    agg w/o group wholestage on                   1851 / 1860       1132.9           0.9      16.3X
    -     */
    -  }
     
    -  ignore("stat functions") {
    -    val N = 100L << 20
    +    runBenchmark("stat functions") {
    +      val N = 100L << 20
     
    -    runBenchmark("stddev", N) {
    -      sparkSession.range(N).groupBy().agg("id" -> "stddev").collect()
    -    }
    +      runBenchmarkWithCodegen("stddev", N) {
    +        spark.range(N).groupBy().agg("id" -> "stddev").collect()
    +      }
     
    -    runBenchmark("kurtosis", N) {
    -      sparkSession.range(N).groupBy().agg("id" -> "kurtosis").collect()
    +      runBenchmarkWithCodegen("kurtosis", N) {
    +        spark.range(N).groupBy().agg("id" -> "kurtosis").collect()
    +      }
         }
     
    -    /*
    -    Using ImperativeAggregate (as implemented in Spark 1.6):
    -
    -      Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
    -      stddev:                            Avg Time(ms)    Avg Rate(M/s)  Relative Rate
    -      -------------------------------------------------------------------------------
    -      stddev w/o codegen                      2019.04            10.39         1.00 X
    -      stddev w codegen                        2097.29            10.00         0.96 X
    -      kurtosis w/o codegen                    2108.99             9.94         0.96 X
    -      kurtosis w codegen                      2090.69            10.03         0.97 X
    -
    -      Using DeclarativeAggregate:
    -
    -      Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
    -      stddev:                             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -      -------------------------------------------------------------------------------------------
    -      stddev codegen=false                     5630 / 5776         18.0          55.6       1.0X
    -      stddev codegen=true                      1259 / 1314         83.0          12.0       4.5X
    -
    -      Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
    -      kurtosis:                           Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -      -------------------------------------------------------------------------------------------
    -      kurtosis codegen=false                 14847 / 15084          7.0         142.9       1.0X
    -      kurtosis codegen=true                    1652 / 2124         63.0          15.9       9.0X
    -    */
    -  }
    -
    -  ignore("aggregate with linear keys") {
    -    val N = 20 << 22
    +    runBenchmark("aggregate with linear keys") {
    +      val N = 20 << 22
     
    -    val benchmark = new Benchmark("Aggregate w keys", N)
    -    def f(): Unit = {
    -      sparkSession.range(N).selectExpr("(id & 65535) as k").groupBy("k").sum().collect()
    -    }
    +      val benchmark = new Benchmark("Aggregate w keys", N, output = output)
     
    -    benchmark.addCase(s"codegen = F", numIters = 2) { iter =>
    -      sparkSession.conf.set("spark.sql.codegen.wholeStage", "false")
    -      f()
    -    }
    +      def f(): Unit = {
    +        spark.range(N).selectExpr("(id & 65535) as k").groupBy("k").sum().collect()
    +      }
     
    -    benchmark.addCase(s"codegen = T hashmap = F", numIters = 3) { iter =>
    -      sparkSession.conf.set("spark.sql.codegen.wholeStage", "true")
    -      sparkSession.conf.set("spark.sql.codegen.aggregate.map.twolevel.enabled", "false")
    -      sparkSession.conf.set("spark.sql.codegen.aggregate.map.vectorized.enable", "false")
    -      f()
    -    }
    +      benchmark.addCase(s"codegen = F", numIters = 2) { _ =>
    --- End diff --
    
    `s"codegen = F"` -> `"codegen = F"`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark to use m...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22484#discussion_r221503808
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala ---
    @@ -0,0 +1,60 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.benchmark
    +
    +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
    +import org.apache.spark.sql.SparkSession
    +import org.apache.spark.sql.catalyst.plans.SQLHelper
    +import org.apache.spark.sql.internal.SQLConf
    +
    +/**
    + * Common base trait to run benchmark with the Dataset and DataFrame API.
    + */
    +trait SqlBasedBenchmark extends BenchmarkBase with SQLHelper {
    +
    +  val spark: SparkSession = getSparkSession
    +
    +  /** Subclass can override this function to build their own SparkSession */
    +  def getSparkSession: SparkSession = {
    +    SparkSession.builder()
    +      .master("local[1]")
    +      .appName(this.getClass.getCanonicalName)
    +      .config(SQLConf.SHUFFLE_PARTITIONS.key, 1)
    +      .config(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key, 1)
    +      .getOrCreate()
    +  }
    +
    +  /** Runs function `f` with whole stage codegen on and off. */
    --- End diff --
    
    Can we use `codegenBenchmark` instead? `runBenchmarkWithCodegen` looks like an extension of `runBenchmark`. It's more like `bitEncodingBenchmark` or `sortBenchmark`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96624/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22484#discussion_r221415642
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala ---
    @@ -0,0 +1,60 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.benchmark
    +
    +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
    +import org.apache.spark.sql.SparkSession
    +import org.apache.spark.sql.catalyst.plans.SQLHelper
    +import org.apache.spark.sql.internal.SQLConf
    +
    +/**
    + * Common base trait to run benchmark with the Dataset and DataFrame API.
    + */
    +trait SqlBasedBenchmark extends BenchmarkBase with SQLHelper {
    --- End diff --
    
    Actually I don't think the the name `SqlBasedBenchmark` is not appropriate..From the naming we can't tell it is about benchmarking with/without whole codegen. I will try to come up with a better name. Or we can discuss in this thread.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    **[Test build #96811 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96811/testReport)** for PR 22484 at commit [`6c46ad5`](https://github.com/apache/spark/commit/6c46ad59c063fa6283fb23046300404767a82248).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22484#discussion_r221417293
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala ---
    @@ -0,0 +1,60 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.benchmark
    +
    +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
    +import org.apache.spark.sql.SparkSession
    +import org.apache.spark.sql.catalyst.plans.SQLHelper
    +import org.apache.spark.sql.internal.SQLConf
    +
    +/**
    + * Common base trait to run benchmark with the Dataset and DataFrame API.
    + */
    +trait SqlBasedBenchmark extends BenchmarkBase with SQLHelper {
    --- End diff --
    
    Then each function can be in different trait...I don't think that `runBenchmarkWithCodegen` has much in common with `runBenchmarkWithParquetPushDown`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22484#discussion_r221501761
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala ---
    @@ -0,0 +1,60 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.benchmark
    +
    +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
    +import org.apache.spark.sql.SparkSession
    +import org.apache.spark.sql.catalyst.plans.SQLHelper
    +import org.apache.spark.sql.internal.SQLConf
    +
    +/**
    + * Common base trait to run benchmark with the Dataset and DataFrame API.
    + */
    +trait SqlBasedBenchmark extends BenchmarkBase with SQLHelper {
    --- End diff --
    
    For the naming, let's keep the current one for now.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96666/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3517/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark to use m...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3399/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22484#discussion_r221416957
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala ---
    @@ -0,0 +1,60 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.benchmark
    +
    +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
    +import org.apache.spark.sql.SparkSession
    +import org.apache.spark.sql.catalyst.plans.SQLHelper
    +import org.apache.spark.sql.internal.SQLConf
    +
    +/**
    + * Common base trait to run benchmark with the Dataset and DataFrame API.
    + */
    +trait SqlBasedBenchmark extends BenchmarkBase with SQLHelper {
    --- End diff --
    
    Maybe we can add more common functions in the future. e.g. `runBenchmarkWithCodegen`, `runBenchmarkWithParquetPushDown`, `runBenchmarkWithOrcPushDown`...


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22484#discussion_r220800434
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala ---
    @@ -34,621 +34,539 @@ import org.apache.spark.unsafe.map.BytesToBytesMap
     
     /**
      * Benchmark to measure performance for aggregate primitives.
    - * To run this:
    - *  build/sbt "sql/test-only *benchmark.AggregateBenchmark"
    - *
    - * Benchmarks in this file are skipped in normal builds.
    + * To run this benchmark:
    + * {{{
    + *   1. without sbt: bin/spark-submit --class <this class> <spark sql test jar>
    + *   2. build/sbt "sql/test:runMain <this class>"
    + *   3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
    + *      Results will be written to "benchmarks/AggregateBenchmark-results.txt".
    + * }}}
      */
    -class AggregateBenchmark extends BenchmarkWithCodegen {
    +object AggregateBenchmark extends SqlBasedBenchmark {
     
    -  ignore("aggregate without grouping") {
    -    val N = 500L << 22
    -    val benchmark = new Benchmark("agg without grouping", N)
    -    runBenchmark("agg w/o group", N) {
    -      sparkSession.range(N).selectExpr("sum(id)").collect()
    +  override def benchmark(): Unit = {
    +    runBenchmark("aggregate without grouping") {
    +      val N = 500L << 22
    +      runBenchmarkWithCodegen("agg w/o group", N) {
    +        spark.range(N).selectExpr("sum(id)").collect()
    +      }
         }
    -    /*
    -    agg w/o group:                           Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    ------------------------------------------------------------------------------------------------
    -    agg w/o group wholestage off                30136 / 31885         69.6          14.4       1.0X
    -    agg w/o group wholestage on                   1851 / 1860       1132.9           0.9      16.3X
    -     */
    -  }
     
    -  ignore("stat functions") {
    -    val N = 100L << 20
    +    runBenchmark("stat functions") {
    +      val N = 100L << 20
     
    -    runBenchmark("stddev", N) {
    -      sparkSession.range(N).groupBy().agg("id" -> "stddev").collect()
    -    }
    +      runBenchmarkWithCodegen("stddev", N) {
    +        spark.range(N).groupBy().agg("id" -> "stddev").collect()
    +      }
     
    -    runBenchmark("kurtosis", N) {
    -      sparkSession.range(N).groupBy().agg("id" -> "kurtosis").collect()
    +      runBenchmarkWithCodegen("kurtosis", N) {
    +        spark.range(N).groupBy().agg("id" -> "kurtosis").collect()
    +      }
         }
     
    -    /*
    -    Using ImperativeAggregate (as implemented in Spark 1.6):
    -
    -      Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
    -      stddev:                            Avg Time(ms)    Avg Rate(M/s)  Relative Rate
    -      -------------------------------------------------------------------------------
    -      stddev w/o codegen                      2019.04            10.39         1.00 X
    -      stddev w codegen                        2097.29            10.00         0.96 X
    -      kurtosis w/o codegen                    2108.99             9.94         0.96 X
    -      kurtosis w codegen                      2090.69            10.03         0.97 X
    -
    -      Using DeclarativeAggregate:
    -
    -      Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
    -      stddev:                             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -      -------------------------------------------------------------------------------------------
    -      stddev codegen=false                     5630 / 5776         18.0          55.6       1.0X
    -      stddev codegen=true                      1259 / 1314         83.0          12.0       4.5X
    -
    -      Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
    -      kurtosis:                           Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -      -------------------------------------------------------------------------------------------
    -      kurtosis codegen=false                 14847 / 15084          7.0         142.9       1.0X
    -      kurtosis codegen=true                    1652 / 2124         63.0          15.9       9.0X
    -    */
    -  }
    -
    -  ignore("aggregate with linear keys") {
    -    val N = 20 << 22
    +    runBenchmark("aggregate with linear keys") {
    +      val N = 20 << 22
     
    -    val benchmark = new Benchmark("Aggregate w keys", N)
    -    def f(): Unit = {
    -      sparkSession.range(N).selectExpr("(id & 65535) as k").groupBy("k").sum().collect()
    -    }
    +      val benchmark = new Benchmark("Aggregate w keys", N, output = output)
     
    -    benchmark.addCase(s"codegen = F", numIters = 2) { iter =>
    -      sparkSession.conf.set("spark.sql.codegen.wholeStage", "false")
    -      f()
    -    }
    +      def f(): Unit = {
    +        spark.range(N).selectExpr("(id & 65535) as k").groupBy("k").sum().collect()
    +      }
     
    -    benchmark.addCase(s"codegen = T hashmap = F", numIters = 3) { iter =>
    -      sparkSession.conf.set("spark.sql.codegen.wholeStage", "true")
    -      sparkSession.conf.set("spark.sql.codegen.aggregate.map.twolevel.enabled", "false")
    -      sparkSession.conf.set("spark.sql.codegen.aggregate.map.vectorized.enable", "false")
    -      f()
    -    }
    +      benchmark.addCase(s"codegen = F", numIters = 2) { _ =>
    +        withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false") {
    +          f()
    +        }
    +      }
     
    -    benchmark.addCase(s"codegen = T hashmap = T", numIters = 5) { iter =>
    -      sparkSession.conf.set("spark.sql.codegen.wholeStage", "true")
    -      sparkSession.conf.set("spark.sql.codegen.aggregate.map.twolevel.enabled", "true")
    -      sparkSession.conf.set("spark.sql.codegen.aggregate.map.vectorized.enable", "true")
    -      f()
    -    }
    +      benchmark.addCase(s"codegen = T hashmap = F", numIters = 3) { _ =>
    --- End diff --
    
    `s"codegen = T hashmap = F"` -> `"codegen = T hashmap = F"`
    
    Could you fix all instances like this?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark to use m...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3416/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    @wangyum Could you review and merge https://github.com/wangyum/spark/pull/12 , too?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark t...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22484#discussion_r220025625
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala ---
    @@ -73,23 +73,26 @@ object AggregateBenchmark extends SqlBasedBenchmark {
             spark.range(N).selectExpr("(id & 65535) as k").groupBy("k").sum().collect()
           }
     
    -      benchmark.addCase(s"codegen = F", numIters = 2) { iter =>
    -        spark.conf.set("spark.sql.codegen.wholeStage", "false")
    -        f()
    +      benchmark.addCase(s"codegen = F", numIters = 2) { _ =>
    +        withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> false.toString) {
    +          f()
    +        }
           }
     
    -      benchmark.addCase(s"codegen = T hashmap = F", numIters = 3) { iter =>
    -        spark.conf.set("spark.sql.codegen.wholeStage", "true")
    -        spark.conf.set("spark.sql.codegen.aggregate.map.twolevel.enabled", "false")
    -        spark.conf.set("spark.sql.codegen.aggregate.map.vectorized.enable", "false")
    -        f()
    +      benchmark.addCase(s"codegen = T hashmap = F", numIters = 3) { _ =>
    +        withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> true.toString,
    +          SQLConf.ENABLE_TWOLEVEL_AGG_MAP.key -> false.toString,
    +          "spark.sql.codegen.aggregate.map.vectorized.enable" -> false.toString) {
    +          f()
    +        }
           }
     
    -      benchmark.addCase(s"codegen = T hashmap = T", numIters = 5) { iter =>
    -        spark.conf.set("spark.sql.codegen.wholeStage", "true")
    -        spark.conf.set("spark.sql.codegen.aggregate.map.twolevel.enabled", "true")
    -        spark.conf.set("spark.sql.codegen.aggregate.map.vectorized.enable", "true")
    -        f()
    +      benchmark.addCase(s"codegen = T hashmap = T", numIters = 5) { _ =>
    +        withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> true.toString,
    +          SQLConf.ENABLE_TWOLEVEL_AGG_MAP.key -> true.toString,
    +          "spark.sql.codegen.aggregate.map.vectorized.enable" -> true.toString) {
    --- End diff --
    
    ditto.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22484#discussion_r221416202
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala ---
    @@ -0,0 +1,60 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.benchmark
    +
    +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
    +import org.apache.spark.sql.SparkSession
    +import org.apache.spark.sql.catalyst.plans.SQLHelper
    +import org.apache.spark.sql.internal.SQLConf
    +
    +/**
    + * Common base trait to run benchmark with the Dataset and DataFrame API.
    + */
    +trait SqlBasedBenchmark extends BenchmarkBase with SQLHelper {
    --- End diff --
    
    How about `CodegenBenchmarkBase` ? This is the best I can think of.. @wangyum @dongjoon-hyun @cloud-fan 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark to use m...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark to use m...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark to use m...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    **[Test build #96473 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96473/testReport)** for PR 22484 at commit [`42230b6`](https://github.com/apache/spark/commit/42230b6e3edb731eb69b3b8800805805e2234d10).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `trait RunBenchmarkWithCodegen extends BenchmarkBase `


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Thank you, @wangyum , @cloud-fan and @gengliangwang !


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22484#discussion_r221501996
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala ---
    @@ -0,0 +1,60 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.benchmark
    +
    +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
    +import org.apache.spark.sql.SparkSession
    +import org.apache.spark.sql.catalyst.plans.SQLHelper
    +import org.apache.spark.sql.internal.SQLConf
    +
    +/**
    + * Common base trait to run benchmark with the Dataset and DataFrame API.
    + */
    +trait SqlBasedBenchmark extends BenchmarkBase with SQLHelper {
    +
    +  val spark: SparkSession = getSparkSession
    +
    +  /** Subclass can override this function to build their own SparkSession */
    +  def getSparkSession: SparkSession = {
    +    SparkSession.builder()
    +      .master("local[1]")
    +      .appName(this.getClass.getCanonicalName)
    +      .config(SQLConf.SHUFFLE_PARTITIONS.key, 1)
    +      .config(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key, 1)
    +      .getOrCreate()
    +  }
    +
    +  /** Runs function `f` with whole stage codegen on and off. */
    +  def runBenchmarkWithCodegen(name: String, cardinality: Long)(f: => Unit): Unit = {
    --- End diff --
    
    This should be `final def runBenchmarkWithCodegen` instead of `def runBenchmarkWithCodegen`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark t...

Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22484#discussion_r219725606
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/RunBenchmarkWithCodegen.scala ---
    @@ -0,0 +1,58 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.benchmark
    +
    +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
    +import org.apache.spark.sql.SparkSession
    +import org.apache.spark.sql.internal.SQLConf
    +
    +/**
    + * Common base trait for micro benchmarks that are supposed to run standalone (i.e. not together
    + * with other test suites).
    + */
    +trait RunBenchmarkWithCodegen extends BenchmarkBase {
    --- End diff --
    
    How about `RunBenchmarkWithCodegen` -> `SqlBaseBenchmarkBase`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3534/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark to use m...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    **[Test build #96473 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96473/testReport)** for PR 22484 at commit [`42230b6`](https://github.com/apache/spark/commit/42230b6e3edb731eb69b3b8800805805e2234d10).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark t...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22484#discussion_r219936138
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala ---
    @@ -34,621 +34,508 @@ import org.apache.spark.unsafe.map.BytesToBytesMap
     
     /**
      * Benchmark to measure performance for aggregate primitives.
    - * To run this:
    - *  build/sbt "sql/test-only *benchmark.AggregateBenchmark"
    - *
    - * Benchmarks in this file are skipped in normal builds.
    + * To run this benchmark:
    + * {{{
    + *   1. without sbt: bin/spark-submit --class <this class> <spark sql test jar>
    + *   2. build/sbt "sql/test:runMain <this class>"
    + *   3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
    + *      Results will be written to "benchmarks/AggregateBenchmark-results.txt".
    + * }}}
      */
    -class AggregateBenchmark extends BenchmarkWithCodegen {
    +object AggregateBenchmark extends SqlBasedBenchmark {
     
    -  ignore("aggregate without grouping") {
    -    val N = 500L << 22
    -    val benchmark = new Benchmark("agg without grouping", N)
    -    runBenchmark("agg w/o group", N) {
    -      sparkSession.range(N).selectExpr("sum(id)").collect()
    +  override def benchmark(): Unit = {
    +    runBenchmark("aggregate without grouping") {
    +      val N = 500L << 22
    +      runBenchmarkWithCodegen("agg w/o group", N) {
    +        spark.range(N).selectExpr("sum(id)").collect()
    +      }
         }
    -    /*
    -    agg w/o group:                           Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    ------------------------------------------------------------------------------------------------
    -    agg w/o group wholestage off                30136 / 31885         69.6          14.4       1.0X
    -    agg w/o group wholestage on                   1851 / 1860       1132.9           0.9      16.3X
    -     */
    -  }
     
    -  ignore("stat functions") {
    -    val N = 100L << 20
    +    runBenchmark("stat functions") {
    +      val N = 100L << 20
     
    -    runBenchmark("stddev", N) {
    -      sparkSession.range(N).groupBy().agg("id" -> "stddev").collect()
    -    }
    +      runBenchmarkWithCodegen("stddev", N) {
    +        spark.range(N).groupBy().agg("id" -> "stddev").collect()
    +      }
     
    -    runBenchmark("kurtosis", N) {
    -      sparkSession.range(N).groupBy().agg("id" -> "kurtosis").collect()
    +      runBenchmarkWithCodegen("kurtosis", N) {
    +        spark.range(N).groupBy().agg("id" -> "kurtosis").collect()
    +      }
         }
     
    -    /*
    -    Using ImperativeAggregate (as implemented in Spark 1.6):
    -
    -      Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
    -      stddev:                            Avg Time(ms)    Avg Rate(M/s)  Relative Rate
    -      -------------------------------------------------------------------------------
    -      stddev w/o codegen                      2019.04            10.39         1.00 X
    -      stddev w codegen                        2097.29            10.00         0.96 X
    -      kurtosis w/o codegen                    2108.99             9.94         0.96 X
    -      kurtosis w codegen                      2090.69            10.03         0.97 X
    -
    -      Using DeclarativeAggregate:
    -
    -      Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
    -      stddev:                             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -      -------------------------------------------------------------------------------------------
    -      stddev codegen=false                     5630 / 5776         18.0          55.6       1.0X
    -      stddev codegen=true                      1259 / 1314         83.0          12.0       4.5X
    -
    -      Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
    -      kurtosis:                           Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -      -------------------------------------------------------------------------------------------
    -      kurtosis codegen=false                 14847 / 15084          7.0         142.9       1.0X
    -      kurtosis codegen=true                    1652 / 2124         63.0          15.9       9.0X
    -    */
    -  }
    +    runBenchmark("aggregate with linear keys") {
    +      val N = 20 << 22
     
    -  ignore("aggregate with linear keys") {
    -    val N = 20 << 22
    +      val benchmark = new Benchmark("Aggregate w keys", N, output = output)
     
    -    val benchmark = new Benchmark("Aggregate w keys", N)
    -    def f(): Unit = {
    -      sparkSession.range(N).selectExpr("(id & 65535) as k").groupBy("k").sum().collect()
    -    }
    -
    -    benchmark.addCase(s"codegen = F", numIters = 2) { iter =>
    -      sparkSession.conf.set("spark.sql.codegen.wholeStage", "false")
    -      f()
    -    }
    +      def f(): Unit = {
    +        spark.range(N).selectExpr("(id & 65535) as k").groupBy("k").sum().collect()
    +      }
     
    -    benchmark.addCase(s"codegen = T hashmap = F", numIters = 3) { iter =>
    -      sparkSession.conf.set("spark.sql.codegen.wholeStage", "true")
    -      sparkSession.conf.set("spark.sql.codegen.aggregate.map.twolevel.enabled", "false")
    -      sparkSession.conf.set("spark.sql.codegen.aggregate.map.vectorized.enable", "false")
    -      f()
    -    }
    +      benchmark.addCase(s"codegen = F", numIters = 2) { iter =>
    +        spark.conf.set("spark.sql.codegen.wholeStage", "false")
    +        f()
    +      }
     
    -    benchmark.addCase(s"codegen = T hashmap = T", numIters = 5) { iter =>
    -      sparkSession.conf.set("spark.sql.codegen.wholeStage", "true")
    -      sparkSession.conf.set("spark.sql.codegen.aggregate.map.twolevel.enabled", "true")
    -      sparkSession.conf.set("spark.sql.codegen.aggregate.map.vectorized.enable", "true")
    -      f()
    -    }
    +      benchmark.addCase(s"codegen = T hashmap = F", numIters = 3) { iter =>
    +        spark.conf.set("spark.sql.codegen.wholeStage", "true")
    +        spark.conf.set("spark.sql.codegen.aggregate.map.twolevel.enabled", "false")
    +        spark.conf.set("spark.sql.codegen.aggregate.map.vectorized.enable", "false")
    +        f()
    +      }
     
    -    benchmark.run()
    +      benchmark.addCase(s"codegen = T hashmap = T", numIters = 5) { iter =>
    +        spark.conf.set("spark.sql.codegen.wholeStage", "true")
    +        spark.conf.set("spark.sql.codegen.aggregate.map.twolevel.enabled", "true")
    +        spark.conf.set("spark.sql.codegen.aggregate.map.vectorized.enable", "true")
    +        f()
    +      }
     
    -    /*
    -    Java HotSpot(TM) 64-Bit Server VM 1.8.0_60-b27 on Mac OS X 10.11
    -    Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz
    +      benchmark.run()
    +    }
     
    -    Aggregate w keys:                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    ------------------------------------------------------------------------------------------------
    -    codegen = F                                   6619 / 6780         12.7          78.9       1.0X
    -    codegen = T hashmap = F                       3935 / 4059         21.3          46.9       1.7X
    -    codegen = T hashmap = T                        897 /  971         93.5          10.7       7.4X
    -    */
    -  }
    +    runBenchmark("aggregate with randomized keys") {
    +      val N = 20 << 22
     
    -  ignore("aggregate with randomized keys") {
    -    val N = 20 << 22
    +      val benchmark = new Benchmark("Aggregate w keys", N, output = output)
    +      spark.range(N).selectExpr("id", "floor(rand() * 10000) as k")
    +        .createOrReplaceTempView("test")
     
    -    val benchmark = new Benchmark("Aggregate w keys", N)
    -    sparkSession.range(N).selectExpr("id", "floor(rand() * 10000) as k")
    -      .createOrReplaceTempView("test")
    +      def f(): Unit = spark.sql("select k, k, sum(id) from test group by k, k").collect()
     
    -    def f(): Unit = sparkSession.sql("select k, k, sum(id) from test group by k, k").collect()
    +      benchmark.addCase(s"codegen = F", numIters = 2) { iter =>
    +        spark.conf.set("spark.sql.codegen.wholeStage", value = false)
    +        f()
    +      }
     
    -    benchmark.addCase(s"codegen = F", numIters = 2) { iter =>
    -      sparkSession.conf.set("spark.sql.codegen.wholeStage", value = false)
    -      f()
    -    }
    +      benchmark.addCase(s"codegen = T hashmap = F", numIters = 3) { iter =>
    +        spark.conf.set("spark.sql.codegen.wholeStage", value = true)
    +        spark.conf.set("spark.sql.codegen.aggregate.map.twolevel.enabled", "false")
    +        spark.conf.set("spark.sql.codegen.aggregate.map.vectorized.enable", "false")
    +        f()
    +      }
     
    -    benchmark.addCase(s"codegen = T hashmap = F", numIters = 3) { iter =>
    -      sparkSession.conf.set("spark.sql.codegen.wholeStage", value = true)
    -      sparkSession.conf.set("spark.sql.codegen.aggregate.map.twolevel.enabled", "false")
    -      sparkSession.conf.set("spark.sql.codegen.aggregate.map.vectorized.enable", "false")
    -      f()
    -    }
    +      benchmark.addCase(s"codegen = T hashmap = T", numIters = 5) { iter =>
    +        spark.conf.set("spark.sql.codegen.wholeStage", value = true)
    +        spark.conf.set("spark.sql.codegen.aggregate.map.twolevel.enabled", "true")
    +        spark.conf.set("spark.sql.codegen.aggregate.map.vectorized.enable", "true")
    +        f()
    +      }
     
    -    benchmark.addCase(s"codegen = T hashmap = T", numIters = 5) { iter =>
    -      sparkSession.conf.set("spark.sql.codegen.wholeStage", value = true)
    -      sparkSession.conf.set("spark.sql.codegen.aggregate.map.twolevel.enabled", "true")
    -      sparkSession.conf.set("spark.sql.codegen.aggregate.map.vectorized.enable", "true")
    -      f()
    +      benchmark.run()
         }
     
    -    benchmark.run()
    -
    -    /*
    -    Java HotSpot(TM) 64-Bit Server VM 1.8.0_60-b27 on Mac OS X 10.11
    -    Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz
    +    runBenchmark("aggregate with string key") {
    +      val N = 20 << 20
     
    -    Aggregate w keys:                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    ------------------------------------------------------------------------------------------------
    -    codegen = F                                   7445 / 7517         11.3          88.7       1.0X
    -    codegen = T hashmap = F                       4672 / 4703         18.0          55.7       1.6X
    -    codegen = T hashmap = T                       1764 / 1958         47.6          21.0       4.2X
    -    */
    -  }
    +      val benchmark = new Benchmark("Aggregate w string key", N, output = output)
     
    -  ignore("aggregate with string key") {
    -    val N = 20 << 20
    +      def f(): Unit = spark.range(N).selectExpr("id", "cast(id & 1023 as string) as k")
    +        .groupBy("k").count().collect()
     
    -    val benchmark = new Benchmark("Aggregate w string key", N)
    -    def f(): Unit = sparkSession.range(N).selectExpr("id", "cast(id & 1023 as string) as k")
    -      .groupBy("k").count().collect()
    +      benchmark.addCase(s"codegen = F", numIters = 2) { iter =>
    +        spark.conf.set("spark.sql.codegen.wholeStage", "false")
    +        f()
    +      }
     
    -    benchmark.addCase(s"codegen = F", numIters = 2) { iter =>
    -      sparkSession.conf.set("spark.sql.codegen.wholeStage", "false")
    -      f()
    -    }
    +      benchmark.addCase(s"codegen = T hashmap = F", numIters = 3) { iter =>
    +        spark.conf.set("spark.sql.codegen.wholeStage", "true")
    +        spark.conf.set("spark.sql.codegen.aggregate.map.twolevel.enabled", "false")
    +        spark.conf.set("spark.sql.codegen.aggregate.map.vectorized.enable", "false")
    +        f()
    +      }
     
    -    benchmark.addCase(s"codegen = T hashmap = F", numIters = 3) { iter =>
    -      sparkSession.conf.set("spark.sql.codegen.wholeStage", "true")
    -      sparkSession.conf.set("spark.sql.codegen.aggregate.map.twolevel.enabled", "false")
    -      sparkSession.conf.set("spark.sql.codegen.aggregate.map.vectorized.enable", "false")
    -      f()
    -    }
    +      benchmark.addCase(s"codegen = T hashmap = T", numIters = 5) { iter =>
    +        spark.conf.set("spark.sql.codegen.wholeStage", "true")
    +        spark.conf.set("spark.sql.codegen.aggregate.map.twolevel.enabled", "true")
    +        spark.conf.set("spark.sql.codegen.aggregate.map.vectorized.enable", "true")
    +        f()
    +      }
     
    -    benchmark.addCase(s"codegen = T hashmap = T", numIters = 5) { iter =>
    -      sparkSession.conf.set("spark.sql.codegen.wholeStage", "true")
    -      sparkSession.conf.set("spark.sql.codegen.aggregate.map.twolevel.enabled", "true")
    -      sparkSession.conf.set("spark.sql.codegen.aggregate.map.vectorized.enable", "true")
    -      f()
    +      benchmark.run()
         }
     
    -    benchmark.run()
    -
    -    /*
    -    Java HotSpot(TM) 64-Bit Server VM 1.8.0_73-b02 on Mac OS X 10.11.4
    -    Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz
    -    Aggregate w string key:             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    -------------------------------------------------------------------------------------------
    -    codegen = F                              3307 / 3376          6.3         157.7       1.0X
    -    codegen = T hashmap = F                  2364 / 2471          8.9         112.7       1.4X
    -    codegen = T hashmap = T                  1740 / 1841         12.0          83.0       1.9X
    -    */
    -  }
    +    runBenchmark("aggregate with decimal key") {
    +      val N = 20 << 20
     
    -  ignore("aggregate with decimal key") {
    -    val N = 20 << 20
    +      val benchmark = new Benchmark("Aggregate w decimal key", N, output = output)
     
    -    val benchmark = new Benchmark("Aggregate w decimal key", N)
    -    def f(): Unit = sparkSession.range(N).selectExpr("id", "cast(id & 65535 as decimal) as k")
    -      .groupBy("k").count().collect()
    +      def f(): Unit = spark.range(N).selectExpr("id", "cast(id & 65535 as decimal) as k")
    +        .groupBy("k").count().collect()
     
    -    benchmark.addCase(s"codegen = F") { iter =>
    -      sparkSession.conf.set("spark.sql.codegen.wholeStage", "false")
    -      f()
    -    }
    -
    -    benchmark.addCase(s"codegen = T hashmap = F") { iter =>
    -      sparkSession.conf.set("spark.sql.codegen.wholeStage", "true")
    -      sparkSession.conf.set("spark.sql.codegen.aggregate.map.twolevel.enabled", "false")
    -      sparkSession.conf.set("spark.sql.codegen.aggregate.map.vectorized.enable", "false")
    -      f()
    -    }
    -
    -    benchmark.addCase(s"codegen = T hashmap = T") { iter =>
    -      sparkSession.conf.set("spark.sql.codegen.wholeStage", "true")
    -      sparkSession.conf.set("spark.sql.codegen.aggregate.map.twolevel.enabled", "true")
    -      sparkSession.conf.set("spark.sql.codegen.aggregate.map.vectorized.enable", "true")
    -      f()
    -    }
    -
    -    benchmark.run()
    -
    -    /*
    -    Java HotSpot(TM) 64-Bit Server VM 1.8.0_73-b02 on Mac OS X 10.11.4
    -    Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz
    -    Aggregate w decimal key:             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    -------------------------------------------------------------------------------------------
    -    codegen = F                              2756 / 2817          7.6         131.4       1.0X
    -    codegen = T hashmap = F                  1580 / 1647         13.3          75.4       1.7X
    -    codegen = T hashmap = T                   641 /  662         32.7          30.6       4.3X
    -    */
    -  }
    +      benchmark.addCase(s"codegen = F") { iter =>
    +        spark.conf.set("spark.sql.codegen.wholeStage", "false")
    +        f()
    +      }
     
    -  ignore("aggregate with multiple key types") {
    -    val N = 20 << 20
    -
    -    val benchmark = new Benchmark("Aggregate w multiple keys", N)
    -    def f(): Unit = sparkSession.range(N)
    -      .selectExpr(
    -        "id",
    -        "(id & 1023) as k1",
    -        "cast(id & 1023 as string) as k2",
    -        "cast(id & 1023 as int) as k3",
    -        "cast(id & 1023 as double) as k4",
    -        "cast(id & 1023 as float) as k5",
    -        "id > 1023 as k6")
    -      .groupBy("k1", "k2", "k3", "k4", "k5", "k6")
    -      .sum()
    -      .collect()
    -
    -    benchmark.addCase(s"codegen = F") { iter =>
    -      sparkSession.conf.set("spark.sql.codegen.wholeStage", "false")
    -      f()
    -    }
    +      benchmark.addCase(s"codegen = T hashmap = F") { iter =>
    +        spark.conf.set("spark.sql.codegen.wholeStage", "true")
    +        spark.conf.set("spark.sql.codegen.aggregate.map.twolevel.enabled", "false")
    +        spark.conf.set("spark.sql.codegen.aggregate.map.vectorized.enable", "false")
    +        f()
    +      }
     
    -    benchmark.addCase(s"codegen = T hashmap = F") { iter =>
    -      sparkSession.conf.set("spark.sql.codegen.wholeStage", "true")
    -      sparkSession.conf.set("spark.sql.codegen.aggregate.map.twolevel.enabled", "false")
    -      sparkSession.conf.set("spark.sql.codegen.aggregate.map.vectorized.enable", "false")
    -      f()
    -    }
    +      benchmark.addCase(s"codegen = T hashmap = T") { iter =>
    +        spark.conf.set("spark.sql.codegen.wholeStage", "true")
    +        spark.conf.set("spark.sql.codegen.aggregate.map.twolevel.enabled", "true")
    +        spark.conf.set("spark.sql.codegen.aggregate.map.vectorized.enable", "true")
    +        f()
    +      }
     
    -    benchmark.addCase(s"codegen = T hashmap = T") { iter =>
    -      sparkSession.conf.set("spark.sql.codegen.wholeStage", "true")
    -      sparkSession.conf.set("spark.sql.codegen.aggregate.map.twolevel.enabled", "true")
    -      sparkSession.conf.set("spark.sql.codegen.aggregate.map.vectorized.enable", "true")
    -      f()
    +      benchmark.run()
         }
     
    -    benchmark.run()
    -
    -    /*
    -    Java HotSpot(TM) 64-Bit Server VM 1.8.0_73-b02 on Mac OS X 10.11.4
    -    Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz
    -    Aggregate w decimal key:             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    -------------------------------------------------------------------------------------------
    -    codegen = F                              5885 / 6091          3.6         280.6       1.0X
    -    codegen = T hashmap = F                  3625 / 4009          5.8         172.8       1.6X
    -    codegen = T hashmap = T                  3204 / 3271          6.5         152.8       1.8X
    -    */
    -  }
    +    runBenchmark("aggregate with multiple key types") {
    +      val N = 20 << 20
     
    -  ignore("max function bytecode size of wholestagecodegen") {
    -    val N = 20 << 15
    -
    -    val benchmark = new Benchmark("max function bytecode size", N)
    -    def f(): Unit = sparkSession.range(N)
    -      .selectExpr(
    -        "id",
    -        "(id & 1023) as k1",
    -        "cast(id & 1023 as double) as k2",
    -        "cast(id & 1023 as int) as k3",
    -        "case when id > 100 and id <= 200 then 1 else 0 end as v1",
    -        "case when id > 200 and id <= 300 then 1 else 0 end as v2",
    -        "case when id > 300 and id <= 400 then 1 else 0 end as v3",
    -        "case when id > 400 and id <= 500 then 1 else 0 end as v4",
    -        "case when id > 500 and id <= 600 then 1 else 0 end as v5",
    -        "case when id > 600 and id <= 700 then 1 else 0 end as v6",
    -        "case when id > 700 and id <= 800 then 1 else 0 end as v7",
    -        "case when id > 800 and id <= 900 then 1 else 0 end as v8",
    -        "case when id > 900 and id <= 1000 then 1 else 0 end as v9",
    -        "case when id > 1000 and id <= 1100 then 1 else 0 end as v10",
    -        "case when id > 1100 and id <= 1200 then 1 else 0 end as v11",
    -        "case when id > 1200 and id <= 1300 then 1 else 0 end as v12",
    -        "case when id > 1300 and id <= 1400 then 1 else 0 end as v13",
    -        "case when id > 1400 and id <= 1500 then 1 else 0 end as v14",
    -        "case when id > 1500 and id <= 1600 then 1 else 0 end as v15",
    -        "case when id > 1600 and id <= 1700 then 1 else 0 end as v16",
    -        "case when id > 1700 and id <= 1800 then 1 else 0 end as v17",
    -        "case when id > 1800 and id <= 1900 then 1 else 0 end as v18")
    -      .groupBy("k1", "k2", "k3")
    -      .sum()
    -      .collect()
    -
    -    benchmark.addCase("codegen = F") { iter =>
    -      sparkSession.conf.set(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key, "false")
    -      f()
    -    }
    +      val benchmark = new Benchmark("Aggregate w multiple keys", N, output = output)
     
    -    benchmark.addCase("codegen = T hugeMethodLimit = 10000") { iter =>
    -      sparkSession.conf.set(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key, "true")
    -      sparkSession.conf.set(SQLConf.WHOLESTAGE_HUGE_METHOD_LIMIT.key, "10000")
    -      f()
    -    }
    +      def f(): Unit = spark.range(N)
    +        .selectExpr(
    +          "id",
    +          "(id & 1023) as k1",
    +          "cast(id & 1023 as string) as k2",
    +          "cast(id & 1023 as int) as k3",
    +          "cast(id & 1023 as double) as k4",
    +          "cast(id & 1023 as float) as k5",
    +          "id > 1023 as k6")
    +        .groupBy("k1", "k2", "k3", "k4", "k5", "k6")
    +        .sum()
    +        .collect()
     
    -    benchmark.addCase("codegen = T hugeMethodLimit = 1500") { iter =>
    -      sparkSession.conf.set(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key, "true")
    -      sparkSession.conf.set(SQLConf.WHOLESTAGE_HUGE_METHOD_LIMIT.key, "1500")
    -      f()
    -    }
    +      benchmark.addCase(s"codegen = F") { iter =>
    +        spark.conf.set("spark.sql.codegen.wholeStage", "false")
    +        f()
    +      }
     
    -    benchmark.run()
    +      benchmark.addCase(s"codegen = T hashmap = F") { iter =>
    +        spark.conf.set("spark.sql.codegen.wholeStage", "true")
    +        spark.conf.set("spark.sql.codegen.aggregate.map.twolevel.enabled", "false")
    +        spark.conf.set("spark.sql.codegen.aggregate.map.vectorized.enable", "false")
    +        f()
    +      }
     
    -    /*
    -    Java HotSpot(TM) 64-Bit Server VM 1.8.0_31-b13 on Mac OS X 10.10.2
    -    Intel(R) Core(TM) i7-4578U CPU @ 3.00GHz
    +      benchmark.addCase(s"codegen = T hashmap = T") { iter =>
    +        spark.conf.set("spark.sql.codegen.wholeStage", "true")
    +        spark.conf.set("spark.sql.codegen.aggregate.map.twolevel.enabled", "true")
    +        spark.conf.set("spark.sql.codegen.aggregate.map.vectorized.enable", "true")
    +        f()
    +      }
     
    -    max function bytecode size:              Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    ------------------------------------------------------------------------------------------------
    -    codegen = F                                    709 /  803          0.9        1082.1       1.0X
    -    codegen = T hugeMethodLimit = 10000           3485 / 3548          0.2        5317.7       0.2X
    -    codegen = T hugeMethodLimit = 1500             636 /  701          1.0         969.9       1.1X
    -     */
    -  }
    +      benchmark.run()
    +    }
    +
    +    runBenchmark("max function bytecode size of wholestagecodegen") {
    +      val N = 20 << 15
    +
    +      val benchmark = new Benchmark("max function bytecode size", N, output = output)
    +
    +      def f(): Unit = spark.range(N)
    +        .selectExpr(
    +          "id",
    +          "(id & 1023) as k1",
    +          "cast(id & 1023 as double) as k2",
    +          "cast(id & 1023 as int) as k3",
    +          "case when id > 100 and id <= 200 then 1 else 0 end as v1",
    +          "case when id > 200 and id <= 300 then 1 else 0 end as v2",
    +          "case when id > 300 and id <= 400 then 1 else 0 end as v3",
    +          "case when id > 400 and id <= 500 then 1 else 0 end as v4",
    +          "case when id > 500 and id <= 600 then 1 else 0 end as v5",
    +          "case when id > 600 and id <= 700 then 1 else 0 end as v6",
    +          "case when id > 700 and id <= 800 then 1 else 0 end as v7",
    +          "case when id > 800 and id <= 900 then 1 else 0 end as v8",
    +          "case when id > 900 and id <= 1000 then 1 else 0 end as v9",
    +          "case when id > 1000 and id <= 1100 then 1 else 0 end as v10",
    +          "case when id > 1100 and id <= 1200 then 1 else 0 end as v11",
    +          "case when id > 1200 and id <= 1300 then 1 else 0 end as v12",
    +          "case when id > 1300 and id <= 1400 then 1 else 0 end as v13",
    +          "case when id > 1400 and id <= 1500 then 1 else 0 end as v14",
    +          "case when id > 1500 and id <= 1600 then 1 else 0 end as v15",
    +          "case when id > 1600 and id <= 1700 then 1 else 0 end as v16",
    +          "case when id > 1700 and id <= 1800 then 1 else 0 end as v17",
    +          "case when id > 1800 and id <= 1900 then 1 else 0 end as v18")
    +        .groupBy("k1", "k2", "k3")
    +        .sum()
    +        .collect()
    +
    +      benchmark.addCase("codegen = F") { iter =>
    +        spark.conf.set(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key, "false")
    +        f()
    +      }
     
    +      benchmark.addCase("codegen = T hugeMethodLimit = 10000") { iter =>
    +        spark.conf.set(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key, "true")
    +        spark.conf.set(SQLConf.WHOLESTAGE_HUGE_METHOD_LIMIT.key, "10000")
    +        f()
    +      }
     
    -  ignore("cube") {
    -    val N = 5 << 20
    +      benchmark.addCase("codegen = T hugeMethodLimit = 1500") { iter =>
    +        spark.conf.set(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key, "true")
    +        spark.conf.set(SQLConf.WHOLESTAGE_HUGE_METHOD_LIMIT.key, "1500")
    --- End diff --
    
    Although this is not a problem of this refactoring, this test suite seems to be unhealthy because the configuration from the previous benchmark is propagated to the next benchmark.
    
    Can we fix this test suite to use `withSQLConf`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark to use m...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3418/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22484#discussion_r220799981
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala ---
    @@ -34,621 +34,539 @@ import org.apache.spark.unsafe.map.BytesToBytesMap
     
     /**
      * Benchmark to measure performance for aggregate primitives.
    - * To run this:
    - *  build/sbt "sql/test-only *benchmark.AggregateBenchmark"
    - *
    - * Benchmarks in this file are skipped in normal builds.
    + * To run this benchmark:
    + * {{{
    + *   1. without sbt: bin/spark-submit --class <this class> <spark sql test jar>
    + *   2. build/sbt "sql/test:runMain <this class>"
    + *   3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
    + *      Results will be written to "benchmarks/AggregateBenchmark-results.txt".
    + * }}}
      */
    -class AggregateBenchmark extends BenchmarkWithCodegen {
    +object AggregateBenchmark extends SqlBasedBenchmark {
     
    -  ignore("aggregate without grouping") {
    -    val N = 500L << 22
    -    val benchmark = new Benchmark("agg without grouping", N)
    -    runBenchmark("agg w/o group", N) {
    -      sparkSession.range(N).selectExpr("sum(id)").collect()
    +  override def benchmark(): Unit = {
    +    runBenchmark("aggregate without grouping") {
    +      val N = 500L << 22
    +      runBenchmarkWithCodegen("agg w/o group", N) {
    +        spark.range(N).selectExpr("sum(id)").collect()
    +      }
         }
    -    /*
    -    agg w/o group:                           Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    ------------------------------------------------------------------------------------------------
    -    agg w/o group wholestage off                30136 / 31885         69.6          14.4       1.0X
    -    agg w/o group wholestage on                   1851 / 1860       1132.9           0.9      16.3X
    -     */
    -  }
     
    -  ignore("stat functions") {
    -    val N = 100L << 20
    +    runBenchmark("stat functions") {
    +      val N = 100L << 20
     
    -    runBenchmark("stddev", N) {
    -      sparkSession.range(N).groupBy().agg("id" -> "stddev").collect()
    -    }
    +      runBenchmarkWithCodegen("stddev", N) {
    +        spark.range(N).groupBy().agg("id" -> "stddev").collect()
    +      }
     
    -    runBenchmark("kurtosis", N) {
    -      sparkSession.range(N).groupBy().agg("id" -> "kurtosis").collect()
    +      runBenchmarkWithCodegen("kurtosis", N) {
    +        spark.range(N).groupBy().agg("id" -> "kurtosis").collect()
    +      }
         }
     
    -    /*
    -    Using ImperativeAggregate (as implemented in Spark 1.6):
    -
    -      Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
    -      stddev:                            Avg Time(ms)    Avg Rate(M/s)  Relative Rate
    -      -------------------------------------------------------------------------------
    -      stddev w/o codegen                      2019.04            10.39         1.00 X
    -      stddev w codegen                        2097.29            10.00         0.96 X
    -      kurtosis w/o codegen                    2108.99             9.94         0.96 X
    -      kurtosis w codegen                      2090.69            10.03         0.97 X
    -
    -      Using DeclarativeAggregate:
    -
    -      Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
    -      stddev:                             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -      -------------------------------------------------------------------------------------------
    -      stddev codegen=false                     5630 / 5776         18.0          55.6       1.0X
    -      stddev codegen=true                      1259 / 1314         83.0          12.0       4.5X
    -
    -      Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
    -      kurtosis:                           Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -      -------------------------------------------------------------------------------------------
    -      kurtosis codegen=false                 14847 / 15084          7.0         142.9       1.0X
    -      kurtosis codegen=true                    1652 / 2124         63.0          15.9       9.0X
    -    */
    -  }
    -
    -  ignore("aggregate with linear keys") {
    -    val N = 20 << 22
    +    runBenchmark("aggregate with linear keys") {
    +      val N = 20 << 22
     
    -    val benchmark = new Benchmark("Aggregate w keys", N)
    -    def f(): Unit = {
    -      sparkSession.range(N).selectExpr("(id & 65535) as k").groupBy("k").sum().collect()
    -    }
    +      val benchmark = new Benchmark("Aggregate w keys", N, output = output)
     
    -    benchmark.addCase(s"codegen = F", numIters = 2) { iter =>
    -      sparkSession.conf.set("spark.sql.codegen.wholeStage", "false")
    -      f()
    -    }
    +      def f(): Unit = {
    +        spark.range(N).selectExpr("(id & 65535) as k").groupBy("k").sum().collect()
    +      }
     
    -    benchmark.addCase(s"codegen = T hashmap = F", numIters = 3) { iter =>
    -      sparkSession.conf.set("spark.sql.codegen.wholeStage", "true")
    -      sparkSession.conf.set("spark.sql.codegen.aggregate.map.twolevel.enabled", "false")
    -      sparkSession.conf.set("spark.sql.codegen.aggregate.map.vectorized.enable", "false")
    -      f()
    -    }
    +      benchmark.addCase(s"codegen = F", numIters = 2) { _ =>
    +        withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false") {
    +          f()
    +        }
    +      }
     
    -    benchmark.addCase(s"codegen = T hashmap = T", numIters = 5) { iter =>
    -      sparkSession.conf.set("spark.sql.codegen.wholeStage", "true")
    -      sparkSession.conf.set("spark.sql.codegen.aggregate.map.twolevel.enabled", "true")
    -      sparkSession.conf.set("spark.sql.codegen.aggregate.map.vectorized.enable", "true")
    -      f()
    -    }
    +      benchmark.addCase(s"codegen = T hashmap = F", numIters = 3) { _ =>
    +        withSQLConf(
    +          SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "true",
    +          SQLConf.ENABLE_TWOLEVEL_AGG_MAP.key -> "false",
    +          "spark.sql.codegen.aggregate.map.vectorized.enable" -> "false") {
    +          f()
    +        }
    +      }
     
    -    benchmark.run()
    +      benchmark.addCase(s"codegen = T hashmap = T", numIters = 5) { _ =>
    +        withSQLConf(
    +          SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "true",
    +          SQLConf.ENABLE_TWOLEVEL_AGG_MAP.key -> "true",
    +          "spark.sql.codegen.aggregate.map.vectorized.enable" -> "true") {
    +          f()
    +        }
    +      }
     
    -    /*
    -    Java HotSpot(TM) 64-Bit Server VM 1.8.0_60-b27 on Mac OS X 10.11
    -    Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz
    +      benchmark.run()
    +    }
     
    -    Aggregate w keys:                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    ------------------------------------------------------------------------------------------------
    -    codegen = F                                   6619 / 6780         12.7          78.9       1.0X
    -    codegen = T hashmap = F                       3935 / 4059         21.3          46.9       1.7X
    -    codegen = T hashmap = T                        897 /  971         93.5          10.7       7.4X
    -    */
    -  }
    +    runBenchmark("aggregate with randomized keys") {
    +      val N = 20 << 22
     
    -  ignore("aggregate with randomized keys") {
    -    val N = 20 << 22
    +      val benchmark = new Benchmark("Aggregate w keys", N, output = output)
    +      spark.range(N).selectExpr("id", "floor(rand() * 10000) as k")
    +        .createOrReplaceTempView("test")
     
    -    val benchmark = new Benchmark("Aggregate w keys", N)
    -    sparkSession.range(N).selectExpr("id", "floor(rand() * 10000) as k")
    -      .createOrReplaceTempView("test")
    +      def f(): Unit = spark.sql("select k, k, sum(id) from test group by k, k").collect()
     
    -    def f(): Unit = sparkSession.sql("select k, k, sum(id) from test group by k, k").collect()
    +      benchmark.addCase(s"codegen = F", numIters = 2) { _ =>
    +        withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false") {
    +          f()
    +        }
    +      }
     
    -    benchmark.addCase(s"codegen = F", numIters = 2) { iter =>
    -      sparkSession.conf.set("spark.sql.codegen.wholeStage", value = false)
    -      f()
    -    }
    +      benchmark.addCase(s"codegen = T hashmap = F", numIters = 3) { _ =>
    +        withSQLConf(
    +          SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "true",
    +          SQLConf.ENABLE_TWOLEVEL_AGG_MAP.key -> "false",
    +          "spark.sql.codegen.aggregate.map.vectorized.enable" -> "false") {
    +          f()
    +        }
    +      }
     
    -    benchmark.addCase(s"codegen = T hashmap = F", numIters = 3) { iter =>
    -      sparkSession.conf.set("spark.sql.codegen.wholeStage", value = true)
    -      sparkSession.conf.set("spark.sql.codegen.aggregate.map.twolevel.enabled", "false")
    -      sparkSession.conf.set("spark.sql.codegen.aggregate.map.vectorized.enable", "false")
    -      f()
    -    }
    +      benchmark.addCase(s"codegen = T hashmap = T", numIters = 5) { _ =>
    +        withSQLConf(
    +          SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "true",
    +          SQLConf.ENABLE_TWOLEVEL_AGG_MAP.key -> "true",
    +          "spark.sql.codegen.aggregate.map.vectorized.enable" -> "true") {
    +          f()
    +        }
    +      }
     
    -    benchmark.addCase(s"codegen = T hashmap = T", numIters = 5) { iter =>
    -      sparkSession.conf.set("spark.sql.codegen.wholeStage", value = true)
    -      sparkSession.conf.set("spark.sql.codegen.aggregate.map.twolevel.enabled", "true")
    -      sparkSession.conf.set("spark.sql.codegen.aggregate.map.vectorized.enable", "true")
    -      f()
    +      benchmark.run()
         }
     
    -    benchmark.run()
    -
    -    /*
    -    Java HotSpot(TM) 64-Bit Server VM 1.8.0_60-b27 on Mac OS X 10.11
    -    Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz
    +    runBenchmark("aggregate with string key") {
    +      val N = 20 << 20
     
    -    Aggregate w keys:                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    ------------------------------------------------------------------------------------------------
    -    codegen = F                                   7445 / 7517         11.3          88.7       1.0X
    -    codegen = T hashmap = F                       4672 / 4703         18.0          55.7       1.6X
    -    codegen = T hashmap = T                       1764 / 1958         47.6          21.0       4.2X
    -    */
    -  }
    +      val benchmark = new Benchmark("Aggregate w string key", N, output = output)
     
    -  ignore("aggregate with string key") {
    -    val N = 20 << 20
    +      def f(): Unit = spark.range(N).selectExpr("id", "cast(id & 1023 as string) as k")
    +        .groupBy("k").count().collect()
     
    -    val benchmark = new Benchmark("Aggregate w string key", N)
    -    def f(): Unit = sparkSession.range(N).selectExpr("id", "cast(id & 1023 as string) as k")
    -      .groupBy("k").count().collect()
    +      benchmark.addCase(s"codegen = F", numIters = 2) { _ =>
    +        spark.conf.set("spark.sql.codegen.wholeStage", "false")
    --- End diff --
    
    Shall we remove this redundant line 148?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    **[Test build #96616 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96616/testReport)** for PR 22484 at commit [`0fae54d`](https://github.com/apache/spark/commit/0fae54d9e1a40c021d0ebb5af3d9d969431807f7).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `trait SqlBasedBenchmark extends BenchmarkBase with SQLHelper `


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96529/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    **[Test build #96616 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96616/testReport)** for PR 22484 at commit [`0fae54d`](https://github.com/apache/spark/commit/0fae54d9e1a40c021d0ebb5af3d9d969431807f7).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark to use m...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22484#discussion_r220026018
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala ---
    @@ -73,23 +73,26 @@ object AggregateBenchmark extends SqlBasedBenchmark {
             spark.range(N).selectExpr("(id & 65535) as k").groupBy("k").sum().collect()
           }
     
    -      benchmark.addCase(s"codegen = F", numIters = 2) { iter =>
    -        spark.conf.set("spark.sql.codegen.wholeStage", "false")
    -        f()
    +      benchmark.addCase(s"codegen = F", numIters = 2) { _ =>
    +        withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> false.toString) {
    --- End diff --
    
    When we use `Seq(true, false).foreach { value =>`, we usually do `s"$value"`. But, for this, I think `"false"` is the simplest and the best.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark t...

Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22484#discussion_r219104743
  
    --- Diff: sql/core/benchmarks/AggregateBenchmark-results.txt ---
    @@ -0,0 +1,154 @@
    +================================================================================================
    +aggregate without grouping
    +================================================================================================
    +
    +Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
    +Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
    +
    +agg w/o group:                           Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +------------------------------------------------------------------------------------------------
    +agg w/o group wholestage off                39650 / 46049         52.9          18.9       1.0X
    +agg w/o group wholestage on                   1224 / 1413       1713.5           0.6      32.4X
    +
    +
    +================================================================================================
    +stat functions
    +================================================================================================
    +
    --- End diff --
    
    @davies Do you know how to generate there benchmark:
    https://github.com/apache/spark/blob/3c3eebc8734e36e61f4627e2c517fbbe342b3b42/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala#L70-L78


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22484#discussion_r220959695
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala ---
    @@ -34,621 +34,539 @@ import org.apache.spark.unsafe.map.BytesToBytesMap
     
     /**
      * Benchmark to measure performance for aggregate primitives.
    - * To run this:
    - *  build/sbt "sql/test-only *benchmark.AggregateBenchmark"
    - *
    - * Benchmarks in this file are skipped in normal builds.
    + * To run this benchmark:
    + * {{{
    + *   1. without sbt: bin/spark-submit --class <this class> <spark sql test jar>
    + *   2. build/sbt "sql/test:runMain <this class>"
    + *   3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
    + *      Results will be written to "benchmarks/AggregateBenchmark-results.txt".
    + * }}}
      */
    -class AggregateBenchmark extends BenchmarkWithCodegen {
    +object AggregateBenchmark extends SqlBasedBenchmark {
     
    -  ignore("aggregate without grouping") {
    -    val N = 500L << 22
    -    val benchmark = new Benchmark("agg without grouping", N)
    -    runBenchmark("agg w/o group", N) {
    -      sparkSession.range(N).selectExpr("sum(id)").collect()
    +  override def benchmark(): Unit = {
    +    runBenchmark("aggregate without grouping") {
    +      val N = 500L << 22
    +      runBenchmarkWithCodegen("agg w/o group", N) {
    +        spark.range(N).selectExpr("sum(id)").collect()
    +      }
         }
    -    /*
    -    agg w/o group:                           Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    ------------------------------------------------------------------------------------------------
    -    agg w/o group wholestage off                30136 / 31885         69.6          14.4       1.0X
    -    agg w/o group wholestage on                   1851 / 1860       1132.9           0.9      16.3X
    -     */
    -  }
     
    -  ignore("stat functions") {
    -    val N = 100L << 20
    +    runBenchmark("stat functions") {
    +      val N = 100L << 20
     
    -    runBenchmark("stddev", N) {
    -      sparkSession.range(N).groupBy().agg("id" -> "stddev").collect()
    -    }
    +      runBenchmarkWithCodegen("stddev", N) {
    +        spark.range(N).groupBy().agg("id" -> "stddev").collect()
    +      }
     
    -    runBenchmark("kurtosis", N) {
    -      sparkSession.range(N).groupBy().agg("id" -> "kurtosis").collect()
    +      runBenchmarkWithCodegen("kurtosis", N) {
    +        spark.range(N).groupBy().agg("id" -> "kurtosis").collect()
    +      }
         }
     
    -    /*
    -    Using ImperativeAggregate (as implemented in Spark 1.6):
    -
    -      Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
    -      stddev:                            Avg Time(ms)    Avg Rate(M/s)  Relative Rate
    -      -------------------------------------------------------------------------------
    -      stddev w/o codegen                      2019.04            10.39         1.00 X
    -      stddev w codegen                        2097.29            10.00         0.96 X
    -      kurtosis w/o codegen                    2108.99             9.94         0.96 X
    -      kurtosis w codegen                      2090.69            10.03         0.97 X
    -
    -      Using DeclarativeAggregate:
    -
    -      Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
    -      stddev:                             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -      -------------------------------------------------------------------------------------------
    -      stddev codegen=false                     5630 / 5776         18.0          55.6       1.0X
    -      stddev codegen=true                      1259 / 1314         83.0          12.0       4.5X
    -
    -      Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
    -      kurtosis:                           Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -      -------------------------------------------------------------------------------------------
    -      kurtosis codegen=false                 14847 / 15084          7.0         142.9       1.0X
    -      kurtosis codegen=true                    1652 / 2124         63.0          15.9       9.0X
    -    */
    -  }
    -
    -  ignore("aggregate with linear keys") {
    -    val N = 20 << 22
    +    runBenchmark("aggregate with linear keys") {
    +      val N = 20 << 22
     
    -    val benchmark = new Benchmark("Aggregate w keys", N)
    -    def f(): Unit = {
    -      sparkSession.range(N).selectExpr("(id & 65535) as k").groupBy("k").sum().collect()
    -    }
    +      val benchmark = new Benchmark("Aggregate w keys", N, output = output)
     
    -    benchmark.addCase(s"codegen = F", numIters = 2) { iter =>
    -      sparkSession.conf.set("spark.sql.codegen.wholeStage", "false")
    -      f()
    -    }
    +      def f(): Unit = {
    +        spark.range(N).selectExpr("(id & 65535) as k").groupBy("k").sum().collect()
    +      }
     
    -    benchmark.addCase(s"codegen = T hashmap = F", numIters = 3) { iter =>
    -      sparkSession.conf.set("spark.sql.codegen.wholeStage", "true")
    -      sparkSession.conf.set("spark.sql.codegen.aggregate.map.twolevel.enabled", "false")
    -      sparkSession.conf.set("spark.sql.codegen.aggregate.map.vectorized.enable", "false")
    -      f()
    -    }
    +      benchmark.addCase(s"codegen = F", numIters = 2) { _ =>
    --- End diff --
    
    Thanks @dongjoon-hyun I plan add `EmptyInterpolatedStringChecker` to scalastyle-config.xml to avoid this issue: [SPARK-25553](https://issues.apache.org/jira/browse/SPARK-25553)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22484#discussion_r221396456
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala ---
    @@ -0,0 +1,60 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.benchmark
    +
    +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
    +import org.apache.spark.sql.SparkSession
    +import org.apache.spark.sql.catalyst.plans.SQLHelper
    +import org.apache.spark.sql.internal.SQLConf
    +
    +/**
    + * Common base trait to run benchmark with the Dataset and DataFrame API.
    + */
    +trait SqlBasedBenchmark extends BenchmarkBase with SQLHelper {
    --- End diff --
    
    @wangyum and @gengliangwang .
    
    What is the future plan for the usage of both `SqlBasedBenchmark` and `BenchmarkWithCodegen`? I'm wondering what is the criteria to choose each trait.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    **[Test build #96529 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96529/testReport)** for PR 22484 at commit [`536d4e9`](https://github.com/apache/spark/commit/536d4e9bb0cc737ee2f7c887763a4c9ebef147f5).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark to use m...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22484#discussion_r220764252
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala ---
    @@ -0,0 +1,87 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.benchmark
    +
    +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
    +import org.apache.spark.sql.{AnalysisException, SparkSession}
    +import org.apache.spark.sql.internal.SQLConf
    +
    +/**
    + * Common base trait to run benchmark with the Dataset and DataFrame API.
    + */
    +trait SqlBasedBenchmark extends BenchmarkBase {
    +
    +  val spark: SparkSession = getSparkSession
    +
    +  /** Subclass can override this function to build their own SparkSession */
    +  def getSparkSession: SparkSession = {
    +    SparkSession.builder()
    +      .master("local[1]")
    +      .appName(this.getClass.getCanonicalName)
    +      .config(SQLConf.SHUFFLE_PARTITIONS.key, 1)
    +      .config(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key, 1)
    +      .getOrCreate()
    +  }
    +
    +  /**
    +   * Sets all SQL configurations specified in `pairs`, calls `f`, and then restores all SQL
    +   * configurations.
    +   */
    +  protected def withSQLConf(pairs: (String, String)*)(f: => Unit): Unit = {
    --- End diff --
    
    @dongjoon-hyun  I finished it.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    **[Test build #96667 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96667/testReport)** for PR 22484 at commit [`3439992`](https://github.com/apache/spark/commit/3439992a4bf1ef2ec6f3a083bf0339694216b916).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark t...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22484#discussion_r220025726
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala ---
    @@ -104,23 +107,26 @@ object AggregateBenchmark extends SqlBasedBenchmark {
     
           def f(): Unit = spark.sql("select k, k, sum(id) from test group by k, k").collect()
     
    -      benchmark.addCase(s"codegen = F", numIters = 2) { iter =>
    -        spark.conf.set("spark.sql.codegen.wholeStage", value = false)
    -        f()
    +      benchmark.addCase(s"codegen = F", numIters = 2) { _ =>
    +        withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> false.toString) {
    +          f()
    +        }
           }
     
    -      benchmark.addCase(s"codegen = T hashmap = F", numIters = 3) { iter =>
    -        spark.conf.set("spark.sql.codegen.wholeStage", value = true)
    -        spark.conf.set("spark.sql.codegen.aggregate.map.twolevel.enabled", "false")
    -        spark.conf.set("spark.sql.codegen.aggregate.map.vectorized.enable", "false")
    -        f()
    +      benchmark.addCase(s"codegen = T hashmap = F", numIters = 3) { _ =>
    +        withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> true.toString,
    +          SQLConf.ENABLE_TWOLEVEL_AGG_MAP.key -> false.toString,
    +          "spark.sql.codegen.aggregate.map.vectorized.enable" -> false.toString) {
    --- End diff --
    
    ditto.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22484#discussion_r220043531
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala ---
    @@ -0,0 +1,87 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.benchmark
    +
    +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
    +import org.apache.spark.sql.{AnalysisException, SparkSession}
    +import org.apache.spark.sql.internal.SQLConf
    +
    +/**
    + * Common base trait to run benchmark with the Dataset and DataFrame API.
    + */
    +trait SqlBasedBenchmark extends BenchmarkBase {
    +
    +  val spark: SparkSession = getSparkSession
    +
    +  /** Subclass can override this function to build their own SparkSession */
    +  def getSparkSession: SparkSession = {
    +    SparkSession.builder()
    +      .master("local[1]")
    +      .appName(this.getClass.getCanonicalName)
    +      .config(SQLConf.SHUFFLE_PARTITIONS.key, 1)
    +      .config(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key, 1)
    +      .getOrCreate()
    +  }
    +
    +  /**
    +   * Sets all SQL configurations specified in `pairs`, calls `f`, and then restores all SQL
    +   * configurations.
    +   */
    +  protected def withSQLConf(pairs: (String, String)*)(f: => Unit): Unit = {
    --- End diff --
    
    Shall we avoid duplicating the existing logic `withSQLConf`? Let me try to fix.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark to use m...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Thank you for updating, @wangyum .


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3478/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22484#discussion_r220033439
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala ---
    @@ -73,23 +73,26 @@ object AggregateBenchmark extends SqlBasedBenchmark {
             spark.range(N).selectExpr("(id & 65535) as k").groupBy("k").sum().collect()
           }
     
    -      benchmark.addCase(s"codegen = F", numIters = 2) { iter =>
    -        spark.conf.set("spark.sql.codegen.wholeStage", "false")
    -        f()
    +      benchmark.addCase(s"codegen = F", numIters = 2) { _ =>
    +        withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> false.toString) {
    +          f()
    +        }
           }
     
    -      benchmark.addCase(s"codegen = T hashmap = F", numIters = 3) { iter =>
    -        spark.conf.set("spark.sql.codegen.wholeStage", "true")
    -        spark.conf.set("spark.sql.codegen.aggregate.map.twolevel.enabled", "false")
    -        spark.conf.set("spark.sql.codegen.aggregate.map.vectorized.enable", "false")
    -        f()
    +      benchmark.addCase(s"codegen = T hashmap = F", numIters = 3) { _ =>
    +        withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> true.toString,
    +          SQLConf.ENABLE_TWOLEVEL_AGG_MAP.key -> false.toString,
    +          "spark.sql.codegen.aggregate.map.vectorized.enable" -> false.toString) {
    --- End diff --
    
    Yes!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark to use m...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3403/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96811/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96531/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark to use m...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    **[Test build #96338 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96338/testReport)** for PR 22484 at commit [`649f296`](https://github.com/apache/spark/commit/649f2965188efcfa0b1d2b5acb4c0f057ecd3788).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark to use m...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark to use m...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3290/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark t...

Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22484#discussion_r219725643
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/RunBenchmarkWithCodegen.scala ---
    @@ -0,0 +1,58 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.benchmark
    +
    +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
    +import org.apache.spark.sql.SparkSession
    +import org.apache.spark.sql.internal.SQLConf
    +
    +/**
    + * Common base trait for micro benchmarks that are supposed to run standalone (i.e. not together
    + * with other test suites).
    + */
    +trait RunBenchmarkWithCodegen extends BenchmarkBase {
    +
    +  val spark: SparkSession = getSparkSession
    +
    +  /** Subclass can override this function to build their own SparkSession */
    +  def getSparkSession: SparkSession = {
    +    SparkSession.builder()
    +      .master("local[1]")
    +      .appName(this.getClass.getCanonicalName)
    +      .config(SQLConf.SHUFFLE_PARTITIONS.key, 1)
    +      .config(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key, 1)
    +      .getOrCreate()
    +  }
    +
    +  /** Runs function `f` with whole stage codegen on and off. */
    +  def runBenchmark(name: String, cardinality: Long)(f: => Unit): Unit = {
    --- End diff --
    
    How about `runBenchmark` -> `runSqlBaseBenchmark `?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    **[Test build #96811 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96811/testReport)** for PR 22484 at commit [`6c46ad5`](https://github.com/apache/spark/commit/6c46ad59c063fa6283fb23046300404767a82248).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    **[Test build #96686 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96686/testReport)** for PR 22484 at commit [`3439992`](https://github.com/apache/spark/commit/3439992a4bf1ef2ec6f3a083bf0339694216b916).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3601/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    @wangyum . If you don't mind, could you review my PR to your branch, https://github.com/wangyum/spark/pull/11, which
    - deduplicates `withSQLConf` back
    - runs on AWS EC2 `r3.xlarge`?



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark t...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22484#discussion_r220025291
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala ---
    @@ -73,23 +73,26 @@ object AggregateBenchmark extends SqlBasedBenchmark {
             spark.range(N).selectExpr("(id & 65535) as k").groupBy("k").sum().collect()
           }
     
    -      benchmark.addCase(s"codegen = F", numIters = 2) { iter =>
    -        spark.conf.set("spark.sql.codegen.wholeStage", "false")
    -        f()
    +      benchmark.addCase(s"codegen = F", numIters = 2) { _ =>
    +        withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> false.toString) {
    --- End diff --
    
    `"false"` instead of `false.toString`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3417/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96686/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    I made an official PR for `withSQLConf` [SPARK-25534](https://github.com/apache/spark/pull/22548) which reduces the number of `withSQLConf` from 6 to 3. The three were the minimal different implementations.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96616/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark t...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22484#discussion_r220025674
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala ---
    @@ -104,23 +107,26 @@ object AggregateBenchmark extends SqlBasedBenchmark {
     
           def f(): Unit = spark.sql("select k, k, sum(id) from test group by k, k").collect()
     
    -      benchmark.addCase(s"codegen = F", numIters = 2) { iter =>
    -        spark.conf.set("spark.sql.codegen.wholeStage", value = false)
    -        f()
    +      benchmark.addCase(s"codegen = F", numIters = 2) { _ =>
    +        withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> false.toString) {
    --- End diff --
    
    `"false"` instead of `false.toString`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    @dongjoon-hyun Other refactorings are waiting for this commit.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96530/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark t...

Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22484#discussion_r219683925
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala ---
    @@ -34,621 +34,508 @@ import org.apache.spark.unsafe.map.BytesToBytesMap
     
     /**
      * Benchmark to measure performance for aggregate primitives.
    - * To run this:
    - *  build/sbt "sql/test-only *benchmark.AggregateBenchmark"
    - *
    - * Benchmarks in this file are skipped in normal builds.
    + * To run this benchmark:
    + * {{{
    + *   1. without sbt: bin/spark-submit --class <this class> <spark sql test jar>
    + *   2. build/sbt "sql/test:runMain <this class>"
    + *   3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
    + *      Results will be written to "benchmarks/AggregateBenchmark-results.txt".
    + * }}}
      */
    -class AggregateBenchmark extends BenchmarkWithCodegen {
    +object AggregateBenchmark extends RunBenchmarkWithCodegen {
     
    -  ignore("aggregate without grouping") {
    -    val N = 500L << 22
    -    val benchmark = new Benchmark("agg without grouping", N)
    -    runBenchmark("agg w/o group", N) {
    -      sparkSession.range(N).selectExpr("sum(id)").collect()
    +  override def benchmark(): Unit = {
    +    runBenchmark("aggregate without grouping") {
    +      val N = 500L << 22
    +      runBenchmark("agg w/o group", N) {
    --- End diff --
    
    Yes. Do you have a suggested name?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22484#discussion_r221397904
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala ---
    @@ -0,0 +1,60 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.benchmark
    +
    +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
    +import org.apache.spark.sql.SparkSession
    +import org.apache.spark.sql.catalyst.plans.SQLHelper
    +import org.apache.spark.sql.internal.SQLConf
    +
    +/**
    + * Common base trait to run benchmark with the Dataset and DataFrame API.
    + */
    +trait SqlBasedBenchmark extends BenchmarkBase with SQLHelper {
    --- End diff --
    
    I think we can remove `BenchmarkWithCodegen` after all refactor finished.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark to use m...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96502/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    **[Test build #96531 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96531/testReport)** for PR 22484 at commit [`d5fecdc`](https://github.com/apache/spark/commit/d5fecdcca267cb34c3d9aaa72d8ea4a3a7a3cc60).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Yep. This PR already is introducing new `trait` for `Benchmark` as a part of grand refactoring plan. I think the other new `trait`s also possible at this time. We will see. :)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22484#discussion_r220027149
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala ---
    @@ -73,23 +73,26 @@ object AggregateBenchmark extends SqlBasedBenchmark {
             spark.range(N).selectExpr("(id & 65535) as k").groupBy("k").sum().collect()
           }
     
    -      benchmark.addCase(s"codegen = F", numIters = 2) { iter =>
    -        spark.conf.set("spark.sql.codegen.wholeStage", "false")
    -        f()
    +      benchmark.addCase(s"codegen = F", numIters = 2) { _ =>
    +        withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> false.toString) {
    +          f()
    +        }
           }
     
    -      benchmark.addCase(s"codegen = T hashmap = F", numIters = 3) { iter =>
    -        spark.conf.set("spark.sql.codegen.wholeStage", "true")
    -        spark.conf.set("spark.sql.codegen.aggregate.map.twolevel.enabled", "false")
    -        spark.conf.set("spark.sql.codegen.aggregate.map.vectorized.enable", "false")
    -        f()
    +      benchmark.addCase(s"codegen = T hashmap = F", numIters = 3) { _ =>
    +        withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> true.toString,
    +          SQLConf.ENABLE_TWOLEVEL_AGG_MAP.key -> false.toString,
    +          "spark.sql.codegen.aggregate.map.vectorized.enable" -> false.toString) {
    --- End diff --
    
    @wangyum . This one is also for indentation. Please note that `withSQLConf(` is beyond the first configuration.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    **[Test build #96624 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96624/testReport)** for PR 22484 at commit [`0fae54d`](https://github.com/apache/spark/commit/0fae54d9e1a40c021d0ebb5af3d9d969431807f7).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `trait SqlBasedBenchmark extends BenchmarkBase with SQLHelper `


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark to use m...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22484#discussion_r220303975
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala ---
    @@ -0,0 +1,87 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.benchmark
    +
    +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
    +import org.apache.spark.sql.{AnalysisException, SparkSession}
    +import org.apache.spark.sql.internal.SQLConf
    +
    +/**
    + * Common base trait to run benchmark with the Dataset and DataFrame API.
    + */
    +trait SqlBasedBenchmark extends BenchmarkBase {
    +
    +  val spark: SparkSession = getSparkSession
    +
    +  /** Subclass can override this function to build their own SparkSession */
    +  def getSparkSession: SparkSession = {
    +    SparkSession.builder()
    +      .master("local[1]")
    +      .appName(this.getClass.getCanonicalName)
    +      .config(SQLConf.SHUFFLE_PARTITIONS.key, 1)
    +      .config(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key, 1)
    +      .getOrCreate()
    +  }
    +
    +  /**
    +   * Sets all SQL configurations specified in `pairs`, calls `f`, and then restores all SQL
    +   * configurations.
    +   */
    +  protected def withSQLConf(pairs: (String, String)*)(f: => Unit): Unit = {
    +    val conf = SQLConf.get
    +    val (keys, values) = pairs.unzip
    +    val currentValues = keys.map { key =>
    +      if (conf.contains(key)) {
    +        Some(conf.getConfString(key))
    +      } else {
    +        None
    +      }
    +    }
    +    (keys, values).zipped.foreach { (k, v) =>
    +      if (SQLConf.staticConfKeys.contains(k)) {
    +        throw new AnalysisException(s"Cannot modify the value of a static config: $k")
    +      }
    +      conf.setConfString(k, v)
    +    }
    +    try f finally {
    +      keys.zip(currentValues).foreach {
    +        case (key, Some(value)) => conf.setConfString(key, value)
    +        case (key, None) => conf.unsetConf(key)
    +      }
    +    }
    +  }
    +
    +  /** Runs function `f` with whole stage codegen on and off. */
    +  def runBenchmarkWithCodegen(name: String, cardinality: Long)(f: => Unit): Unit = {
    +    val benchmark = new Benchmark(name, cardinality, output = output)
    +
    +    benchmark.addCase(s"$name wholestage off", numIters = 2) { _ =>
    +      withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false") {
    --- End diff --
    
    Right. It's the same. And, the previous issue was we don't call `unset`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22484#discussion_r221434117
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala ---
    @@ -0,0 +1,60 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.benchmark
    +
    +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
    +import org.apache.spark.sql.SparkSession
    +import org.apache.spark.sql.catalyst.plans.SQLHelper
    +import org.apache.spark.sql.internal.SQLConf
    +
    +/**
    + * Common base trait to run benchmark with the Dataset and DataFrame API.
    + */
    +trait SqlBasedBenchmark extends BenchmarkBase with SQLHelper {
    --- End diff --
    
    Thank you, @gengliangwang and @wangyum . Let me think about this again.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    **[Test build #96531 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96531/testReport)** for PR 22484 at commit [`d5fecdc`](https://github.com/apache/spark/commit/d5fecdcca267cb34c3d9aaa72d8ea4a3a7a3cc60).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark to use m...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3401/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark to use m...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    BTW, I don't think we need `withTempTable` and `withTempPath` in this PR. Those are beyond of the scope of this PR.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark t...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22484#discussion_r220025591
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala ---
    @@ -73,23 +73,26 @@ object AggregateBenchmark extends SqlBasedBenchmark {
             spark.range(N).selectExpr("(id & 65535) as k").groupBy("k").sum().collect()
           }
     
    -      benchmark.addCase(s"codegen = F", numIters = 2) { iter =>
    -        spark.conf.set("spark.sql.codegen.wholeStage", "false")
    -        f()
    +      benchmark.addCase(s"codegen = F", numIters = 2) { _ =>
    +        withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> false.toString) {
    +          f()
    +        }
           }
     
    -      benchmark.addCase(s"codegen = T hashmap = F", numIters = 3) { iter =>
    -        spark.conf.set("spark.sql.codegen.wholeStage", "true")
    -        spark.conf.set("spark.sql.codegen.aggregate.map.twolevel.enabled", "false")
    -        spark.conf.set("spark.sql.codegen.aggregate.map.vectorized.enable", "false")
    -        f()
    +      benchmark.addCase(s"codegen = T hashmap = F", numIters = 3) { _ =>
    +        withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> true.toString,
    +          SQLConf.ENABLE_TWOLEVEL_AGG_MAP.key -> false.toString,
    +          "spark.sql.codegen.aggregate.map.vectorized.enable" -> false.toString) {
    --- End diff --
    
    ```scala
    withSQLConf(
      SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "true",
      SQLConf.ENABLE_TWOLEVEL_AGG_MAP.key -> "false",
      "spark.sql.codegen.aggregate.map.vectorized.enable" -> "false") {
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22484#discussion_r220435538
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala ---
    @@ -0,0 +1,87 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.benchmark
    +
    +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
    +import org.apache.spark.sql.{AnalysisException, SparkSession}
    +import org.apache.spark.sql.internal.SQLConf
    +
    +/**
    + * Common base trait to run benchmark with the Dataset and DataFrame API.
    + */
    +trait SqlBasedBenchmark extends BenchmarkBase {
    +
    +  val spark: SparkSession = getSparkSession
    +
    +  /** Subclass can override this function to build their own SparkSession */
    +  def getSparkSession: SparkSession = {
    +    SparkSession.builder()
    +      .master("local[1]")
    +      .appName(this.getClass.getCanonicalName)
    +      .config(SQLConf.SHUFFLE_PARTITIONS.key, 1)
    +      .config(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key, 1)
    +      .getOrCreate()
    +  }
    +
    +  /**
    +   * Sets all SQL configurations specified in `pairs`, calls `f`, and then restores all SQL
    +   * configurations.
    +   */
    +  protected def withSQLConf(pairs: (String, String)*)(f: => Unit): Unit = {
    --- End diff --
    
    @wangyum . Thank you for waiting.
    Since SPARK-25534 is merged, could you use `SQLHelper.withSQLConf` instead?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22484#discussion_r220438521
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala ---
    @@ -0,0 +1,87 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.benchmark
    +
    +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
    +import org.apache.spark.sql.{AnalysisException, SparkSession}
    +import org.apache.spark.sql.internal.SQLConf
    +
    +/**
    + * Common base trait to run benchmark with the Dataset and DataFrame API.
    + */
    +trait SqlBasedBenchmark extends BenchmarkBase {
    +
    +  val spark: SparkSession = getSparkSession
    +
    +  /** Subclass can override this function to build their own SparkSession */
    +  def getSparkSession: SparkSession = {
    +    SparkSession.builder()
    +      .master("local[1]")
    +      .appName(this.getClass.getCanonicalName)
    +      .config(SQLConf.SHUFFLE_PARTITIONS.key, 1)
    +      .config(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key, 1)
    +      .getOrCreate()
    +  }
    +
    +  /**
    +   * Sets all SQL configurations specified in `pairs`, calls `f`, and then restores all SQL
    +   * configurations.
    +   */
    +  protected def withSQLConf(pairs: (String, String)*)(f: => Unit): Unit = {
    --- End diff --
    
    Yes, I will do it now.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22484


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark to use m...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96500/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark to use m...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    LGTM, cc @dongjoon-hyun for sign-off


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96667/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22484#discussion_r220298090
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala ---
    @@ -0,0 +1,87 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.benchmark
    +
    +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
    +import org.apache.spark.sql.{AnalysisException, SparkSession}
    +import org.apache.spark.sql.internal.SQLConf
    +
    +/**
    + * Common base trait to run benchmark with the Dataset and DataFrame API.
    + */
    +trait SqlBasedBenchmark extends BenchmarkBase {
    +
    +  val spark: SparkSession = getSparkSession
    +
    +  /** Subclass can override this function to build their own SparkSession */
    +  def getSparkSession: SparkSession = {
    +    SparkSession.builder()
    +      .master("local[1]")
    +      .appName(this.getClass.getCanonicalName)
    +      .config(SQLConf.SHUFFLE_PARTITIONS.key, 1)
    +      .config(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key, 1)
    +      .getOrCreate()
    +  }
    +
    +  /**
    +   * Sets all SQL configurations specified in `pairs`, calls `f`, and then restores all SQL
    +   * configurations.
    +   */
    +  protected def withSQLConf(pairs: (String, String)*)(f: => Unit): Unit = {
    +    val conf = SQLConf.get
    +    val (keys, values) = pairs.unzip
    +    val currentValues = keys.map { key =>
    +      if (conf.contains(key)) {
    +        Some(conf.getConfString(key))
    +      } else {
    +        None
    +      }
    +    }
    +    (keys, values).zipped.foreach { (k, v) =>
    +      if (SQLConf.staticConfKeys.contains(k)) {
    +        throw new AnalysisException(s"Cannot modify the value of a static config: $k")
    +      }
    +      conf.setConfString(k, v)
    +    }
    +    try f finally {
    +      keys.zip(currentValues).foreach {
    +        case (key, Some(value)) => conf.setConfString(key, value)
    +        case (key, None) => conf.unsetConf(key)
    +      }
    +    }
    +  }
    +
    +  /** Runs function `f` with whole stage codegen on and off. */
    +  def runBenchmarkWithCodegen(name: String, cardinality: Long)(f: => Unit): Unit = {
    +    val benchmark = new Benchmark(name, cardinality, output = output)
    +
    +    benchmark.addCase(s"$name wholestage off", numIters = 2) { _ =>
    +      withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false") {
    --- End diff --
    
    for this particular case, do we really need `withSQLConf`? I think we can do
    ```
    val conf = SQLConf.get
    conf.set(WHOLESTAGE_CODEGEN_ENABLED, false)
    run benchmark...
    conf.set(WHOLESTAGE_CODEGEN_ENABLED, true)
    run benchmark...
    conf.unset(WHOLESTAGE_CODEGEN_ENABLED)
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    **[Test build #96666 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96666/testReport)** for PR 22484 at commit [`45add92`](https://github.com/apache/spark/commit/45add92f824c8d679086df13380f9fd27d6533d3).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22484#discussion_r221415701
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala ---
    @@ -0,0 +1,60 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.benchmark
    +
    +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
    +import org.apache.spark.sql.SparkSession
    +import org.apache.spark.sql.catalyst.plans.SQLHelper
    +import org.apache.spark.sql.internal.SQLConf
    +
    +/**
    + * Common base trait to run benchmark with the Dataset and DataFrame API.
    + */
    +trait SqlBasedBenchmark extends BenchmarkBase with SQLHelper {
    --- End diff --
    
    @dongjoon-hyun in https://github.com/apache/spark/pull/22522 I feel that it would be better to have a example refactoring, thus we can see how the new trait is used. 
    We can move back to https://github.com/apache/spark/pull/22522 . I am OK either way.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    **[Test build #96667 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96667/testReport)** for PR 22484 at commit [`3439992`](https://github.com/apache/spark/commit/3439992a4bf1ef2ec6f3a083bf0339694216b916).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark to use m...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    **[Test build #96529 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96529/testReport)** for PR 22484 at commit [`536d4e9`](https://github.com/apache/spark/commit/536d4e9bb0cc737ee2f7c887763a4c9ebef147f5).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark to use m...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    **[Test build #96338 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96338/testReport)** for PR 22484 at commit [`649f296`](https://github.com/apache/spark/commit/649f2965188efcfa0b1d2b5acb4c0f057ecd3788).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    **[Test build #96686 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96686/testReport)** for PR 22484 at commit [`3439992`](https://github.com/apache/spark/commit/3439992a4bf1ef2ec6f3a083bf0339694216b916).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark to use m...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22484#discussion_r220034493
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala ---
    @@ -73,23 +73,26 @@ object AggregateBenchmark extends SqlBasedBenchmark {
             spark.range(N).selectExpr("(id & 65535) as k").groupBy("k").sum().collect()
           }
     
    -      benchmark.addCase(s"codegen = F", numIters = 2) { iter =>
    -        spark.conf.set("spark.sql.codegen.wholeStage", "false")
    -        f()
    +      benchmark.addCase(s"codegen = F", numIters = 2) { _ =>
    +        withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> false.toString) {
    +          f()
    +        }
           }
     
    -      benchmark.addCase(s"codegen = T hashmap = F", numIters = 3) { iter =>
    -        spark.conf.set("spark.sql.codegen.wholeStage", "true")
    -        spark.conf.set("spark.sql.codegen.aggregate.map.twolevel.enabled", "false")
    -        spark.conf.set("spark.sql.codegen.aggregate.map.vectorized.enable", "false")
    -        f()
    +      benchmark.addCase(s"codegen = T hashmap = F", numIters = 3) { _ =>
    +        withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> true.toString,
    +          SQLConf.ENABLE_TWOLEVEL_AGG_MAP.key -> false.toString,
    +          "spark.sql.codegen.aggregate.map.vectorized.enable" -> false.toString) {
    --- End diff --
    
    Fixed


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark t...

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22484#discussion_r219706455
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala ---
    @@ -34,621 +34,508 @@ import org.apache.spark.unsafe.map.BytesToBytesMap
     
     /**
      * Benchmark to measure performance for aggregate primitives.
    - * To run this:
    - *  build/sbt "sql/test-only *benchmark.AggregateBenchmark"
    - *
    - * Benchmarks in this file are skipped in normal builds.
    + * To run this benchmark:
    + * {{{
    + *   1. without sbt: bin/spark-submit --class <this class> <spark sql test jar>
    + *   2. build/sbt "sql/test:runMain <this class>"
    + *   3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
    + *      Results will be written to "benchmarks/AggregateBenchmark-results.txt".
    + * }}}
      */
    -class AggregateBenchmark extends BenchmarkWithCodegen {
    +object AggregateBenchmark extends RunBenchmarkWithCodegen {
     
    -  ignore("aggregate without grouping") {
    -    val N = 500L << 22
    -    val benchmark = new Benchmark("agg without grouping", N)
    -    runBenchmark("agg w/o group", N) {
    -      sparkSession.range(N).selectExpr("sum(id)").collect()
    +  override def benchmark(): Unit = {
    +    runBenchmark("aggregate without grouping") {
    +      val N = 500L << 22
    +      runBenchmark("agg w/o group", N) {
    --- End diff --
    
    Well I don't a good name in mind. How about make the method `runBenchmark` of `RunBenchmarkWithCodegen` overriding the one in `BenchmarkBase`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3516/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark to use m...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    **[Test build #96502 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96502/testReport)** for PR 22484 at commit [`2d778a4`](https://github.com/apache/spark/commit/2d778a4c8fb5d3b373856837496882c05ff1d42d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `trait SqlBasedBenchmark extends BenchmarkBase `


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22484#discussion_r220304213
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala ---
    @@ -0,0 +1,87 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.benchmark
    +
    +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
    +import org.apache.spark.sql.{AnalysisException, SparkSession}
    +import org.apache.spark.sql.internal.SQLConf
    +
    +/**
    + * Common base trait to run benchmark with the Dataset and DataFrame API.
    + */
    +trait SqlBasedBenchmark extends BenchmarkBase {
    +
    +  val spark: SparkSession = getSparkSession
    +
    +  /** Subclass can override this function to build their own SparkSession */
    +  def getSparkSession: SparkSession = {
    +    SparkSession.builder()
    +      .master("local[1]")
    +      .appName(this.getClass.getCanonicalName)
    +      .config(SQLConf.SHUFFLE_PARTITIONS.key, 1)
    +      .config(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key, 1)
    +      .getOrCreate()
    +  }
    +
    +  /**
    +   * Sets all SQL configurations specified in `pairs`, calls `f`, and then restores all SQL
    +   * configurations.
    +   */
    +  protected def withSQLConf(pairs: (String, String)*)(f: => Unit): Unit = {
    +    val conf = SQLConf.get
    +    val (keys, values) = pairs.unzip
    +    val currentValues = keys.map { key =>
    +      if (conf.contains(key)) {
    +        Some(conf.getConfString(key))
    +      } else {
    +        None
    +      }
    +    }
    +    (keys, values).zipped.foreach { (k, v) =>
    +      if (SQLConf.staticConfKeys.contains(k)) {
    +        throw new AnalysisException(s"Cannot modify the value of a static config: $k")
    +      }
    +      conf.setConfString(k, v)
    +    }
    +    try f finally {
    +      keys.zip(currentValues).foreach {
    +        case (key, Some(value)) => conf.setConfString(key, value)
    +        case (key, None) => conf.unsetConf(key)
    +      }
    +    }
    +  }
    +
    +  /** Runs function `f` with whole stage codegen on and off. */
    +  def runBenchmarkWithCodegen(name: String, cardinality: Long)(f: => Unit): Unit = {
    +    val benchmark = new Benchmark(name, cardinality, output = output)
    +
    +    benchmark.addCase(s"$name wholestage off", numIters = 2) { _ =>
    +      withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false") {
    --- End diff --
    
    Is it too much to introduce a trait for `withSQLConf`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark to use m...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22484#discussion_r221501890
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala ---
    @@ -0,0 +1,60 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.benchmark
    +
    +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
    +import org.apache.spark.sql.SparkSession
    +import org.apache.spark.sql.catalyst.plans.SQLHelper
    +import org.apache.spark.sql.internal.SQLConf
    +
    +/**
    + * Common base trait to run benchmark with the Dataset and DataFrame API.
    + */
    +trait SqlBasedBenchmark extends BenchmarkBase with SQLHelper {
    +
    +  val spark: SparkSession = getSparkSession
    --- End diff --
    
    `val spark` -> `protected val spark`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark to use m...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3380/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark to use m...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96338/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark t...

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22484#discussion_r219676161
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala ---
    @@ -34,621 +34,508 @@ import org.apache.spark.unsafe.map.BytesToBytesMap
     
     /**
      * Benchmark to measure performance for aggregate primitives.
    - * To run this:
    - *  build/sbt "sql/test-only *benchmark.AggregateBenchmark"
    - *
    - * Benchmarks in this file are skipped in normal builds.
    + * To run this benchmark:
    + * {{{
    + *   1. without sbt: bin/spark-submit --class <this class> <spark sql test jar>
    + *   2. build/sbt "sql/test:runMain <this class>"
    + *   3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
    + *      Results will be written to "benchmarks/AggregateBenchmark-results.txt".
    + * }}}
      */
    -class AggregateBenchmark extends BenchmarkWithCodegen {
    +object AggregateBenchmark extends RunBenchmarkWithCodegen {
     
    -  ignore("aggregate without grouping") {
    -    val N = 500L << 22
    -    val benchmark = new Benchmark("agg without grouping", N)
    -    runBenchmark("agg w/o group", N) {
    -      sparkSession.range(N).selectExpr("sum(id)").collect()
    +  override def benchmark(): Unit = {
    +    runBenchmark("aggregate without grouping") {
    +      val N = 500L << 22
    +      runBenchmark("agg w/o group", N) {
    --- End diff --
    
    The `runBenchmark` here is different from the on in line 48, but they have the same name. We should have a different name.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3483/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22484#discussion_r221401306
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala ---
    @@ -0,0 +1,60 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.benchmark
    +
    +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
    +import org.apache.spark.sql.SparkSession
    +import org.apache.spark.sql.catalyst.plans.SQLHelper
    +import org.apache.spark.sql.internal.SQLConf
    +
    +/**
    + * Common base trait to run benchmark with the Dataset and DataFrame API.
    + */
    +trait SqlBasedBenchmark extends BenchmarkBase with SQLHelper {
    --- End diff --
    
    So, if @gengliangwang agree with that, `SqlBasedBenchmark` is another refactoring (renaming and improvement) like `[SPARK-25499][TEST] Refactor BenchmarkBase and Benchmark`. Could you do that in a separate PR in advance?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    **[Test build #96530 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96530/testReport)** for PR 22484 at commit [`f783099`](https://github.com/apache/spark/commit/f783099f818d3ed509ab95133ff46e2f183444a8).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Thanks @dongjoon-hyun. except `withSQLConf`. We need `withTempTable` and `withTempPath`:
    https://github.com/apache/spark/blob/d25f425c9652a3611dd5fea8a37df4abb13e126e/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala#L63-L83


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark to use m...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    **[Test build #96500 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96500/testReport)** for PR 22484 at commit [`2d778a4`](https://github.com/apache/spark/commit/2d778a4c8fb5d3b373856837496882c05ff1d42d).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark to use m...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    **[Test build #96502 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96502/testReport)** for PR 22484 at commit [`2d778a4`](https://github.com/apache/spark/commit/2d778a4c8fb5d3b373856837496882c05ff1d42d).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Also, cc @gatorsmile . Since we need `withSQLConf` in both Benchmark and TestSuite, I want to introduce `SupportWithSQLConf` trait .


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark to use m...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    @wangyum . Could you make the title and description up-to-date for this PR content? Also, please update JIRA title and description, too.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark to use m...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96473/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    **[Test build #96666 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96666/testReport)** for PR 22484 at commit [`45add92`](https://github.com/apache/spark/commit/45add92f824c8d679086df13380f9fd27d6533d3).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org