Posted to reviews@spark.apache.org by wangyum <gi...@git.apache.org> on 2018/09/20 07:34:10 UTC

[GitHub] spark pull request #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark t...

GitHub user wangyum opened a pull request:

    https://github.com/apache/spark/pull/22484

    [SPARK-25476][TEST] Refactor AggregateBenchmark to use main method

    ## What changes were proposed in this pull request?
    
    Refactor `AggregateBenchmark` to use main method.
    To generate benchmark results:
    ```
    SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.execution.benchmark.AggregateBenchmark"
    ```
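Here is a rough sketch of the "main method" pattern described above. It assumes the environment variable counts as set only when it equals `1`, and that results land in `benchmarks/<Name>-results.txt`, the naming the scaladoc later in this thread uses for `AggregateBenchmark`. `MainMethodBenchmark`, `benchmarkName`, `runBenchmarkSuite`, and `DemoBenchmark` are hypothetical names. This is not Spark's actual `BenchmarkBase`.

```scala
import java.io.{File, FileOutputStream, PrintStream}

// Hypothetical base trait illustrating a main-method benchmark; not Spark's BenchmarkBase.
trait MainMethodBenchmark {
  // Name used for the results file, e.g. "AggregateBenchmark".
  def benchmarkName: String

  // Subclasses write their benchmark output to `out`.
  def runBenchmarkSuite(out: PrintStream): Unit

  // Assumption: files are generated only when SPARK_GENERATE_BENCHMARK_FILES == "1".
  // The lookup is a parameter so the decision can be tested without touching sys.env.
  def outputFile(env: String => Option[String]): Option[File] =
    env("SPARK_GENERATE_BENCHMARK_FILES")
      .filter(_ == "1")
      .map(_ => new File("benchmarks", s"$benchmarkName-results.txt"))

  def main(args: Array[String]): Unit = outputFile(sys.env.get) match {
    case Some(file) =>
      // Write results to the file, creating the directory if needed.
      file.getParentFile.mkdirs()
      val out = new PrintStream(new FileOutputStream(file))
      try runBenchmarkSuite(out) finally out.close()
    case None =>
      // Otherwise just print to the console.
      runBenchmarkSuite(System.out)
  }
}

// Hypothetical benchmark object.
object DemoBenchmark extends MainMethodBenchmark {
  val benchmarkName = "DemoBenchmark"

  def runBenchmarkSuite(out: PrintStream): Unit = {
    val start = System.nanoTime()
    val total = (1L to 1000000L).sum
    out.println(s"sum = $total in ${(System.nanoTime() - start) / 1000000} ms")
  }
}
```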
    
    
    ## How was this patch tested?
    
    Manual tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wangyum/spark SPARK-25476

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22484.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22484
    
----
commit 649f2965188efcfa0b1d2b5acb4c0f057ecd3788
Author: Yuming Wang <yu...@...>
Date:   2018-09-20T07:23:46Z

    Refactor AggregateBenchmark

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Merged to master.


---


[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

Posted by wangyum <gi...@git.apache.org>.

Github user wangyum commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22484#discussion_r220028846
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala ---
    @@ -73,23 +73,26 @@ object AggregateBenchmark extends SqlBasedBenchmark {
             spark.range(N).selectExpr("(id & 65535) as k").groupBy("k").sum().collect()
           }
     
    -      benchmark.addCase(s"codegen = F", numIters = 2) { iter =>
    -        spark.conf.set("spark.sql.codegen.wholeStage", "false")
    -        f()
    +      benchmark.addCase(s"codegen = F", numIters = 2) { _ =>
    +        withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> false.toString) {
    +          f()
    +        }
           }
     
    -      benchmark.addCase(s"codegen = T hashmap = F", numIters = 3) { iter =>
    -        spark.conf.set("spark.sql.codegen.wholeStage", "true")
    -        spark.conf.set("spark.sql.codegen.aggregate.map.twolevel.enabled", "false")
    -        spark.conf.set("spark.sql.codegen.aggregate.map.vectorized.enable", "false")
    -        f()
    +      benchmark.addCase(s"codegen = T hashmap = F", numIters = 3) { _ =>
    +        withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> true.toString,
    +          SQLConf.ENABLE_TWOLEVEL_AGG_MAP.key -> false.toString,
    +          "spark.sql.codegen.aggregate.map.vectorized.enable" -> false.toString) {
    --- End diff --
    
    Do you mean changing
    ```scala
    withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "true",
      SQLConf.ENABLE_TWOLEVEL_AGG_MAP.key -> "false",
      "spark.sql.codegen.aggregate.map.vectorized.enable" -> "false") {
      f()
    }
    ```
    to 
    ```scala
    withSQLConf(
      SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "true",
      SQLConf.ENABLE_TWOLEVEL_AGG_MAP.key -> "false",
      "spark.sql.codegen.aggregate.map.vectorized.enable" -> "false") {
      f()
    }
    ```
    ?
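For context on what the helper in both snippets does, here is a simplified stand-in. It assumes `withSQLConf` sets each key for the duration of the block and then restores the previous value, or unsets the key if it was absent, even if the block throws. `FakeConf` and `withConf` are hypothetical and operate on a plain mutable map rather than Spark's `SQLConf`. The sketch also shows why `false.toString` and `"false"` are interchangeable here: both pass the same string.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a session configuration; not Spark's SQLConf.
object FakeConf {
  val values: mutable.Map[String, String] = mutable.Map.empty

  // Sets every (key -> value) pair while `f` runs, then restores the previous
  // values (removing keys that were previously unset), even if `f` throws.
  def withConf[T](pairs: (String, String)*)(f: => T): T = {
    val previous = pairs.map { case (key, _) => key -> values.get(key) }
    pairs.foreach { case (key, value) => values(key) = value }
    try f finally {
      previous.foreach {
        case (key, Some(old)) => values(key) = old
        case (key, None)      => values.remove(key)
      }
    }
  }
}
```

Either line-wrapping style compiles to the same call, so the choice only affects readability.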


---


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    **[Test build #96624 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96624/testReport)** for PR 22484 at commit [`0fae54d`](https://github.com/apache/spark/commit/0fae54d9e1a40c021d0ebb5af3d9d969431807f7).


---


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    **[Test build #96530 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96530/testReport)** for PR 22484 at commit [`f783099`](https://github.com/apache/spark/commit/f783099f818d3ed509ab95133ff46e2f183444a8).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark to use m...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    **[Test build #96500 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96500/testReport)** for PR 22484 at commit [`2d778a4`](https://github.com/apache/spark/commit/2d778a4c8fb5d3b373856837496882c05ff1d42d).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `trait SqlBasedBenchmark extends BenchmarkBase `


---


[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22484#discussion_r220800365
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala ---
    @@ -34,621 +34,539 @@ import org.apache.spark.unsafe.map.BytesToBytesMap
     
     /**
      * Benchmark to measure performance for aggregate primitives.
    - * To run this:
    - *  build/sbt "sql/test-only *benchmark.AggregateBenchmark"
    - *
    - * Benchmarks in this file are skipped in normal builds.
    + * To run this benchmark:
    + * {{{
    + *   1. without sbt: bin/spark-submit --class <this class> <spark sql test jar>
    + *   2. build/sbt "sql/test:runMain <this class>"
    + *   3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
    + *      Results will be written to "benchmarks/AggregateBenchmark-results.txt".
    + * }}}
      */
    -class AggregateBenchmark extends BenchmarkWithCodegen {
    +object AggregateBenchmark extends SqlBasedBenchmark {
     
    -  ignore("aggregate without grouping") {
    -    val N = 500L << 22
    -    val benchmark = new Benchmark("agg without grouping", N)
    -    runBenchmark("agg w/o group", N) {
    -      sparkSession.range(N).selectExpr("sum(id)").collect()
    +  override def benchmark(): Unit = {
    +    runBenchmark("aggregate without grouping") {
    +      val N = 500L << 22
    +      runBenchmarkWithCodegen("agg w/o group", N) {
    +        spark.range(N).selectExpr("sum(id)").collect()
    +      }
         }
    -    /*
    -    agg w/o group:                           Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    ------------------------------------------------------------------------------------------------
    -    agg w/o group wholestage off                30136 / 31885         69.6          14.4       1.0X
    -    agg w/o group wholestage on                   1851 / 1860       1132.9           0.9      16.3X
    -     */
    -  }
     
    -  ignore("stat functions") {
    -    val N = 100L << 20
    +    runBenchmark("stat functions") {
    +      val N = 100L << 20
     
    -    runBenchmark("stddev", N) {
    -      sparkSession.range(N).groupBy().agg("id" -> "stddev").collect()
    -    }
    +      runBenchmarkWithCodegen("stddev", N) {
    +        spark.range(N).groupBy().agg("id" -> "stddev").collect()
    +      }
     
    -    runBenchmark("kurtosis", N) {
    -      sparkSession.range(N).groupBy().agg("id" -> "kurtosis").collect()
    +      runBenchmarkWithCodegen("kurtosis", N) {
    +        spark.range(N).groupBy().agg("id" -> "kurtosis").collect()
    +      }
         }
     
    -    /*
    -    Using ImperativeAggregate (as implemented in Spark 1.6):
    -
    -      Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
    -      stddev:                            Avg Time(ms)    Avg Rate(M/s)  Relative Rate
    -      -------------------------------------------------------------------------------
    -      stddev w/o codegen                      2019.04            10.39         1.00 X
    -      stddev w codegen                        2097.29            10.00         0.96 X
    -      kurtosis w/o codegen                    2108.99             9.94         0.96 X
    -      kurtosis w codegen                      2090.69            10.03         0.97 X
    -
    -      Using DeclarativeAggregate:
    -
    -      Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
    -      stddev:                             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -      -------------------------------------------------------------------------------------------
    -      stddev codegen=false                     5630 / 5776         18.0          55.6       1.0X
    -      stddev codegen=true                      1259 / 1314         83.0          12.0       4.5X
    -
    -      Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
    -      kurtosis:                           Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -      -------------------------------------------------------------------------------------------
    -      kurtosis codegen=false                 14847 / 15084          7.0         142.9       1.0X
    -      kurtosis codegen=true                    1652 / 2124         63.0          15.9       9.0X
    -    */
    -  }
    -
    -  ignore("aggregate with linear keys") {
    -    val N = 20 << 22
    +    runBenchmark("aggregate with linear keys") {
    +      val N = 20 << 22
     
    -    val benchmark = new Benchmark("Aggregate w keys", N)
    -    def f(): Unit = {
    -      sparkSession.range(N).selectExpr("(id & 65535) as k").groupBy("k").sum().collect()
    -    }
    +      val benchmark = new Benchmark("Aggregate w keys", N, output = output)
     
    -    benchmark.addCase(s"codegen = F", numIters = 2) { iter =>
    -      sparkSession.conf.set("spark.sql.codegen.wholeStage", "false")
    -      f()
    -    }
    +      def f(): Unit = {
    +        spark.range(N).selectExpr("(id & 65535) as k").groupBy("k").sum().collect()
    +      }
     
    -    benchmark.addCase(s"codegen = T hashmap = F", numIters = 3) { iter =>
    -      sparkSession.conf.set("spark.sql.codegen.wholeStage", "true")
    -      sparkSession.conf.set("spark.sql.codegen.aggregate.map.twolevel.enabled", "false")
    -      sparkSession.conf.set("spark.sql.codegen.aggregate.map.vectorized.enable", "false")
    -      f()
    -    }
    +      benchmark.addCase(s"codegen = F", numIters = 2) { _ =>
    --- End diff --
    
    `s"codegen = F"` -> `"codegen = F"`?
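The suggestion holds because an `s` interpolator with no `$` placeholders yields exactly the same string as a plain literal, so the prefix is just noise. Here is a minimal illustration. `InterpolatorDemo` and `numIters` are illustrative only.

```scala
object InterpolatorDemo {
  val numIters = 2

  // No `$` placeholders, so the `s` prefix changes nothing.
  val withPrefix: String = s"codegen = F"
  val plain: String = "codegen = F"

  // The prefix only matters once a value is actually spliced in.
  val spliced: String = s"codegen = F, numIters = $numIters"
}
```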


---


[GitHub] spark issue #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark to use m...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Merged build finished. Test FAILed.


---


[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22484#discussion_r221503808
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala ---
    @@ -0,0 +1,60 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.benchmark
    +
    +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
    +import org.apache.spark.sql.SparkSession
    +import org.apache.spark.sql.catalyst.plans.SQLHelper
    +import org.apache.spark.sql.internal.SQLConf
    +
    +/**
    + * Common base trait to run benchmark with the Dataset and DataFrame API.
    + */
    +trait SqlBasedBenchmark extends BenchmarkBase with SQLHelper {
    +
    +  val spark: SparkSession = getSparkSession
    +
    +  /** Subclass can override this function to build their own SparkSession */
    +  def getSparkSession: SparkSession = {
    +    SparkSession.builder()
    +      .master("local[1]")
    +      .appName(this.getClass.getCanonicalName)
    +      .config(SQLConf.SHUFFLE_PARTITIONS.key, 1)
    +      .config(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key, 1)
    +      .getOrCreate()
    +  }
    +
    +  /** Runs function `f` with whole stage codegen on and off. */
    --- End diff --
    
    Can we use `codegenBenchmark` instead? The name `runBenchmarkWithCodegen` reads like an extension of `runBenchmark`, but the helper is closer to `bitEncodingBenchmark` or `sortBenchmark`.
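Here is one possible shape for the renamed helper. It assumes the helper simply registers the same body twice, once with whole-stage codegen off and once with it on. The case labels mirror the `wholestage off`/`wholestage on` ones in the results this PR removes. `MiniBenchmark`, `CodegenBenchmarkSketch`, and the map-backed `conf` are hypothetical stand-ins for Spark's `Benchmark` class and `SQLConf`. Only the `spark.sql.codegen.wholeStage` key comes from the thread.

```scala
import scala.collection.mutable

// Hypothetical stand-in for Spark's Benchmark: records named cases, runs each once.
class MiniBenchmark(val name: String, val valuesPerIteration: Long) {
  private val cases = mutable.ArrayBuffer.empty[(String, () => Unit)]

  def addCase(caseName: String)(f: => Unit): Unit = cases += (caseName -> (() => f))

  // Runs every case in insertion order and returns (case name, elapsed nanoseconds).
  def run(): List[(String, Long)] = cases.toList.map { case (caseName, body) =>
    val start = System.nanoTime()
    body()
    caseName -> (System.nanoTime() - start)
  }
}

object CodegenBenchmarkSketch {
  // Map-backed stand-in for the session conf.
  val conf: mutable.Map[String, String] = mutable.Map.empty
  val WholeStageKey = "spark.sql.codegen.wholeStage"

  // Set `key` while `f` runs, then restore the old value or remove the key.
  private def withConf[T](key: String, value: String)(f: => T): T = {
    val old = conf.get(key)
    conf(key) = value
    try f finally {
      old match {
        case Some(v) => conf(key) = v
        case None    => conf.remove(key)
      }
    }
  }

  // Registers the same body twice: whole-stage codegen off, then on.
  def codegenBenchmark(name: String, cardinality: Long)(f: => Unit): MiniBenchmark = {
    val benchmark = new MiniBenchmark(name, cardinality)
    benchmark.addCase(s"$name wholestage off") { withConf(WholeStageKey, "false")(f) }
    benchmark.addCase(s"$name wholestage on") { withConf(WholeStageKey, "true")(f) }
    benchmark
  }
}
```

Spark's own `Benchmark` also takes per-case iteration counts (`numIters` in the diff) and formats results, which this sketch omits.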


---


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96624/
    Test PASSed.


---


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by wangyum <gi...@git.apache.org>.

Github user wangyum commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    retest this please


---


[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

Posted by gengliangwang <gi...@git.apache.org>.

Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22484#discussion_r221415642
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala ---
    @@ -0,0 +1,60 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.benchmark
    +
    +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
    +import org.apache.spark.sql.SparkSession
    +import org.apache.spark.sql.catalyst.plans.SQLHelper
    +import org.apache.spark.sql.internal.SQLConf
    +
    +/**
    + * Common base trait to run benchmark with the Dataset and DataFrame API.
    + */
    +trait SqlBasedBenchmark extends BenchmarkBase with SQLHelper {
    --- End diff --
    
    Actually, I don't think the name `SqlBasedBenchmark` is appropriate. From the name, we can't tell that it is about benchmarking with and without whole-stage codegen. I will try to come up with a better name, or we can discuss it in this thread.


---


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    **[Test build #96811 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96811/testReport)** for PR 22484 at commit [`6c46ad5`](https://github.com/apache/spark/commit/6c46ad59c063fa6283fb23046300404767a82248).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

Posted by gengliangwang <gi...@git.apache.org>.

Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22484#discussion_r221417293
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala ---
    @@ -0,0 +1,60 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.benchmark
    +
    +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
    +import org.apache.spark.sql.SparkSession
    +import org.apache.spark.sql.catalyst.plans.SQLHelper
    +import org.apache.spark.sql.internal.SQLConf
    +
    +/**
    + * Common base trait to run benchmark with the Dataset and DataFrame API.
    + */
    +trait SqlBasedBenchmark extends BenchmarkBase with SQLHelper {
    --- End diff --
    
    Then each function could live in a different trait. I don't think `runBenchmarkWithCodegen` has much in common with `runBenchmarkWithParquetPushDown`.
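To make the "one trait per helper family" idea concrete, here is a sketch in which a shared configuration trait is extended by small, independent helper traits, and each benchmark mixes in only what it needs. All trait and object names are hypothetical. The conf is a plain mutable map rather than `SQLConf`, and the two keys (`spark.sql.codegen.wholeStage`, `spark.sql.parquet.filterPushdown`) serve only as labels.

```scala
import scala.collection.mutable

// Hypothetical shared base: owns the (map-backed) conf and the set/restore logic.
trait ConfSupport {
  val conf: mutable.Map[String, String] = mutable.Map.empty

  def withConf[T](pairs: (String, String)*)(f: => T): T = {
    val previous = pairs.map { case (key, _) => key -> conf.get(key) }
    pairs.foreach { case (key, value) => conf(key) = value }
    try f finally {
      previous.foreach {
        case (key, Some(old)) => conf(key) = old
        case (key, None)      => conf.remove(key)
      }
    }
  }
}

// Codegen-related helpers only.
trait CodegenSupport extends ConfSupport {
  def withCodegen[T](enabled: Boolean)(f: => T): T =
    withConf("spark.sql.codegen.wholeStage" -> enabled.toString)(f)
}

// Parquet-related helpers only.
trait ParquetPushDownSupport extends ConfSupport {
  def withParquetPushDown[T](enabled: Boolean)(f: => T): T =
    withConf("spark.sql.parquet.filterPushdown" -> enabled.toString)(f)
}

// Each benchmark mixes in just the helper traits it uses.
object AggregateLikeBenchmark extends CodegenSupport
object FilterLikeBenchmark extends CodegenSupport with ParquetPushDownSupport
```

This is the trade-off debated here. Separate traits keep unrelated helpers apart, while a single `SqlBasedBenchmark` trait is easier to discover and extend.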


---


[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22484#discussion_r221501761
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala ---
    @@ -0,0 +1,60 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.benchmark
    +
    +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
    +import org.apache.spark.sql.SparkSession
    +import org.apache.spark.sql.catalyst.plans.SQLHelper
    +import org.apache.spark.sql.internal.SQLConf
    +
    +/**
    + * Common base trait to run benchmark with the Dataset and DataFrame API.
    + */
    +trait SqlBasedBenchmark extends BenchmarkBase with SQLHelper {
    --- End diff --
    
    For the naming, let's keep the current one for now.


---


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96666/
    Test FAILed.


---


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3517/
    Test PASSed.


---


[GitHub] spark issue #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark to use m...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3399/
    Test PASSed.


---


[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

Posted by wangyum <gi...@git.apache.org>.

Github user wangyum commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22484#discussion_r221416957
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala ---
    @@ -0,0 +1,60 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.benchmark
    +
    +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
    +import org.apache.spark.sql.SparkSession
    +import org.apache.spark.sql.catalyst.plans.SQLHelper
    +import org.apache.spark.sql.internal.SQLConf
    +
    +/**
    + * Common base trait to run benchmark with the Dataset and DataFrame API.
    + */
    +trait SqlBasedBenchmark extends BenchmarkBase with SQLHelper {
    --- End diff --
    
    Maybe we can add more common functions in the future, e.g. `runBenchmarkWithCodegen`, `runBenchmarkWithParquetPushDown`, `runBenchmarkWithOrcPushDown`, ...


---


[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22484#discussion_r220800434
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala ---
    @@ -34,621 +34,539 @@ import org.apache.spark.unsafe.map.BytesToBytesMap
     
     /**
      * Benchmark to measure performance for aggregate primitives.
    - * To run this:
    - *  build/sbt "sql/test-only *benchmark.AggregateBenchmark"
    - *
    - * Benchmarks in this file are skipped in normal builds.
    + * To run this benchmark:
    + * {{{
    + *   1. without sbt: bin/spark-submit --class <this class> <spark sql test jar>
    + *   2. build/sbt "sql/test:runMain <this class>"
    + *   3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
    + *      Results will be written to "benchmarks/AggregateBenchmark-results.txt".
    + * }}}
      */
    -class AggregateBenchmark extends BenchmarkWithCodegen {
    +object AggregateBenchmark extends SqlBasedBenchmark {
     
    -  ignore("aggregate without grouping") {
    -    val N = 500L << 22
    -    val benchmark = new Benchmark("agg without grouping", N)
    -    runBenchmark("agg w/o group", N) {
    -      sparkSession.range(N).selectExpr("sum(id)").collect()
    +  override def benchmark(): Unit = {
    +    runBenchmark("aggregate without grouping") {
    +      val N = 500L << 22
    +      runBenchmarkWithCodegen("agg w/o group", N) {
    +        spark.range(N).selectExpr("sum(id)").collect()
    +      }
         }
    -    /*
    -    agg w/o group:                           Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -    ------------------------------------------------------------------------------------------------
    -    agg w/o group wholestage off                30136 / 31885         69.6          14.4       1.0X
    -    agg w/o group wholestage on                   1851 / 1860       1132.9           0.9      16.3X
    -     */
    -  }
     
    -  ignore("stat functions") {
    -    val N = 100L << 20
    +    runBenchmark("stat functions") {
    +      val N = 100L << 20
     
    -    runBenchmark("stddev", N) {
    -      sparkSession.range(N).groupBy().agg("id" -> "stddev").collect()
    -    }
    +      runBenchmarkWithCodegen("stddev", N) {
    +        spark.range(N).groupBy().agg("id" -> "stddev").collect()
    +      }
     
    -    runBenchmark("kurtosis", N) {
    -      sparkSession.range(N).groupBy().agg("id" -> "kurtosis").collect()
    +      runBenchmarkWithCodegen("kurtosis", N) {
    +        spark.range(N).groupBy().agg("id" -> "kurtosis").collect()
    +      }
         }
     
    -    /*
    -    Using ImperativeAggregate (as implemented in Spark 1.6):
    -
    -      Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
    -      stddev:                            Avg Time(ms)    Avg Rate(M/s)  Relative Rate
    -      -------------------------------------------------------------------------------
    -      stddev w/o codegen                      2019.04            10.39         1.00 X
    -      stddev w codegen                        2097.29            10.00         0.96 X
    -      kurtosis w/o codegen                    2108.99             9.94         0.96 X
    -      kurtosis w codegen                      2090.69            10.03         0.97 X
    -
    -      Using DeclarativeAggregate:
    -
    -      Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
    -      stddev:                             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -      -------------------------------------------------------------------------------------------
    -      stddev codegen=false                     5630 / 5776         18.0          55.6       1.0X
    -      stddev codegen=true                      1259 / 1314         83.0          12.0       4.5X
    -
    -      Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
    -      kurtosis:                           Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -      -------------------------------------------------------------------------------------------
    -      kurtosis codegen=false                 14847 / 15084          7.0         142.9       1.0X
    -      kurtosis codegen=true                    1652 / 2124         63.0          15.9       9.0X
    -    */
    -  }
    -
    -  ignore("aggregate with linear keys") {
    -    val N = 20 << 22
    +    runBenchmark("aggregate with linear keys") {
    +      val N = 20 << 22
     
    -    val benchmark = new Benchmark("Aggregate w keys", N)
    -    def f(): Unit = {
    -      sparkSession.range(N).selectExpr("(id & 65535) as k").groupBy("k").sum().collect()
    -    }
    +      val benchmark = new Benchmark("Aggregate w keys", N, output = output)
     
    -    benchmark.addCase(s"codegen = F", numIters = 2) { iter =>
    -      sparkSession.conf.set("spark.sql.codegen.wholeStage", "false")
    -      f()
    -    }
    +      def f(): Unit = {
    +        spark.range(N).selectExpr("(id & 65535) as k").groupBy("k").sum().collect()
    +      }
     
    -    benchmark.addCase(s"codegen = T hashmap = F", numIters = 3) { iter =>
    -      sparkSession.conf.set("spark.sql.codegen.wholeStage", "true")
    -      sparkSession.conf.set("spark.sql.codegen.aggregate.map.twolevel.enabled", "false")
    -      sparkSession.conf.set("spark.sql.codegen.aggregate.map.vectorized.enable", "false")
    -      f()
    -    }
    +      benchmark.addCase(s"codegen = F", numIters = 2) { _ =>
    +        withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false") {
    +          f()
    +        }
    +      }
     
    -    benchmark.addCase(s"codegen = T hashmap = T", numIters = 5) { iter =>
    -      sparkSession.conf.set("spark.sql.codegen.wholeStage", "true")
    -      sparkSession.conf.set("spark.sql.codegen.aggregate.map.twolevel.enabled", "true")
    -      sparkSession.conf.set("spark.sql.codegen.aggregate.map.vectorized.enable", "true")
    -      f()
    -    }
    +      benchmark.addCase(s"codegen = T hashmap = F", numIters = 3) { _ =>
    --- End diff --
    
    `s"codegen = T hashmap = F"` -> `"codegen = T hashmap = F"`
    
    Could you fix all instances like this?
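The reviewer's point can be checked with a small standalone Scala sketch. This is not code from the PR. A literal with no `$` placeholders produces the same string whether or not it has the `s` prefix, so the prefix matters only when a value is spliced in. The `caseName` helper below is hypothetical and exists only to show a case where `s` is actually needed.

```scala
// Standalone sketch (not from the PR): shows that the `s` interpolator is
// redundant when the literal contains no `$` placeholders.
object InterpolationCheck {
  // No placeholders, so both values are the identical string.
  val interpolated: String = s"codegen = T hashmap = F"
  val plain: String = "codegen = T hashmap = F"

  // Hypothetical helper: here `s` is required because values are spliced in.
  def caseName(codegen: Boolean, hashmap: Boolean): String = {
    def flag(b: Boolean): String = if (b) "T" else "F"
    s"codegen = ${flag(codegen)} hashmap = ${flag(hashmap)}"
  }

  def main(args: Array[String]): Unit = {
    println(interpolated == plain)                      // prints "true"
    println(caseName(codegen = true, hashmap = false))  // prints "codegen = T hashmap = F"
  }
}
```

Dropping the prefix from fixed case names such as `"codegen = F"` therefore changes nothing at runtime. It only removes a hint that the string is built from values when it is not.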


---

[GitHub] spark issue #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark to use m...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3416/


---

[GitHub] spark issue #22484: [SPARK-25476][SPARK-25510][TEST] Refactor AggregateBench...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22484
  
    @wangyum Could you review and merge https://github.com/wangyum/spark/pull/12 , too?


---