You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by davies <gi...@git.apache.org> on 2016/01/16 10:21:54 UTC

[GitHub] spark pull request: [SPARK-10735] [SQL] Generate aggregation w/o g...

GitHub user davies opened a pull request:

    https://github.com/apache/spark/pull/10786

    [SPARK-10735] [SQL] Generate aggregation w/o grouping keys [WIP]

    PR is based on #10735 
    
    The benchmark show that generated aggregation could be 6X time faster than non-generated (with generated filter/range).
    
    It also showed that declarative function is much faster than imperative one (5X faster for stddev), we may need to switch to a declarative version. (the difference need to be measured when there are grouping keys).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/davies/spark gen_agg2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10786.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10786
    
----
commit 2da493f06d8b9b9c0e5cf641a84fb35d5a3a7ad8
Author: Davies Liu <da...@databricks.com>
Date:   2016-01-11T23:01:05Z

    whole stage codegen

commit 998b6a150bdc2e76cea973eddedaf6ab98161637
Author: Davies Liu <da...@databricks.com>
Date:   2016-01-12T21:30:01Z

    support range

commit 218412fbacf37a9c9621c277c082f45a25a9fc5e
Author: Davies Liu <da...@databricks.com>
Date:   2016-01-12T22:11:20Z

    add benchmark

commit 158cb36bd74ba11ec7dd87adaa5f1c40e8931da8
Author: Davies Liu <da...@databricks.com>
Date:   2016-01-12T22:15:47Z

    remove println and test

commit 88c51a6aa09cb4b7abc0ec6d7609d9c823a07460
Author: Davies Liu <da...@databricks.com>
Date:   2016-01-12T22:37:20Z

    fix bug

commit b76c00e2b75a869e7ea3d6a7fda28b4bdfa85be1
Author: Davies Liu <da...@databricks.com>
Date:   2016-01-12T23:38:38Z

    fix tests

commit 4e2ee8889c18de75f29243ced2b8b52b1f0a5b1b
Author: Davies Liu <da...@databricks.com>
Date:   2016-01-13T00:44:59Z

    clean the interface

commit f05524c2f9aba8c422f571ba6e6a7d8b02a661d8
Author: Davies Liu <da...@databricks.com>
Date:   2016-01-13T01:07:08Z

    update test

commit 43139a84e637598953f0c757ba2ea62f4ad4bfe8
Author: Davies Liu <da...@databricks.com>
Date:   2016-01-13T23:13:30Z

    fix style, improve comments

commit 73fe07499faed9fdb33192337b0996a299d8e6bf
Author: Davies Liu <da...@databricks.com>
Date:   2016-01-13T23:32:48Z

    fix style

commit 32309e14e39af9f418faa3d837cebe64c3c87b84
Author: Davies Liu <da...@databricks.com>
Date:   2016-01-14T06:45:09Z

    fix test

commit 38029bc5eff03cd4a7334bbe7c16b543f7247030
Author: Davies Liu <da...@databricks.com>
Date:   2016-01-15T07:29:00Z

    WholeStageCodegen/FakeInput as adapter

commit 1f9ddb8538ed5c2017e6310815eef32f065ca0cb
Author: Davies Liu <da...@databricks.com>
Date:   2016-01-15T07:54:26Z

    address comments

commit 2a3f0e32a81898da1399ad1d0b98ec913dbc5668
Author: Davies Liu <da...@databricks.com>
Date:   2016-01-15T08:00:08Z

    Merge branch 'master' of github.com:apache/spark into whole2
    
    Conflicts:
    	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala
    	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala
    	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala
    	sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala

commit c1dd60a21e1a016b5479cf89cdef035ac53167f0
Author: Davies Liu <da...@databricks.com>
Date:   2016-01-15T08:05:27Z

    fix style

commit 16ce50c4458b7ab4bd4c28e88d6f7f600bbeebf7
Author: Davies Liu <da...@databricks.com>
Date:   2016-01-15T16:28:44Z

    Merge branch 'master' of github.com:apache/spark into whole2
    
    Conflicts:
    	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala

commit d4530812c29ea15c92a07cc9aae3db05a9d8e06a
Author: Davies Liu <da...@databricks.com>
Date:   2016-01-15T16:39:20Z

    fix bug

commit 34a0a6f15c48935bc79afd7b08e5553552690adc
Author: Davies Liu <da...@databricks.com>
Date:   2016-01-15T17:39:51Z

    fix test

commit 908c8cb22596e0ae662d388014624eb289c753d5
Author: Davies Liu <da...@databricks.com>
Date:   2016-01-15T18:11:15Z

    address comments

commit 1feab20decb487ed66e712323565bb4d58f7637f
Author: Davies Liu <da...@databricks.com>
Date:   2016-01-15T19:24:58Z

    use sparkPlan for checking

commit c9741ea856ecccdcb2dc349ee576080b32b50e19
Author: Davies Liu <da...@databricks.com>
Date:   2016-01-15T21:52:41Z

    create a rule for whole stage codegen

commit 7c0570335da47847c3462306a930abc91ee1f932
Author: Davies Liu <da...@databricks.com>
Date:   2016-01-15T22:02:07Z

    renaming

commit 0b401067347f57d0f7b92c999bb2bcf7e3e95d3d
Author: Davies Liu <da...@databricks.com>
Date:   2016-01-15T22:39:05Z

    fix style

commit d12a8da88e37344bc429fb4f67c0258a28b209c4
Author: Davies Liu <da...@databricks.com>
Date:   2016-01-16T03:21:00Z

    Merge branch 'master' of github.com:apache/spark into whole2
    
    Conflicts:
    	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodegenFallback.scala
    	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateMutableProjection.scala
    	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateOrdering.scala
    	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GeneratePredicate.scala
    	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateSafeProjection.scala
    	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala
    	sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/GenerateColumnAccessor.scala

commit 3df7e5dfad96d48eb47d6f81d85457ae6d3a4156
Author: Davies Liu <da...@databricks.com>
Date:   2016-01-16T06:19:25Z

    generate aggregation

commit b43f26257885a912a45850cd5fad1936a2dbfe1b
Author: Davies Liu <da...@databricks.com>
Date:   2016-01-16T06:37:56Z

    support impremitive function

commit 640a4b57a687e16ad02c22c4fe787a540ff890a8
Author: Davies Liu <da...@databricks.com>
Date:   2016-01-16T09:13:29Z

    support imperative functions

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10735] [SQL] Generate aggregation w/o g...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10786#issuecomment-172294221
  
    **[Test build #49540 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49540/consoleFull)** for PR 10786 at commit [`10f6bb9`](https://github.com/apache/spark/commit/10f6bb947073cd73c0038db86ba23a09cfb2f188).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10735] [SQL] Generate aggregation w/o g...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10786#issuecomment-172299073
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10735] [SQL] Generate aggregation w/o g...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10786#issuecomment-172250474
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49537/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10735] [SQL] Generate aggregation w/o g...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10786#issuecomment-172300223
  
    **[Test build #49548 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49548/consoleFull)** for PR 10786 at commit [`c8ed5d9`](https://github.com/apache/spark/commit/c8ed5d9884a0b37587c1d6552dc31d56dad7cac5).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10735] [SQL] Generate aggregation w/o g...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10786#issuecomment-172299063
  
    **[Test build #49546 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49546/consoleFull)** for PR 10786 at commit [`4c655dd`](https://github.com/apache/spark/commit/4c655dd45ec513c49bda8b827cac10407609a890).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10735] [SQL] Generate aggregation w/o g...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10786#issuecomment-172300233
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10735] [SQL] Generate aggregation w/o g...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10786#issuecomment-172293649
  
    **[Test build #49540 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49540/consoleFull)** for PR 10786 at commit [`10f6bb9`](https://github.com/apache/spark/commit/10f6bb947073cd73c0038db86ba23a09cfb2f188).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10735] [SQL] Generate aggregation w/o g...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10786#issuecomment-172300234
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49548/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10735] [SQL] Generate aggregation w/o g...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/10786#issuecomment-174729286
  
    This PR will be replaced by multiple small PRs.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10735] [SQL] Generate aggregation w/o g...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10786#issuecomment-172299253
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10735] [SQL] Generate aggregation w/o g...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10786#issuecomment-172299254
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49547/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10735] [SQL] Generate aggregation w/o g...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10786#issuecomment-172294229
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10735] [SQL] Generate aggregation w/o g...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10786#discussion_r50157642
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/BenchmarkWholeStageCodegen.scala ---
    @@ -46,15 +47,64 @@ class BenchmarkWholeStageCodegen extends SparkFunSuite {
     
         /*
           Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
    -      Single Int Column Scan:      Avg Time(ms)    Avg Rate(M/s)  Relative Rate
    -      -------------------------------------------------------------------------
    -      Without whole stage codegen       6725.52            31.18         1.00 X
    -      With whole stage codegen          2233.05            93.91         3.01 X
    +      Single Int Column Scan:            Avg Time(ms)    Avg Rate(M/s)  Relative Rate
    +      -------------------------------------------------------------------------------
    +      Without whole stage codegen             6585.36            31.85         1.00 X
    +      With whole stage codegen                 343.80           609.99        19.15 X
    +    */
    +    benchmark.run()
    +  }
    +
    +  def testImperitaveAggregation(values: Int): Unit = {
    +
    +    val benchmark = new Benchmark("aggregation", values)
    +
    +    benchmark.addCase("ImpAgg w/o whole stage codegen") { iter =>
    +      sqlContext.setConf("spark.sql.codegen.wholeStage", "false")
    +      sqlContext.range(values).groupBy().agg("id" -> "stddev").collect()
    +    }
    +
    +    benchmark.addCase("DeclAgg w/o whole stage codegen") { iter =>
    +      sqlContext.setConf("spark.sql.codegen.wholeStage", "false")
    +      sqlContext.range(values).groupBy().agg("id" -> "stddev1").collect()
    +    }
    +
    +    benchmark.addCase("ImpAgg w whole stage codegen") { iter =>
    +      sqlContext.setConf("spark.sql.codegen.wholeStage", "true")
    +      sqlContext.range(values).groupBy().agg("id" -> "stddev").collect()
    +    }
    +
    +    benchmark.addCase("DeclAgg w whole stage codegen") { iter =>
    +      sqlContext.setConf("spark.sql.codegen.wholeStage", "true")
    +      sqlContext.range(values).groupBy().agg("id" -> "stddev1").collect()
    +    }
    +
    +    /*
    +      Before optimizing CentralMomentAgg and generated mutable projection:
    +
    +      Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
    +      aggregation:                       Avg Time(ms)    Avg Rate(M/s)  Relative Rate
    +      -------------------------------------------------------------------------------
    +      ImpAgg w/o whole stage codegen          9047.35            11.59         1.00 X
    +      DeclAgg w/o whole stage codegen         6507.27            16.11         1.39 X
    +      ImpAgg w whole stage codegen            6947.30            15.09         1.30 X
    +      DeclAgg w whole stage codegen           1376.74            76.16         6.57 X
    +
    +      After optimization:
    +
    +      Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
    +      aggregation:                       Avg Time(ms)    Avg Rate(M/s)  Relative Rate
    +      -------------------------------------------------------------------------------
    +      ImpAgg w/o whole stage codegen          6159.03            17.03         1.00 X
    +      DeclAgg w/o whole stage codegen         5248.69            19.98         1.17 X
    +      ImpAgg w whole stage codegen            4202.30            24.95         1.47 X
    +      DeclAgg w whole stage codegen           1367.34            76.69         4.50 X
    --- End diff --
    
    Glad to see 5x speed-up! +1 on switching to declarative on `stddev`. But for `skewness` and `kurtosis`, we still need to benchmark the performance to decide because their expressions are more complex.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10735] [SQL] Generate aggregation w/o g...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10786#discussion_r50157926
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/BenchmarkWholeStageCodegen.scala ---
    @@ -46,15 +47,64 @@ class BenchmarkWholeStageCodegen extends SparkFunSuite {
     
         /*
           Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
    -      Single Int Column Scan:      Avg Time(ms)    Avg Rate(M/s)  Relative Rate
    -      -------------------------------------------------------------------------
    -      Without whole stage codegen       6725.52            31.18         1.00 X
    -      With whole stage codegen          2233.05            93.91         3.01 X
    +      Single Int Column Scan:            Avg Time(ms)    Avg Rate(M/s)  Relative Rate
    +      -------------------------------------------------------------------------------
    +      Without whole stage codegen             6585.36            31.85         1.00 X
    +      With whole stage codegen                 343.80           609.99        19.15 X
    +    */
    +    benchmark.run()
    +  }
    +
    +  def testImperitaveAggregation(values: Int): Unit = {
    +
    +    val benchmark = new Benchmark("aggregation", values)
    +
    +    benchmark.addCase("ImpAgg w/o whole stage codegen") { iter =>
    +      sqlContext.setConf("spark.sql.codegen.wholeStage", "false")
    +      sqlContext.range(values).groupBy().agg("id" -> "stddev").collect()
    +    }
    +
    +    benchmark.addCase("DeclAgg w/o whole stage codegen") { iter =>
    +      sqlContext.setConf("spark.sql.codegen.wholeStage", "false")
    +      sqlContext.range(values).groupBy().agg("id" -> "stddev1").collect()
    +    }
    +
    +    benchmark.addCase("ImpAgg w whole stage codegen") { iter =>
    +      sqlContext.setConf("spark.sql.codegen.wholeStage", "true")
    +      sqlContext.range(values).groupBy().agg("id" -> "stddev").collect()
    +    }
    +
    +    benchmark.addCase("DeclAgg w whole stage codegen") { iter =>
    +      sqlContext.setConf("spark.sql.codegen.wholeStage", "true")
    +      sqlContext.range(values).groupBy().agg("id" -> "stddev1").collect()
    +    }
    +
    +    /*
    +      Before optimizing CentralMomentAgg and generated mutable projection:
    +
    +      Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
    +      aggregation:                       Avg Time(ms)    Avg Rate(M/s)  Relative Rate
    +      -------------------------------------------------------------------------------
    +      ImpAgg w/o whole stage codegen          9047.35            11.59         1.00 X
    +      DeclAgg w/o whole stage codegen         6507.27            16.11         1.39 X
    +      ImpAgg w whole stage codegen            6947.30            15.09         1.30 X
    +      DeclAgg w whole stage codegen           1376.74            76.16         6.57 X
    +
    +      After optimization:
    +
    +      Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
    +      aggregation:                       Avg Time(ms)    Avg Rate(M/s)  Relative Rate
    +      -------------------------------------------------------------------------------
    +      ImpAgg w/o whole stage codegen          6159.03            17.03         1.00 X
    +      DeclAgg w/o whole stage codegen         5248.69            19.98         1.17 X
    +      ImpAgg w whole stage codegen            4202.30            24.95         1.47 X
    +      DeclAgg w whole stage codegen           1367.34            76.69         4.50 X
    --- End diff --
    
    I've always preferred declarative aggregate because its much easier to optimize (very cool the kind of speed ups you are getting!).  As such, I'd support having all of our built in functions done this way.
    
    @mengxr argues that its too confusing for users and that we should also support the imperative one.  How high cost is this for us?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10735] [SQL] Generate aggregation w/o g...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10786#issuecomment-172250463
  
    **[Test build #49537 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49537/consoleFull)** for PR 10786 at commit [`6911a05`](https://github.com/apache/spark/commit/6911a05e951d6452995936a50b5949e40595e7a0).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10735] [SQL] Generate aggregation w/o g...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10786#issuecomment-172177058
  
    **[Test build #49526 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49526/consoleFull)** for PR 10786 at commit [`640a4b5`](https://github.com/apache/spark/commit/640a4b57a687e16ad02c22c4fe787a540ff890a8).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `abstract class StddevAgg(child: Expression) extends DeclarativeAggregate `
      * `case class StddevPop1(child: Expression) extends StddevAgg(child) `
      * `case class StddevSamp1(child: Expression) extends StddevAgg(child) `


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10735] [SQL] Generate aggregation w/o g...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10786#issuecomment-172298524
  
    **[Test build #49546 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49546/consoleFull)** for PR 10786 at commit [`4c655dd`](https://github.com/apache/spark/commit/4c655dd45ec513c49bda8b827cac10407609a890).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10735] [SQL] Generate aggregation w/o g...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10786#issuecomment-172294230
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49540/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10735] [SQL] Generate aggregation w/o g...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10786#issuecomment-172248892
  
    **[Test build #49537 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49537/consoleFull)** for PR 10786 at commit [`6911a05`](https://github.com/apache/spark/commit/6911a05e951d6452995936a50b5949e40595e7a0).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10735] [SQL] Generate aggregation w/o g...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10786#issuecomment-172177061
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49526/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10735] [SQL] Generate aggregation w/o g...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10786#issuecomment-172177060
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10735] [SQL] Generate aggregation w/o g...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10786#issuecomment-172299479
  
    **[Test build #49548 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49548/consoleFull)** for PR 10786 at commit [`c8ed5d9`](https://github.com/apache/spark/commit/c8ed5d9884a0b37587c1d6552dc31d56dad7cac5).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10735] [SQL] Generate aggregation w/o g...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10786#issuecomment-172299074
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49546/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10735] [SQL] Generate aggregation w/o g...

Posted by davies <gi...@git.apache.org>.
Github user davies closed the pull request at:

    https://github.com/apache/spark/pull/10786


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10735] [SQL] Generate aggregation w/o g...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10786#issuecomment-172250473
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10735] [SQL] Generate aggregation w/o g...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10786#issuecomment-172176868
  
    **[Test build #49526 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49526/consoleFull)** for PR 10786 at commit [`640a4b5`](https://github.com/apache/spark/commit/640a4b57a687e16ad02c22c4fe787a540ff890a8).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10735] [SQL] Generate aggregation w/o g...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10786#discussion_r50043531
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Stddev.scala ---
    @@ -79,3 +81,116 @@ case class StddevPop(
         }
       }
     }
    +
    +// Compute standard deviation based on online algorithm specified here:
    +// http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
    +abstract class StddevAgg(child: Expression) extends DeclarativeAggregate {
    --- End diff --
    
    This is included for benchmark, could be used to replace the imperative one (should be done in follow up PR or this one?)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10735] [SQL] Generate aggregation w/o g...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10786#discussion_r50038294
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/BenchmarkWholeStageCodegen.scala ---
    @@ -46,15 +47,64 @@ class BenchmarkWholeStageCodegen extends SparkFunSuite {
     
         /*
           Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
    -      Single Int Column Scan:      Avg Time(ms)    Avg Rate(M/s)  Relative Rate
    -      -------------------------------------------------------------------------
    -      Without whole stage codegen       6725.52            31.18         1.00 X
    -      With whole stage codegen          2233.05            93.91         3.01 X
    +      Single Int Column Scan:            Avg Time(ms)    Avg Rate(M/s)  Relative Rate
    +      -------------------------------------------------------------------------------
    +      Without whole stage codegen             6585.36            31.85         1.00 X
    +      With whole stage codegen                 343.80           609.99        19.15 X
    +    */
    +    benchmark.run()
    +  }
    +
    +  def testImperitaveAggregation(values: Int): Unit = {
    +
    +    val benchmark = new Benchmark("aggregation", values)
    +
    +    benchmark.addCase("ImpAgg w/o whole stage codegen") { iter =>
    +      sqlContext.setConf("spark.sql.codegen.wholeStage", "false")
    +      sqlContext.range(values).groupBy().agg("id" -> "stddev").collect()
    +    }
    +
    +    benchmark.addCase("DeclAgg w/o whole stage codegen") { iter =>
    +      sqlContext.setConf("spark.sql.codegen.wholeStage", "false")
    +      sqlContext.range(values).groupBy().agg("id" -> "stddev1").collect()
    +    }
    +
    +    benchmark.addCase("ImpAgg w whole stage codegen") { iter =>
    +      sqlContext.setConf("spark.sql.codegen.wholeStage", "true")
    +      sqlContext.range(values).groupBy().agg("id" -> "stddev").collect()
    +    }
    +
    +    benchmark.addCase("DeclAgg w whole stage codegen") { iter =>
    +      sqlContext.setConf("spark.sql.codegen.wholeStage", "true")
    +      sqlContext.range(values).groupBy().agg("id" -> "stddev1").collect()
    +    }
    +
    +    /*
    +      Before optimizing CentralMomentAgg and generated mutable projection:
    +
    +      Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
    +      aggregation:                       Avg Time(ms)    Avg Rate(M/s)  Relative Rate
    +      -------------------------------------------------------------------------------
    +      ImpAgg w/o whole stage codegen          9047.35            11.59         1.00 X
    +      DeclAgg w/o whole stage codegen         6507.27            16.11         1.39 X
    +      ImpAgg w whole stage codegen            6947.30            15.09         1.30 X
    +      DeclAgg w whole stage codegen           1376.74            76.16         6.57 X
    +
    +      After optimization:
    +
    +      Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
    +      aggregation:                       Avg Time(ms)    Avg Rate(M/s)  Relative Rate
    +      -------------------------------------------------------------------------------
    +      ImpAgg w/o whole stage codegen          6159.03            17.03         1.00 X
    +      DeclAgg w/o whole stage codegen         5248.69            19.98         1.17 X
    +      ImpAgg w whole stage codegen            4202.30            24.95         1.47 X
    +      DeclAgg w whole stage codegen           1367.34            76.69         4.50 X
    --- End diff --
    
    Based on this benchmark, Declarative function is faster than imperative function, both in whole stage codegen or not. Should we switch to implement all builtin aggregate functions as declarative? cc @mengxr @rxin @marmbrus @rxin .


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org