Posted to reviews@spark.apache.org by davies <gi...@git.apache.org> on 2016/01/09 23:20:08 UTC

[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

GitHub user davies opened a pull request:

    https://github.com/apache/spark/pull/10677

    [SPARK-12706] [SQL] grouping() and grouping_id()

    grouping() indicates whether a column is aggregated or not; grouping_id() returns the aggregation level.
    
    grouping()/grouping_id() can be used with window functions, but they do not yet work in HAVING/SORT BY clauses; that will be fixed in another PR.
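    A minimal usage sketch of the new DataFrame API (not part of the original description; the DataFrame `df` and the columns `a` and `b` are hypothetical):

        import org.apache.spark.sql.functions.{count, grouping, grouping_id, lit}

        // cube() produces every grouping-set combination of a and b;
        // grouping(c) is 1 when c is aggregated (null-filled) in a result row,
        // grouping_id(a, b) packs those bits into a single integer level.
        df.cube("a", "b")
          .agg(count(lit(1)), grouping("a"), grouping_id("a", "b"))
          .show()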

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/davies/spark grouping

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10677.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10677
    
----
commit bcb8d9eecb1b83ae0d62fb88fc5a74c0e3fa9a88
Author: Davies Liu <da...@databricks.com>
Date:   2016-01-09T07:55:30Z

    add grouping() and grouping_id()

commit 0e8317db91e081a30beb22adfb8c3463adcc51ec
Author: Davies Liu <da...@databricks.com>
Date:   2016-01-09T21:47:31Z

    make GROUPING__ID compatible with Hive

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10677#issuecomment-182600576
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/10677#issuecomment-182526853
  
    @hvanhovell The analyzer knows nothing about Hive; where is the best place to put the comment?




[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10677#issuecomment-170287981
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49049/
    Test FAILed.




[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10677#issuecomment-182231108
  
    **[Test build #51027 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51027/consoleFull)** for PR 10677 at commit [`c008569`](https://github.com/apache/spark/commit/c008569bdc28b5e5bdb4149b7fb557807132bb76).




[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10677#issuecomment-182252701
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10677#issuecomment-170287921
  
    **[Test build #49049 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49049/consoleFull)** for PR 10677 at commit [`0e8317d`](https://github.com/apache/spark/commit/0e8317db91e081a30beb22adfb8c3463adcc51ec).




[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the pull request:

    https://github.com/apache/spark/pull/10677#issuecomment-182601016
  
    LGTM
    
    I left a few minor final comments in CatalystQl




[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10677#issuecomment-170298109
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10677#issuecomment-182231248
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10677#issuecomment-170298048
  
    **[Test build #49050 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49050/consoleFull)** for PR 10677 at commit [`736e8d2`](https://github.com/apache/spark/commit/736e8d2d0c20c2e5eb089dfc011c40f20f7c90e6).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class BitwiseReverse(child: Expression, width: Int)`




[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10677#issuecomment-170291988
  
    **[Test build #49050 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49050/consoleFull)** for PR 10677 at commit [`736e8d2`](https://github.com/apache/spark/commit/736e8d2d0c20c2e5eb089dfc011c40f20f7c90e6).




[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10677#issuecomment-182231251
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51025/
    Test FAILed.




[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10677#issuecomment-182589013
  
    **[Test build #51053 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51053/consoleFull)** for PR 10677 at commit [`3469e45`](https://github.com/apache/spark/commit/3469e4511b6de6f000459395297c220960d37bd8).




[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10677#issuecomment-182252532
  
    **[Test build #51027 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51027/consoleFull)** for PR 10677 at commit [`c008569`](https://github.com/apache/spark/commit/c008569bdc28b5e5bdb4149b7fb557807132bb76).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10677#issuecomment-182630815
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10677#issuecomment-182600162
  
    **[Test build #51043 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51043/consoleFull)** for PR 10677 at commit [`aa34559`](https://github.com/apache/spark/commit/aa345595202afdd451d15269d708e37838758243).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10677#issuecomment-182632128
  
    **[Test build #2534 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2534/consoleFull)** for PR 10677 at commit [`9c7a06f`](https://github.com/apache/spark/commit/9c7a06f0902df28e243f2f0d6160f2356f416ca9).




[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/10677#issuecomment-176628618
  
    @hvanhovell can you review this one?





[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10677#issuecomment-182630817
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51053/
    Test PASSed.




[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10677#discussion_r51236354
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -233,14 +233,39 @@ class Analyzer(
               }
             }.toMap
     
    -        val aggregations: Seq[NamedExpression] = x.aggregations.map {
    -          // If an expression is an aggregate (contains a AggregateExpression) then we dont change
    -          // it so that the aggregation is computed on the unmodified value of its argument
    -          // expressions.
    -          case expr if expr.find(_.isInstanceOf[AggregateExpression]).nonEmpty => expr
    -          // If not then its a grouping expression and we need to use the modified (with nulls from
    -          // Expand) value of the expression.
    -          case expr => expr.transformDown {
    +        val aggregations: Seq[NamedExpression] = x.aggregations.map { case expr =>
    +          // collect all the found AggregateExpression, so we can check an expression is part of
    +          // any AggregateExpression or not.
    +          val aggsBuffer = ArrayBuffer[Expression]()
    +          def isPartOfAggregation(e: Expression): Boolean = {
    +            aggsBuffer.exists(a => a.find(_ eq e).isDefined)
    +          }
    +          expr.transformDown {
    +            // AggregateExpression should be computed on the unmodified value of its argument
    +            // expressions, so we should not replace any references to grouping expression
    +            // inside it.
    +            case e: AggregateExpression =>
    +              aggsBuffer += e
    +              e
    +            case e if isPartOfAggregation(e) => e
    +            case e: GroupingID =>
    --- End diff --
    
    This is probably a dumb question. What happens if we use these functions without grouping sets? Do we get a nice analysis exception?




[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10677#issuecomment-182229748
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10677#discussion_r51645019
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -233,14 +233,39 @@ class Analyzer(
               }
             }.toMap
     
    -        val aggregations: Seq[NamedExpression] = x.aggregations.map {
    -          // If an expression is an aggregate (contains a AggregateExpression) then we dont change
    -          // it so that the aggregation is computed on the unmodified value of its argument
    -          // expressions.
    -          case expr if expr.find(_.isInstanceOf[AggregateExpression]).nonEmpty => expr
    -          // If not then its a grouping expression and we need to use the modified (with nulls from
    -          // Expand) value of the expression.
    -          case expr => expr.transformDown {
    +        val aggregations: Seq[NamedExpression] = x.aggregations.map { case expr =>
    +          // collect all the found AggregateExpression, so we can check an expression is part of
    +          // any AggregateExpression or not.
    +          val aggsBuffer = ArrayBuffer[Expression]()
    +          def isPartOfAggregation(e: Expression): Boolean = {
    +            aggsBuffer.exists(a => a.find(_ eq e).isDefined)
    +          }
    +          expr.transformDown {
    +            // AggregateExpression should be computed on the unmodified value of its argument
    +            // expressions, so we should not replace any references to grouping expression
    +            // inside it.
    +            case e: AggregateExpression =>
    +              aggsBuffer += e
    +              e
    +            case e if isPartOfAggregation(e) => e
    +            case e: GroupingID =>
    +              if (e.groupByExprs == x.groupByExprs) {
    +                // the bitmask is following Hive, which is wrong, we need to reverse it here
    +                // TODO: don't not follow Hive
    +                BitwiseReverse(BitwiseNot(gid), e.groupByExprs.length)
    --- End diff --
    
    Why not break compatibility here? We are breaking it anyway.




[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10677#issuecomment-182231159
  
    **[Test build #51025 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51025/consoleFull)** for PR 10677 at commit [`9511c2c`](https://github.com/apache/spark/commit/9511c2ca96366f9ac3f3d77e260ca3abd3b9f38c).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10677#discussion_r51644362
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -233,14 +233,39 @@ class Analyzer(
               }
             }.toMap
     
    -        val aggregations: Seq[NamedExpression] = x.aggregations.map {
    -          // If an expression is an aggregate (contains a AggregateExpression) then we dont change
    -          // it so that the aggregation is computed on the unmodified value of its argument
    -          // expressions.
    -          case expr if expr.find(_.isInstanceOf[AggregateExpression]).nonEmpty => expr
    -          // If not then its a grouping expression and we need to use the modified (with nulls from
    -          // Expand) value of the expression.
    -          case expr => expr.transformDown {
    +        val aggregations: Seq[NamedExpression] = x.aggregations.map { case expr =>
    +          // collect all the found AggregateExpression, so we can check an expression is part of
    +          // any AggregateExpression or not.
    +          val aggsBuffer = ArrayBuffer[Expression]()
    +          def isPartOfAggregation(e: Expression): Boolean = {
    +            aggsBuffer.exists(a => a.find(_ eq e).isDefined)
    +          }
    +          expr.transformDown {
    +            // AggregateExpression should be computed on the unmodified value of its argument
    +            // expressions, so we should not replace any references to grouping expression
    +            // inside it.
    +            case e: AggregateExpression =>
    +              aggsBuffer += e
    +              e
    +            case e if isPartOfAggregation(e) => e
    +            case e: GroupingID =>
    --- End diff --
    
    Right now it will fail to resolve; agreed that it should have a better error message.




[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10677#discussion_r51645142
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -233,14 +233,39 @@ class Analyzer(
               }
             }.toMap
     
    -        val aggregations: Seq[NamedExpression] = x.aggregations.map {
    -          // If an expression is an aggregate (contains a AggregateExpression) then we dont change
    -          // it so that the aggregation is computed on the unmodified value of its argument
    -          // expressions.
    -          case expr if expr.find(_.isInstanceOf[AggregateExpression]).nonEmpty => expr
    -          // If not then its a grouping expression and we need to use the modified (with nulls from
    -          // Expand) value of the expression.
    -          case expr => expr.transformDown {
    +        val aggregations: Seq[NamedExpression] = x.aggregations.map { case expr =>
    +          // collect all the found AggregateExpression, so we can check an expression is part of
    +          // any AggregateExpression or not.
    +          val aggsBuffer = ArrayBuffer[Expression]()
    +          def isPartOfAggregation(e: Expression): Boolean = {
    +            aggsBuffer.exists(a => a.find(_ eq e).isDefined)
    --- End diff --
    
    I had to take a long hard look to understand this. You are comparing references, not structure. Maybe add a small piece of documentation?
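    One possible shape for such a comment (a sketch, not the wording that was actually committed):

        // Use reference equality (`eq`) on purpose: we want to know whether this exact
        // expression node was already seen inside one of the collected
        // AggregateExpressions, not whether a structurally equal expression was.
        def isPartOfAggregation(e: Expression): Boolean = {
          aggsBuffer.exists(a => a.find(_ eq e).isDefined)
        }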




[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10677#issuecomment-182154207
  
    **[Test build #51017 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51017/consoleFull)** for PR 10677 at commit [`90c1655`](https://github.com/apache/spark/commit/90c16552d32836e11c988843748c366e615038e1).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10677#issuecomment-182529343
  
    **[Test build #51043 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51043/consoleFull)** for PR 10677 at commit [`aa34559`](https://github.com/apache/spark/commit/aa345595202afdd451d15269d708e37838758243).




[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10677#issuecomment-182154211
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51017/
    Test FAILed.




[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10677#issuecomment-182213713
  
    **[Test build #51025 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51025/consoleFull)** for PR 10677 at commit [`9511c2c`](https://github.com/apache/spark/commit/9511c2ca96366f9ac3f3d77e260ca3abd3b9f38c).




[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10677#issuecomment-170298110
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49050/
    Test PASSed.




[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10677#issuecomment-182600579
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51043/
    Test PASSed.




[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10677#issuecomment-182630549
  
    **[Test build #51053 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51053/consoleFull)** for PR 10677 at commit [`3469e45`](https://github.com/apache/spark/commit/3469e4511b6de6f000459395297c220960d37bd8).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10677#issuecomment-182153902
  
    **[Test build #51017 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51017/consoleFull)** for PR 10677 at commit [`90c1655`](https://github.com/apache/spark/commit/90c16552d32836e11c988843748c366e615038e1).




[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10677#issuecomment-182229751
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51026/
    Test FAILed.




[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10677#discussion_r51644299
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/bitwiseExpressions.scala ---
    @@ -124,3 +124,47 @@ case class BitwiseNot(child: Expression) extends UnaryExpression with ExpectsInp
     
       protected override def nullSafeEval(input: Any): Any = not(input)
     }
    +
    +/**
    +  * A function that reverse the lowest N bits of a integer.
    +  *
    +  * Note: this is only used for grouping_id()
    +  */
    +case class BitwiseReverse(child: Expression, width: Int)
    +  extends UnaryExpression with ExpectsInputTypes {
    +
    +  override def inputTypes: Seq[AbstractDataType] = Seq(IntegerType)
    +
    +  override def dataType: DataType = IntegerType
    +
    +  override def toString: String = s"^$child"
    +
    +  override def genCode(ctx: CodeGenContext, ev: GeneratedExpressionCode): String = {
    +    nullSafeCodeGen(ctx, ev, c => {
    +      val v = ctx.freshName("v")
    +      val i = ctx.freshName("i")
    +      s"""
    +         | int $v = $c;
    +         | ${ev.value} = 0;
    +         | for (int $i = 0; $i < $width; $i ++) {
    +         |   ${ev.value} <<= 1;
    +         |   ${ev.value} |= $v & 1;
    +         |   $v >>>= 1;
    +         | }
    +       """.stripMargin
    +    })
    +  }
    +
    +  protected override def nullSafeEval(input: Any): Any = {
    +    var v = input.asInstanceOf[Int]
    --- End diff --
    
    Great, will do.
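    For reference, a scalar nullSafeEval mirroring the generated loop above could look like the sketch below (not necessarily the code that was eventually committed):

        protected override def nullSafeEval(input: Any): Any = {
          var v = input.asInstanceOf[Int]
          var result = 0
          var i = 0
          while (i < width) {
            result = (result << 1) | (v & 1) // append the lowest bit of v to the result
            v = v >>> 1                      // and consume that bit
            i += 1
          }
          result
        }

    Reversing the lowest 3 bits this way maps 0b011 (3) to 0b110 (6).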




[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the pull request:

    https://github.com/apache/spark/pull/10677#issuecomment-178869267
  
    @davies the PR is in good shape. The two main (minor) issues I could find are:
    - The use of the Hive gid, which is wrong. We could also break some compatibility and solve it at the root.
    - The usefulness of child expressions in the ```grouping_id``` function.




[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10677#issuecomment-170287980
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10677#issuecomment-182528676
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the pull request:

    https://github.com/apache/spark/pull/10677#issuecomment-182548652
  
    @davies Yeah, you have a point there.
    
    We have inherited the wrong construction of the bitmask from Hive: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystQl.scala#L202-L206. We could also fix/document it there.




[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by aray <gi...@git.apache.org>.
Github user aray commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10677#discussion_r50137528
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
    @@ -324,6 +324,51 @@ object functions extends LegacyFunctions {
        */
       def first(columnName: String): Column = first(Column(columnName))
     
    +
    +  /**
    +    * Aggregate function: indicates whether a specified column in a GROUP BY list is aggregated
    +    * or not, returns 1 for aggregated or 0 for not aggregated in the result set.
    +    *
    +    * @group agg_funcs
    +    * @since 2.0.0
    +    */
    +  def grouping(e: Column): Column = Column(Grouping(e.expr))
    +
    +  /**
    +    * Aggregate function: indicates whether a specified column in a GROUP BY list is aggregated
    +    * or not, returns 1 for aggregated or 0 for not aggregated in the result set.
    +    *
    +    * @group agg_funcs
    +    * @since 2.0.0
    +    */
    +  def grouping(columnName: String): Column = grouping(Column(columnName))
    +
    +  /**
    +    * Aggregate function: returns the level of grouping, equals to
    +    *
    +    *   (grouping(c1) << (n-1)) + (grouping(c1) << (n-2)) + ... + grouping(cn)
    --- End diff --
    
    The second term should be grouping(c2), not grouping(c1).
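    With the corrected index, for n = 3 grouping columns the formula reads:

        grouping_id(c1, c2, c3) = (grouping(c1) << 2) + (grouping(c2) << 1) + grouping(c3)

    so a grouping set that keeps only c1 (aggregating c2 and c3) gets grouping_id = 0*4 + 1*2 + 1 = 3.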




[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the pull request:

    https://github.com/apache/spark/pull/10677#issuecomment-176640840
  
    I'll have a look




[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10677#issuecomment-182528681
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51042/
    Test FAILed.




[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10677#issuecomment-182609500
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10677#discussion_r51647326
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
    @@ -324,6 +324,51 @@ object functions extends LegacyFunctions {
        */
       def first(columnName: String): Column = first(Column(columnName))
     
    +
    +  /**
    +    * Aggregate function: indicates whether a specified column in a GROUP BY list is aggregated
    +    * or not, returns 1 for aggregated or 0 for not aggregated in the result set.
    +    *
    +    * @group agg_funcs
    +    * @since 2.0.0
    +    */
    +  def grouping(e: Column): Column = Column(Grouping(e.expr))
    +
    +  /**
    +    * Aggregate function: indicates whether a specified column in a GROUP BY list is aggregated
    +    * or not, returns 1 for aggregated or 0 for not aggregated in the result set.
    +    *
    +    * @group agg_funcs
    +    * @since 2.0.0
    +    */
    +  def grouping(columnName: String): Column = grouping(Column(columnName))
    +
    +  /**
    +    * Aggregate function: returns the level of grouping, equals to
    +    *
    +    *   (grouping(c1) << (n-1)) + (grouping(c1) << (n-2)) + ... + grouping(cn)
    +    *
    +    * Note: the list of columns should match with grouping columns exactly.
    --- End diff --
    
    So why have people pass a list of columns at all?
    
    We could at least remove this for the DataFrame API.




[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/10677#issuecomment-172651882
  
    @nongli @rxin Could you help review this one?




[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10677#issuecomment-170287977
  
    **[Test build #49049 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49049/consoleFull)** for PR 10677 at commit [`0e8317d`](https://github.com/apache/spark/commit/0e8317db91e081a30beb22adfb8c3463adcc51ec).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class BitwiseReverse(child: Expression, width: Int)`




[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10677#issuecomment-182663855
  
    **[Test build #2534 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2534/consoleFull)** for PR 10677 at commit [`9c7a06f`](https://github.com/apache/spark/commit/9c7a06f0902df28e243f2f0d6160f2356f416ca9).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10677#issuecomment-182252702
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51027/
    Test FAILed.




[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the pull request:

    https://github.com/apache/spark/pull/10677#issuecomment-182230687
  
    @davies I did some reading on ```grouping_id()``` and the behavior varies per vendor. Oracle also allows you to use a subset of the grouping columns, see: https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions063.htm. Let's keep it as it is; someone can always implement a fancier operator later (it shouldn't be too hard).
    
    As for Hive compatibility, we can also add a few lines of comments in the Analyzer explaining why Hive is wrong.
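For context, a minimal sketch of how the two functions are meant to be used together with cube, assuming a hypothetical DataFrame `df` with columns `a`, `b`, `c` and the corresponding String overloads:

```scala
import org.apache.spark.sql.functions._

// grouping(col) is 1 when `col` is aggregated away in a grouping set and 0 otherwise;
// grouping_id("a", "b") packs those bits as (grouping(a) << 1) + grouping(b).
df.cube("a", "b")
  .agg(grouping("a"), grouping("b"), grouping_id("a", "b"), sum("c"))
  .show()
```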




[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/10677




[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10677#discussion_r52401092
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
    @@ -324,6 +324,51 @@ object functions extends LegacyFunctions {
        */
       def first(columnName: String): Column = first(Column(columnName))
     
    +
    +  /**
    +    * Aggregate function: indicates whether a specified column in a GROUP BY list is aggregated
    +    * or not, returning 1 for aggregated or 0 for not aggregated in the result set.
    +    *
    +    * @group agg_funcs
    +    * @since 2.0.0
    +    */
    +  def grouping(e: Column): Column = Column(Grouping(e.expr))
    +
    +  /**
    +    * Aggregate function: indicates whether a specified column in a GROUP BY list is aggregated
    +    * or not, returning 1 for aggregated or 0 for not aggregated in the result set.
    +    *
    +    * @group agg_funcs
    +    * @since 2.0.0
    +    */
    +  def grouping(columnName: String): Column = grouping(Column(columnName))
    +
    +  /**
    +    * Aggregate function: returns the level of grouping, equal to
    +    *
    +    *   (grouping(c1) << (n-1)) + (grouping(c2) << (n-2)) + ... + grouping(cn)
    +    *
    +    * Note: the list of columns should match with grouping columns exactly.
    --- End diff --
    
    Agreed that this is useless; it is here to match the API in other databases (PostgreSQL, SQL Server, Oracle).
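To make the formula above concrete, a small illustrative computation for GROUP BY a, b WITH CUBE (n = 2); the values are worked out by hand, not taken from the patch:

```scala
// Grouping set where `a` is kept and `b` is aggregated away:
//   grouping(a) = 0, grouping(b) = 1  =>  grouping_id = (0 << 1) + 1 = 1
// Grand-total grouping set (both aggregated away):
//   grouping(a) = 1, grouping(b) = 1  =>  grouping_id = (1 << 1) + 1 = 3
val groupingBits = Seq(0, 1) // grouping(a), grouping(b)
val groupingId = groupingBits.foldLeft(0)((acc, bit) => (acc << 1) | bit)
assert(groupingId == 1)
```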




[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10677#discussion_r51643536
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/bitwiseExpressions.scala ---
    @@ -124,3 +124,47 @@ case class BitwiseNot(child: Expression) extends UnaryExpression with ExpectsInp
     
       protected override def nullSafeEval(input: Any): Any = not(input)
     }
    +
    +/**
    +  * A function that reverses the lowest N bits of an integer.
    +  *
    +  * Note: this is only used for grouping_id()
    +  */
    +case class BitwiseReverse(child: Expression, width: Int)
    +  extends UnaryExpression with ExpectsInputTypes {
    +
    +  override def inputTypes: Seq[AbstractDataType] = Seq(IntegerType)
    +
    +  override def dataType: DataType = IntegerType
    +
    +  override def toString: String = s"^$child"
    +
    +  override def genCode(ctx: CodeGenContext, ev: GeneratedExpressionCode): String = {
    +    nullSafeCodeGen(ctx, ev, c => {
    +      val v = ctx.freshName("v")
    +      val i = ctx.freshName("i")
    +      s"""
    +         | int $v = $c;
    +         | ${ev.value} = 0;
    +         | for (int $i = 0; $i < $width; $i ++) {
    +         |   ${ev.value} <<= 1;
    +         |   ${ev.value} |= $v & 1;
    +         |   $v >>>= 1;
    +         | }
    +       """.stripMargin
    +    })
    +  }
    +
    +  protected override def nullSafeEval(input: Any): Any = {
    +    var v = input.asInstanceOf[Int]
    --- End diff --
    
    Nit: We could also use ```Integer.reverse``` and a mask here. It saves some code and a branch.
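A sketch of what that suggestion could look like as a standalone helper (assuming 1 <= width <= 32; the name is illustrative, not from the patch):

```scala
// Integer.reverse flips all 32 bits; the unsigned shift then drops the high
// (32 - width) bits, leaving the lowest `width` bits of `v` in reversed order.
def reverseLowBits(v: Int, width: Int): Int = Integer.reverse(v) >>> (32 - width)

// e.g. reverseLowBits(0x1, 3) == 0x4   (binary 001 -> 100)
```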




[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10677#discussion_r52419520
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -233,14 +233,39 @@ class Analyzer(
               }
             }.toMap
     
    -        val aggregations: Seq[NamedExpression] = x.aggregations.map {
    -          // If an expression is an aggregate (contains a AggregateExpression) then we dont change
    -          // it so that the aggregation is computed on the unmodified value of its argument
    -          // expressions.
    -          case expr if expr.find(_.isInstanceOf[AggregateExpression]).nonEmpty => expr
    -          // If not then its a grouping expression and we need to use the modified (with nulls from
    -          // Expand) value of the expression.
    -          case expr => expr.transformDown {
    +        val aggregations: Seq[NamedExpression] = x.aggregations.map { case expr =>
    +          // Collect all the AggregateExpressions found, so we can check whether an expression
    +          // is part of any AggregateExpression.
    +          val aggsBuffer = ArrayBuffer[Expression]()
    +          def isPartOfAggregation(e: Expression): Boolean = {
    +            aggsBuffer.exists(a => a.find(_ eq e).isDefined)
    +          }
    +          expr.transformDown {
    +            // AggregateExpression should be computed on the unmodified value of its argument
    +            // expressions, so we should not replace any references to grouping expression
    +            // inside it.
    +            case e: AggregateExpression =>
    +              aggsBuffer += e
    +              e
    +            case e if isPartOfAggregation(e) => e
    +            case e: GroupingID =>
    --- End diff --
    
    Had captured this in CheckAnalysis.
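For readers following the thread, a sketch of the kind of query that check is meant to catch (hypothetical DataFrame `df` with columns `a`, `b`, `c`; the exact error message comes from CheckAnalysis):

```scala
import org.apache.spark.sql.functions._

// grouping_id() has to reference exactly the grouping columns; a partial list
// is expected to be rejected at analysis time rather than silently rewritten.
df.cube("a", "b").agg(grouping_id("a"), sum("c"))       // expected to fail analysis
df.cube("a", "b").agg(grouping_id("a", "b"), sum("c"))  // fine
```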




[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10677#issuecomment-182609503
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51061/
    Test FAILed.




[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/10677#issuecomment-182694469
  
    merging into master, thanks!




[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10677#issuecomment-182154210
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-12706] [SQL] grouping() and grouping_id...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10677#discussion_r52401165
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -233,14 +233,39 @@ class Analyzer(
               }
             }.toMap
     
    -        val aggregations: Seq[NamedExpression] = x.aggregations.map {
    -          // If an expression is an aggregate (contains a AggregateExpression) then we dont change
    -          // it so that the aggregation is computed on the unmodified value of its argument
    -          // expressions.
    -          case expr if expr.find(_.isInstanceOf[AggregateExpression]).nonEmpty => expr
    -          // If not then its a grouping expression and we need to use the modified (with nulls from
    -          // Expand) value of the expression.
    -          case expr => expr.transformDown {
    +        val aggregations: Seq[NamedExpression] = x.aggregations.map { case expr =>
    +          // Collect all the AggregateExpressions found, so we can check whether an expression
    +          // is part of any AggregateExpression.
    +          val aggsBuffer = ArrayBuffer[Expression]()
    +          def isPartOfAggregation(e: Expression): Boolean = {
    +            aggsBuffer.exists(a => a.find(_ eq e).isDefined)
    +          }
    +          expr.transformDown {
    +            // AggregateExpression should be computed on the unmodified value of its argument
    +            // expressions, so we should not replace any references to grouping expression
    +            // inside it.
    +            case e: AggregateExpression =>
    +              aggsBuffer += e
    +              e
    +            case e if isPartOfAggregation(e) => e
    +            case e: GroupingID =>
    +              if (e.groupByExprs == x.groupByExprs) {
    +                // the bitmask is following Hive, which is wrong, we need to reverse it here
    +                // TODO: do not follow Hive
    +                BitwiseReverse(BitwiseNot(gid), e.groupByExprs.length)
    --- End diff --
    
    Trying to re-use the Hive tests; I will pull them out.

