You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by gatorsmile <gi...@git.apache.org> on 2016/02/20 16:35:58 UTC

[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

GitHub user gatorsmile opened a pull request:

    https://github.com/apache/spark/pull/11283

    [SPARK-12720] [SQL] SQL Generation Support for Cube, Rollup, and Grouping Sets

    ## What changes were proposed in this pull request?
    
    This PR is for supporting SQL generation for cube, rollup and grouping sets.
    
    For example, a query using rollup:
    ```SQL
    SELECT count(*) as cnt, key % 5, grouping_id() FROM t1 GROUP BY key % 5 WITH ROLLUP
    ```
    Original logical plan:
    ```
      Aggregate [(key#17L % cast(5 as bigint))#47L,grouping__id#46],
                [(count(1),mode=Complete,isDistinct=false) AS cnt#43L,
                 (key#17L % cast(5 as bigint))#47L AS _c1#45L,
                 grouping__id#46 AS _c2#44]
      +- Expand [List(key#17L, value#18, (key#17L % cast(5 as bigint))#47L, 0),
                 List(key#17L, value#18, null, 1)],
                [key#17L,value#18,(key#17L % cast(5 as bigint))#47L,grouping__id#46]
         +- Project [key#17L,
                     value#18,
                     (key#17L % cast(5 as bigint)) AS (key#17L % cast(5 as bigint))#47L]
            +- Subquery t1
               +- Relation[key#17L,value#18] ParquetRelation
    ```
    Converted SQL:
    ```SQL
      SELECT count( 1) AS `cnt`,
             (`t1`.`key` % CAST(5 AS BIGINT)),
             grouping_id() AS `_c2`
      FROM `default`.`t1`
      GROUP BY (`t1`.`key` % CAST(5 AS BIGINT))
      GROUPING SETS (((`t1`.`key` % CAST(5 AS BIGINT))), ())
    ```
    
    ## How was the this patch tested?
    
    Added six test cases in LogicalPlanToSQLSuite.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark groupingSetsToSQL

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11283.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11283
    
----
commit bc0c0309b1ec0d5303b744025032070f00c2bc9c
Author: gatorsmile <ga...@gmail.com>
Date:   2016-02-20T15:27:52Z

    SQL generation support for cube, rollup, and grouping set

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-191945451
  
    **[Test build #52401 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52401/consoleFull)** for PR 11283 at commit [`b1925e5`](https://github.com/apache/spark/commit/b1925e5feb83228db8ed8502306368a2d979e56b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-191595313
  
    Thank you! @cloud-fan  Will try to follow this in the future PR for `toSQL`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-192610150
  
    **[Test build #52508 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52508/consoleFull)** for PR 11283 at commit [`385c0d9`](https://github.com/apache/spark/commit/385c0d9c0acce936b8f567613f747ccc7fcdcaff).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-192107847
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-191572125
  
    **[Test build #52364 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52364/consoleFull)** for PR 11283 at commit [`37a9d8d`](https://github.com/apache/spark/commit/37a9d8d4c86d1207b07bef4186fc23ad5bf3d75c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r55117081
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -233,6 +236,82 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def sameOutput(output1: Seq[Attribute], output2: Seq[Attribute]): Boolean =
    +    output1.size == output2.size &&
    +      output1.zip(output2).forall(pair => pair._1.semanticEquals(pair._2))
    +
    +  private def isGroupingSet(a: Aggregate, e: Expand, p: Project): Boolean = {
    +    assert(a.child == e && e.child == p)
    +    a.groupingExpressions.forall(_.isInstanceOf[Attribute]) &&
    +      sameOutput(e.output, p.child.output ++ a.groupingExpressions.map(_.asInstanceOf[Attribute]))
    +  }
    +
    +  private def groupingSetToSQL(
    +      agg: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    assert(agg.groupingExpressions.length > 1)
    +
    +    // The last column of Expand is always grouping ID
    +    val gid = expand.output.last
    +
    +    val numOriginalOutput = project.child.output.length
    +    // Assumption: Aggregate's groupingExpressions is composed of
    +    // 1) the attributes of aliased group by expressions
    +    // 2) gid, which is always the last one
    +    val groupByAttributes = agg.groupingExpressions.dropRight(1).map(_.asInstanceOf[Attribute])
    +    // Assumption: Project's projectList is composed of
    +    // 1) the original output (Project's child.output),
    +    // 2) the aliased group by expressions.
    +    val groupByExprs = project.projectList.drop(numOriginalOutput).map(_.asInstanceOf[Alias].child)
    +    val groupingSQL = groupByExprs.map(_.sql).mkString(", ")
    +
    +    // a map from group by attributes to the original group by expressions.
    +    val groupByAttrMap = AttributeMap(groupByAttributes.zip(groupByExprs))
    +
    +    val groupingSet = expand.projections.map { project =>
    +      // Assumption: expand.projections is composed of
    +      // 1) the original output (Project's child.output),
    +      // 2) group by attributes(or null literal)
    +      // 3) gid, which is always the last one in each project in Expand
    +      project.drop(numOriginalOutput).dropRight(1).collect {
    +        case attr: Attribute if groupByAttrMap.contains(attr) => groupByAttrMap(attr)
    +      }
    +    }
    +    val groupingSetSQL =
    +      "GROUPING SETS(" +
    +        groupingSet.map(e => s"(${e.map(_.sql).mkString(", ")})").mkString(", ") + ")"
    +
    +    val aggExprs = agg.aggregateExpressions.map { case expr =>
    +      expr.transformDown {
    +        // grouping_id() is converted to VirtualColumn.groupingIdName by Analyzer. Revert it back.
    +        case ar: AttributeReference if ar == gid => GroupingID(Nil)
    +        case ar: AttributeReference if groupByAttrMap.contains(ar) => groupByAttrMap(ar)
    +        case a @ Cast(BitwiseAnd(
    +            ShiftRight(ar: AttributeReference, _ @ Literal(value: Any, IntegerType)),
    +            Literal(1, IntegerType)), ByteType) if ar == gid =>
    +          // for converting an expression to its original SQL format grouping(col)
    +          val idx = groupByExprs.length - 1 - value.asInstanceOf[Int]
    +          val groupingCol = groupByExprs.lift(idx)
    +          if (groupingCol.isDefined) {
    +            Grouping(groupingCol.get)
    +          } else {
    +            throw new UnsupportedOperationException(s"unsupported operator $a")
    +          }
    +      }
    +    }
    +
    +    build(
    +      "SELECT",
    +      aggExprs.map(_.sql).mkString(", "),
    +      if (agg.child == OneRowRelation) "" else "FROM",
    +      toSQL(project.child),
    --- End diff --
    
    Added another two test cases for this. Now, we have three ```rollup/cube #5``` ```rollup/cube #8"``` ```rollup/cube #9``` test cases for it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-191081390
  
    retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r54985730
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -107,6 +107,11 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         case p: Project =>
           projectToSQL(p, isDistinct = false)
     
    +    case a @ Aggregate(_, _, e @ Expand(_, _, p: Project))
    +      if sameOutput(e.output,
    +        p.child.output ++ a.groupingExpressions.map(_.asInstanceOf[Attribute])) =>
    --- End diff --
    
    Thank you for pointing it out! I will be more careful next time.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-192610211
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52508/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-191640140
  
    **[Test build #52374 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52374/consoleFull)** for PR 11283 at commit [`ae768fe`](https://github.com/apache/spark/commit/ae768fe7a27262e58913e50b654bfc9f23199b12).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-192624171
  
    This LGTM now. Merging to master. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-191943867
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52400/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r55016424
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -233,6 +236,82 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def sameOutput(output1: Seq[Attribute], output2: Seq[Attribute]): Boolean =
    +    output1.size == output2.size &&
    +      output1.zip(output2).forall(pair => pair._1.semanticEquals(pair._2))
    +
    +  private def isGroupingSet(a: Aggregate, e: Expand, p: Project): Boolean = {
    +    assert(a.child == e && e.child == p)
    +    a.groupingExpressions.forall(_.isInstanceOf[Attribute]) &&
    +      sameOutput(e.output, p.child.output ++ a.groupingExpressions.map(_.asInstanceOf[Attribute]))
    +  }
    +
    +  private def groupingSetToSQL(
    +      agg: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    assert(agg.groupingExpressions.length > 1)
    +
    +    // The last column of Expand is always grouping ID
    +    val gid = expand.output.last
    +
    +    val numOriginalOutput = project.child.output.length
    +    // Assumption: Aggregate's groupingExpressions is composed of
    +    // 1) the attributes of aliased group by expressions
    +    // 2) gid, which is always the last one
    +    val groupByAttributes = agg.groupingExpressions.dropRight(1).map(_.asInstanceOf[Attribute])
    +    // Assumption: Project's projectList is composed of
    +    // 1) the original output (Project's child.output),
    +    // 2) the aliased group by expressions.
    +    val groupByExprs = project.projectList.drop(numOriginalOutput).map(_.asInstanceOf[Alias].child)
    +    val groupingSQL = groupByExprs.map(_.sql).mkString(", ")
    +
    +    // a map from group by attributes to the original group by expressions.
    +    val groupByAttrMap = AttributeMap(groupByAttributes.zip(groupByExprs))
    +
    +    val groupingSet = expand.projections.map { project =>
    +      // Assumption: expand.projections is composed of
    +      // 1) the original output (Project's child.output),
    +      // 2) group by attributes(or null literal)
    +      // 3) gid, which is always the last one in each project in Expand
    +      project.drop(numOriginalOutput).dropRight(1).collect {
    +        case attr: Attribute if groupByAttrMap.contains(attr) => groupByAttrMap(attr)
    +      }
    +    }
    +    val groupingSetSQL =
    +      "GROUPING SETS(" +
    +        groupingSet.map(e => s"(${e.map(_.sql).mkString(", ")})").mkString(", ") + ")"
    +
    +    val aggExprs = agg.aggregateExpressions.map { case expr =>
    +      expr.transformDown {
    +        // grouping_id() is converted to VirtualColumn.groupingIdName by Analyzer. Revert it back.
    +        case ar: AttributeReference if ar == gid => GroupingID(Nil)
    +        case ar: AttributeReference if groupByAttrMap.contains(ar) => groupByAttrMap(ar)
    +        case a @ Cast(BitwiseAnd(
    +            ShiftRight(ar: AttributeReference, _ @ Literal(value: Any, IntegerType)),
    --- End diff --
    
    Nit: Why the `_ @`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r54716705
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -211,6 +215,71 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def groupingSetToSQL(
    +      plan: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    // In cube/rollup/groupingsets, Analyzer creates new aliases for all group by expressions.
    +    // Since conversion from attribute back SQL ignore expression IDs, the alias of attribute
    +    // references are ignored in aliasMap
    +    val aliasMap = AttributeMap(project.projectList.collect {
    +      case a @ Alias(child, name) if !child.isInstanceOf[AttributeReference] => (a.toAttribute, a)
    +    })
    +
    +    val groupingExprs = plan.groupingExpressions.filterNot {
    +      // VirtualColumn.groupingIdName is added by Analyzer, and thus remove it.
    +      case a: NamedExpression => a.name == VirtualColumn.groupingIdName
    +      case o => false
    +    }.map {
    +      case a: AttributeReference if aliasMap.contains(a) => aliasMap(a).child
    +      case o => o
    +    }
    +
    +    val groupingSQL = groupingExprs.map(_.sql).mkString(", ")
    +
    +    val groupingSet = expand.projections.map(_.filter {
    --- End diff --
    
    The last attribute of each project list is always gid, we can call `dropRight(1)` before `filter`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r54769010
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -211,6 +215,71 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def groupingSetToSQL(
    +      plan: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    // In cube/rollup/groupingsets, Analyzer creates new aliases for all group by expressions.
    +    // Since conversion from attribute back SQL ignore expression IDs, the alias of attribute
    +    // references are ignored in aliasMap
    +    val aliasMap = AttributeMap(project.projectList.collect {
    +      case a @ Alias(child, name) if !child.isInstanceOf[AttributeReference] => (a.toAttribute, a)
    +    })
    +
    +    val groupingExprs = plan.groupingExpressions.filterNot {
    +      // VirtualColumn.groupingIdName is added by Analyzer, and thus remove it.
    +      case a: NamedExpression => a.name == VirtualColumn.groupingIdName
    +      case o => false
    +    }.map {
    +      case a: AttributeReference if aliasMap.contains(a) => aliasMap(a).child
    +      case o => o
    +    }
    +
    +    val groupingSQL = groupingExprs.map(_.sql).mkString(", ")
    +
    +    val groupingSet = expand.projections.map(_.filter {
    +      case _: Literal => false
    +      case e: Expression if plan.groupingExpressions.exists(_.semanticEquals(e)) => true
    +      case _ => false
    +    }.map {
    +      case a: AttributeReference if aliasMap.contains(a) => aliasMap(a).child
    +      case o => o
    +    })
    +
    +    val aggExprs = plan.aggregateExpressions.map {
    +      case a @ Alias(child: AttributeReference, name)
    +          if child.name == VirtualColumn.groupingIdName =>
    +        // grouping_id() is converted to VirtualColumn.groupingIdName by Analyzer. Revert it back.
    +        Alias(GroupingID(Nil), name)()
    +      case a @ Alias(_ @ Cast(BitwiseAnd(
    +        ShiftRight(ar: AttributeReference, _ @ Literal(value: Any, IntegerType)),
    +        Literal(1, IntegerType)), ByteType), name)
    +          if ar.name == VirtualColumn.groupingIdName =>
    +        // for converting an expression to its original SQL format grouping(col)
    +        val idx = groupingExprs.length - 1 - value.asInstanceOf[Int]
    +        val groupingCol = groupingExprs.lift(idx)
    +        if (groupingCol.isDefined) {
    +          Grouping(groupingCol.get)
    +        } else {
    +          throw new UnsupportedOperationException(s"unsupported operator $a")
    +        }
    +      case a @ Alias(child: AttributeReference, name) if aliasMap.contains(child) =>
    +        aliasMap(child).child
    +      case o => o
    +    }
    +
    +    build(
    +      "SELECT",
    +      aggExprs.map(_.sql).mkString(", "),
    +      if (plan.child == OneRowRelation) "" else "FROM",
    +      toSQL(project.child),
    +      if (groupingSQL.isEmpty) "" else "GROUP BY",
    --- End diff --
    
    I did a few tries. The parser enforces it. It has to be non-empty. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-191125506
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52298/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r54821563
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -211,6 +215,71 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def groupingSetToSQL(
    +      plan: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    // In cube/rollup/groupingsets, Analyzer creates new aliases for all group by expressions.
    +    // Since conversion from attribute back SQL ignore expression IDs, the alias of attribute
    +    // references are ignored in aliasMap
    +    val aliasMap = AttributeMap(project.projectList.collect {
    +      case a @ Alias(child, name) if !child.isInstanceOf[AttributeReference] => (a.toAttribute, a)
    +    })
    +
    +    val groupingExprs = plan.groupingExpressions.filterNot {
    +      // VirtualColumn.groupingIdName is added by Analyzer, and thus remove it.
    +      case a: NamedExpression => a.name == VirtualColumn.groupingIdName
    +      case o => false
    +    }.map {
    +      case a: AttributeReference if aliasMap.contains(a) => aliasMap(a).child
    +      case o => o
    +    }
    +
    +    val groupingSQL = groupingExprs.map(_.sql).mkString(", ")
    +
    +    val groupingSet = expand.projections.map(_.filter {
    +      case _: Literal => false
    +      case e: Expression if plan.groupingExpressions.exists(_.semanticEquals(e)) => true
    +      case _ => false
    +    }.map {
    +      case a: AttributeReference if aliasMap.contains(a) => aliasMap(a).child
    +      case o => o
    +    })
    +
    +    val aggExprs = plan.aggregateExpressions.map {
    +      case a @ Alias(child: AttributeReference, name)
    +          if child.name == VirtualColumn.groupingIdName =>
    +        // grouping_id() is converted to VirtualColumn.groupingIdName by Analyzer. Revert it back.
    +        Alias(GroupingID(Nil), name)()
    +      case a @ Alias(_ @ Cast(BitwiseAnd(
    +        ShiftRight(ar: AttributeReference, _ @ Literal(value: Any, IntegerType)),
    +        Literal(1, IntegerType)), ByteType), name)
    +          if ar.name == VirtualColumn.groupingIdName =>
    +        // for converting an expression to its original SQL format grouping(col)
    +        val idx = groupingExprs.length - 1 - value.asInstanceOf[Int]
    +        val groupingCol = groupingExprs.lift(idx)
    +        if (groupingCol.isDefined) {
    +          Grouping(groupingCol.get)
    +        } else {
    +          throw new UnsupportedOperationException(s"unsupported operator $a")
    +        }
    +      case a @ Alias(child: AttributeReference, name) if aliasMap.contains(child) =>
    +        aliasMap(child).child
    +      case o => o
    +    }
    +
    +    build(
    +      "SELECT",
    +      aggExprs.map(_.sql).mkString(", "),
    +      if (plan.child == OneRowRelation) "" else "FROM",
    +      toSQL(project.child),
    +      if (groupingSQL.isEmpty) "" else "GROUP BY",
    --- End diff --
    
    seems this check here is unnecessary, how about we do an assert at the beginning and don't check here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-191946246
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52401/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r54978903
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -233,6 +242,67 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def groupingSetToSQL(
    +      plan: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    require(plan.groupingExpressions.length > 1)
    --- End diff --
    
    Will do. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r54768886
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -211,6 +215,71 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def groupingSetToSQL(
    +      plan: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    // In cube/rollup/groupingsets, Analyzer creates new aliases for all group by expressions.
    +    // Since conversion from attribute back SQL ignore expression IDs, the alias of attribute
    +    // references are ignored in aliasMap
    +    val aliasMap = AttributeMap(project.projectList.collect {
    +      case a @ Alias(child, name) if !child.isInstanceOf[AttributeReference] => (a.toAttribute, a)
    +    })
    +
    +    val groupingExprs = plan.groupingExpressions.filterNot {
    +      // VirtualColumn.groupingIdName is added by Analyzer, and thus remove it.
    +      case a: NamedExpression => a.name == VirtualColumn.groupingIdName
    +      case o => false
    +    }.map {
    +      case a: AttributeReference if aliasMap.contains(a) => aliasMap(a).child
    +      case o => o
    +    }
    +
    +    val groupingSQL = groupingExprs.map(_.sql).mkString(", ")
    +
    +    val groupingSet = expand.projections.map(_.filter {
    +      case _: Literal => false
    +      case e: Expression if plan.groupingExpressions.exists(_.semanticEquals(e)) => true
    +      case _ => false
    +    }.map {
    +      case a: AttributeReference if aliasMap.contains(a) => aliasMap(a).child
    +      case o => o
    +    })
    +
    +    val aggExprs = plan.aggregateExpressions.map {
    +      case a @ Alias(child: AttributeReference, name)
    +          if child.name == VirtualColumn.groupingIdName =>
    --- End diff --
    
    Yeah, we can use this way. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r54978777
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -233,6 +242,67 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def groupingSetToSQL(
    +      plan: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    require(plan.groupingExpressions.length > 1)
    +
    +    // The last column of Expand is always grouping ID
    +    val gid = expand.output.last
    +
    +    val groupByAttributes = plan.groupingExpressions.dropRight(1).map(_.asInstanceOf[Attribute])
    +    val groupByExprs =
    +      project.projectList.drop(project.child.output.length).map(_.asInstanceOf[Alias].child)
    +    val groupByAttrMap = AttributeMap(groupByAttributes.zip(groupByExprs))
    +    val groupingSQL = groupByExprs.map(_.sql).mkString(", ")
    +
    +    val groupingSet = expand.projections.map { project =>
    +      // Assumption: expand.projections are composed of
    +      // 1) the original output (Project's child.output),
    +      // 2) group by attributes(or null literal)
    +      // 3) gid, which is always the last one in each project in Expand
    +      project.dropRight(1).collect {
    +        case attr: Attribute if groupByAttrMap.contains(attr) => groupByAttrMap(attr)
    +      }
    +    }
    +    val groupingSetSQL =
    +      "GROUPING SETS(" +
    +        groupingSet.map(e => s"(${e.map(_.sql).mkString(", ")})").mkString(", ") + ")"
    +
    +    val aggExprs = plan.aggregateExpressions.map { case expr =>
    +      expr.transformDown {
    +        // grouping_id() is converted to VirtualColumn.groupingIdName by Analyzer. Revert it back.
    +        case ar: AttributeReference if ar eq gid => GroupingID(Nil)
    +        case a @ Alias(_ @ Cast(BitwiseAnd(
    --- End diff --
    
    can we also remove this `Alias` and the `Alias` in next case?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-192055095
  
    **[Test build #52433 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52433/consoleFull)** for PR 11283 at commit [`6f609fb`](https://github.com/apache/spark/commit/6f609fb2d844e2aaf4c809ef8c0fcd9e6eca38bb).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r54836420
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -211,6 +215,73 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def groupingSetToSQL(
    +      plan: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    require(plan.groupingExpressions.length > 1)
    +
    +    // The last column of Expand is always grouping ID
    +    val gid = expand.output.last
    +
    +    // In cube/rollup/groupingsets, Analyzer creates new aliases for all group by expressions.
    +    // Since conversion from attribute back SQL ignore expression IDs, the alias of attribute
    +    // references are ignored in aliasMap
    +    val aliasMap = AttributeMap(project.projectList.collect {
    +      case a @ Alias(child, name) => (a.toAttribute, a)
    +    })
    +
    +    val groupByAttributes = plan.groupingExpressions.dropRight(1).map(_.asInstanceOf[Attribute])
    +    val groupByAttrMap = AttributeMap(groupByAttributes.zip(
    +      project.projectList.drop(project.child.output.length).map(_.asInstanceOf[Alias].child)))
    +    val groupingExprs = groupByAttrMap.values.toArray
    +    val groupingSQL = groupingExprs.map(_.sql).mkString(", ")
    +
    +    val groupingSet = expand.projections.map(_.dropRight(1).filter {
    +      case e: Expression if plan.groupingExpressions.exists(_.semanticEquals(e)) => true
    +      case _ => false
    +    }.map {
    +      case a: AttributeReference if aliasMap.contains(a) => aliasMap(a).child
    +      case o => o
    +    })
    +
    +    val aggExprs = plan.aggregateExpressions.map { case expr =>
    +      expr.transformDown {
    +        case a @ Alias(child: AttributeReference, name) if child eq gid =>
    --- End diff --
    
    We still need this for getting the alias `name`. Yeah, another Alias below is useless. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-191643198
  
    **[Test build #52373 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52373/consoleFull)** for PR 11283 at commit [`749be1b`](https://github.com/apache/spark/commit/749be1b2de07603a7fc9961872954c24a584a1fb).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-192600900
  
    @liancheng I ran the suite in my local laptop. All the related tests works. However, I hit multiple regression failure that was introduced by another PR: https://github.com/apache/spark/pull/11466
    
    I will submit a separate PR to fix the issues. Thank you for your reviews!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-187689367
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r54844203
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -211,6 +215,65 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def groupingSetToSQL(
    +      plan: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    require(plan.groupingExpressions.length > 1)
    +
    +    // The last column of Expand is always grouping ID
    +    val gid = expand.output.last
    +
    +    val groupByAttributes = plan.groupingExpressions.dropRight(1).map(_.asInstanceOf[Attribute])
    +    val groupByExprs =
    +      project.projectList.drop(project.child.output.length).map(_.asInstanceOf[Alias].child)
    +    val groupByAttrMap = AttributeMap(groupByAttributes.zip(groupByExprs))
    +    val groupingSQL = groupByExprs.map(_.sql).mkString(", ")
    +
    +    val groupingSet = expand.projections.map { project =>
    +      project.dropRight(1).collect {
    +        case attr: Attribute if groupByAttrMap.contains(attr) => groupByAttrMap(attr)
    +      }
    +    }
    +    val groupingSetSQL =
    +      "GROUPING SETS(" +
    +        groupingSet.map(e => s"(${e.map(_.sql).mkString(", ")})").mkString(", ") + ")"
    +
    +    val aggExprs = plan.aggregateExpressions.map { case expr =>
    +      expr.transformDown {
    +        case a @ Alias(child: AttributeReference, name) if child eq gid =>
    --- End diff --
    
    seems we get the alias name and use it to build a new `Alias` only? `transformDown` will do this trick for us I think.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r55016422
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -233,6 +236,82 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def sameOutput(output1: Seq[Attribute], output2: Seq[Attribute]): Boolean =
    +    output1.size == output2.size &&
    +      output1.zip(output2).forall(pair => pair._1.semanticEquals(pair._2))
    +
    +  private def isGroupingSet(a: Aggregate, e: Expand, p: Project): Boolean = {
    +    assert(a.child == e && e.child == p)
    +    a.groupingExpressions.forall(_.isInstanceOf[Attribute]) &&
    +      sameOutput(e.output, p.child.output ++ a.groupingExpressions.map(_.asInstanceOf[Attribute]))
    +  }
    +
    +  private def groupingSetToSQL(
    +      agg: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    assert(agg.groupingExpressions.length > 1)
    +
    +    // The last column of Expand is always grouping ID
    +    val gid = expand.output.last
    +
    +    val numOriginalOutput = project.child.output.length
    +    // Assumption: Aggregate's groupingExpressions is composed of
    +    // 1) the attributes of aliased group by expressions
    +    // 2) gid, which is always the last one
    +    val groupByAttributes = agg.groupingExpressions.dropRight(1).map(_.asInstanceOf[Attribute])
    +    // Assumption: Project's projectList is composed of
    +    // 1) the original output (Project's child.output),
    +    // 2) the aliased group by expressions.
    +    val groupByExprs = project.projectList.drop(numOriginalOutput).map(_.asInstanceOf[Alias].child)
    +    val groupingSQL = groupByExprs.map(_.sql).mkString(", ")
    +
    +    // a map from group by attributes to the original group by expressions.
    +    val groupByAttrMap = AttributeMap(groupByAttributes.zip(groupByExprs))
    +
    +    val groupingSet = expand.projections.map { project =>
    --- End diff --
    
    Nit: Would be nice to add type annotation to `groupingSet` since it's a relatively complex nested data structure and can be hard to grasp.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r54713893
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -211,6 +215,71 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def groupingSetToSQL(
    +      plan: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    // In cube/rollup/groupingsets, Analyzer creates new aliases for all group by expressions.
    +    // Since conversion from attribute back SQL ignore expression IDs, the alias of attribute
    +    // references are ignored in aliasMap
    +    val aliasMap = AttributeMap(project.projectList.collect {
    +      case a @ Alias(child, name) if !child.isInstanceOf[AttributeReference] => (a.toAttribute, a)
    +    })
    +
    +    val groupingExprs = plan.groupingExpressions.filterNot {
    +      // VirtualColumn.groupingIdName is added by Analyzer, and thus remove it.
    +      case a: NamedExpression => a.name == VirtualColumn.groupingIdName
    +      case o => false
    +    }.map {
    +      case a: AttributeReference if aliasMap.contains(a) => aliasMap(a).child
    +      case o => o
    +    }
    +
    +    val groupingSQL = groupingExprs.map(_.sql).mkString(", ")
    +
    +    val groupingSet = expand.projections.map(_.filter {
    +      case _: Literal => false
    +      case e: Expression if plan.groupingExpressions.exists(_.semanticEquals(e)) => true
    --- End diff --
    
    Should we use `groupingExprs.exists(...)` instead of `plan.groupingExpressions`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-191595194
  
    **[Test build #52364 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52364/consoleFull)** for PR 11283 at commit [`37a9d8d`](https://github.com/apache/spark/commit/37a9d8d4c86d1207b07bef4186fc23ad5bf3d75c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-191641259
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52374/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-191595584
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52364/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-187399651
  
    **[Test build #51667 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51667/consoleFull)** for PR 11283 at commit [`a408325`](https://github.com/apache/spark/commit/a4083251381a033960f76b72b477adabac024faf).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-187441402
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51667/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-186657709
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-191594243
  
    **[Test build #52373 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52373/consoleFull)** for PR 11283 at commit [`749be1b`](https://github.com/apache/spark/commit/749be1b2de07603a7fc9961872954c24a584a1fb).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-191597047
  
    **[Test build #52374 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52374/consoleFull)** for PR 11283 at commit [`ae768fe`](https://github.com/apache/spark/commit/ae768fe7a27262e58913e50b654bfc9f23199b12).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-192610210
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-192090337
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-187441126
  
    **[Test build #51667 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51667/consoleFull)** for PR 11283 at commit [`a408325`](https://github.com/apache/spark/commit/a4083251381a033960f76b72b477adabac024faf).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-192090338
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52433/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r54838300
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -211,6 +215,73 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def groupingSetToSQL(
    +      plan: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    require(plan.groupingExpressions.length > 1)
    +
    +    // The last column of Expand is always grouping ID
    +    val gid = expand.output.last
    +
    +    // In cube/rollup/groupingsets, Analyzer creates new aliases for all group by expressions.
    +    // Since conversion from attribute back SQL ignore expression IDs, the alias of attribute
    +    // references are ignored in aliasMap
    +    val aliasMap = AttributeMap(project.projectList.collect {
    +      case a @ Alias(child, name) => (a.toAttribute, a)
    +    })
    +
    +    val groupByAttributes = plan.groupingExpressions.dropRight(1).map(_.asInstanceOf[Attribute])
    +    val groupByAttrMap = AttributeMap(groupByAttributes.zip(
    +      project.projectList.drop(project.child.output.length).map(_.asInstanceOf[Alias].child)))
    +    val groupingExprs = groupByAttrMap.values.toArray
    +    val groupingSQL = groupingExprs.map(_.sql).mkString(", ")
    +
    +    val groupingSet = expand.projections.map(_.dropRight(1).filter {
    --- End diff --
    
    This will generate Option. A little change I made:
    ```
        val groupingSet = expand.projections.map { project =>
          project.dropRight(1).collect {
            case attr: Attribute if groupingAttrSet.contains(attr) => groupByAttrMap(attr)
          }
        }
    ```
    Thank you! Let me know if you do not like this change.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r54838889
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -211,6 +215,76 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def groupingSetToSQL(
    +      plan: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    require(plan.groupingExpressions.length > 1)
    +
    +    // The last column of Expand is always grouping ID
    +    val gid = expand.output.last
    +
    +    // In cube/rollup/groupingsets, Analyzer creates new aliases for all group by expressions.
    +    // Since conversion from attribute back SQL ignore expression IDs, the alias of attribute
    +    // references are ignored in aliasMap
    +    val aliasMap = AttributeMap(project.projectList.collect {
    +      case a @ Alias(child, name) => (a.toAttribute, a)
    +    })
    +
    +    val groupByAttributes = plan.groupingExpressions.dropRight(1).map(_.asInstanceOf[Attribute])
    +    val groupByExprs =
    +      project.projectList.drop(project.child.output.length).map(_.asInstanceOf[Alias].child)
    +    val groupByAttrMap = AttributeMap(groupByAttributes.zip(groupByExprs))
    +    val groupingExprs = groupByAttrMap.values.toArray
    --- End diff --
    
    isn't `groupingExprs` logically equal to `groupByExprs`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r54829310
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -211,6 +215,76 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def groupingSetToSQL(
    +      plan: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    // The last column of Expand is always grouping ID
    +    val gid = expand.output.last
    +
    +    // In cube/rollup/groupingsets, Analyzer creates new aliases for all group by expressions.
    +    // Since conversion from attribute back SQL ignore expression IDs, the alias of attribute
    +    // references are ignored in aliasMap
    +    val aliasMap = AttributeMap(project.projectList.collect {
    +      case a @ Alias(child, name) if !child.isInstanceOf[AttributeReference] => (a.toAttribute, a)
    +    })
    +
    +    val groupingExprs = plan.groupingExpressions.filterNot {
    +      // VirtualColumn.groupingIdName is added by Analyzer, and thus remove it.
    +      case a: AttributeReference => a == gid
    +      case o => false
    +    }.map {
    +      case a: AttributeReference if aliasMap.contains(a) => aliasMap(a).child
    +      case o => o
    +    }
    +
    +    val groupingSQL = groupingExprs.map(_.sql).mkString(", ")
    +
    +    val groupingSet = expand.projections.map(_.dropRight(1).filter {
    +      case e: Expression if plan.groupingExpressions.exists(_.semanticEquals(e)) => true
    +      case _ => false
    +    }.map {
    +      case a: AttributeReference if aliasMap.contains(a) => aliasMap(a).child
    +      case o => o
    +    })
    +
    +    val aggExprs = plan.aggregateExpressions.map { case expr =>
    +      expr.transformDown {
    +        case a @ Alias(child: AttributeReference, name) if child eq gid =>
    +          // grouping_id() is converted to VirtualColumn.groupingIdName by Analyzer. Revert it back.
    +          Alias(GroupingID(Nil), name)()
    +        case a @ Alias(_ @ Cast(BitwiseAnd(
    +            ShiftRight(ar: AttributeReference, _ @ Literal(value: Any, IntegerType)),
    +            Literal(1, IntegerType)), ByteType), name) if ar == gid =>
    +          // for converting an expression to its original SQL format grouping(col)
    +          val idx = groupingExprs.length - 1 - value.asInstanceOf[Int]
    +          val groupingCol = groupingExprs.lift(idx)
    +          if (groupingCol.isDefined) {
    +            Grouping(groupingCol.get)
    +          } else {
    +            throw new UnsupportedOperationException(s"unsupported operator $a")
    +          }
    +        case a @ Alias(child: AttributeReference, name) if aliasMap.contains(child) =>
    +          aliasMap(child).child
    +        case o => o
    +      }
    +    }
    +
    +    val groupingSetSQL =
    +      "GROUPING SETS(" +
    +        groupingSet.map(e => s"(${e.map(_.sql).mkString(", ")})").mkString(", ") + ")"
    --- End diff --
    
    Let me add these test cases. So far,  everything works fine. 
    
    **Original SQL: **```SELECT a, b, sum(c) FROM parquet_t2 GROUP BY a, b GROUPING SETS (()) ORDER BY a, b```
    =>
    **Generated SQL: **```SELECT `a`, `b`, sum(`parquet_t2`.`c`) AS `sum(c)` FROM `default`.`parquet_t2` GROUP BY `a`, `b` GROUPING SETS(()) ORDER BY `a` ASC, `b` ASC```
    
    **Original SQL: **```SELECT a, b, sum(c) FROM parquet_t2 GROUP BY a, b GROUPING SETS ((), (a), (a, b)) ORDER BY a, b```
    =>
    **Generated SQL:** ```SELECT `a`, `b`, sum(`parquet_t2`.`c`) AS `sum(c)` FROM `default`.`parquet_t2` GROUP BY `a`, `b` GROUPING SETS((), (`a`), (`a`, `b`)) ORDER BY `a` ASC, `b` ASC```
    
    The only issue is ```GROUPING SETS()```, however, it is rejected by our Parser. I think this is not legal. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r55116231
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -233,6 +236,82 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def sameOutput(output1: Seq[Attribute], output2: Seq[Attribute]): Boolean =
    +    output1.size == output2.size &&
    +      output1.zip(output2).forall(pair => pair._1.semanticEquals(pair._2))
    +
    +  private def isGroupingSet(a: Aggregate, e: Expand, p: Project): Boolean = {
    +    assert(a.child == e && e.child == p)
    +    a.groupingExpressions.forall(_.isInstanceOf[Attribute]) &&
    +      sameOutput(e.output, p.child.output ++ a.groupingExpressions.map(_.asInstanceOf[Attribute]))
    +  }
    +
    +  private def groupingSetToSQL(
    +      agg: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    assert(agg.groupingExpressions.length > 1)
    +
    +    // The last column of Expand is always grouping ID
    +    val gid = expand.output.last
    +
    +    val numOriginalOutput = project.child.output.length
    +    // Assumption: Aggregate's groupingExpressions is composed of
    +    // 1) the attributes of aliased group by expressions
    +    // 2) gid, which is always the last one
    +    val groupByAttributes = agg.groupingExpressions.dropRight(1).map(_.asInstanceOf[Attribute])
    +    // Assumption: Project's projectList is composed of
    +    // 1) the original output (Project's child.output),
    +    // 2) the aliased group by expressions.
    +    val groupByExprs = project.projectList.drop(numOriginalOutput).map(_.asInstanceOf[Alias].child)
    +    val groupingSQL = groupByExprs.map(_.sql).mkString(", ")
    +
    +    // a map from group by attributes to the original group by expressions.
    +    val groupByAttrMap = AttributeMap(groupByAttributes.zip(groupByExprs))
    +
    +    val groupingSet = expand.projections.map { project =>
    +      // Assumption: expand.projections is composed of
    +      // 1) the original output (Project's child.output),
    +      // 2) group by attributes(or null literal)
    +      // 3) gid, which is always the last one in each project in Expand
    +      project.drop(numOriginalOutput).dropRight(1).collect {
    +        case attr: Attribute if groupByAttrMap.contains(attr) => groupByAttrMap(attr)
    +      }
    +    }
    +    val groupingSetSQL =
    +      "GROUPING SETS(" +
    +        groupingSet.map(e => s"(${e.map(_.sql).mkString(", ")})").mkString(", ") + ")"
    +
    +    val aggExprs = agg.aggregateExpressions.map { case expr =>
    +      expr.transformDown {
    +        // grouping_id() is converted to VirtualColumn.groupingIdName by Analyzer. Revert it back.
    +        case ar: AttributeReference if ar == gid => GroupingID(Nil)
    +        case ar: AttributeReference if groupByAttrMap.contains(ar) => groupByAttrMap(ar)
    +        case a @ Cast(BitwiseAnd(
    +            ShiftRight(ar: AttributeReference, _ @ Literal(value: Any, IntegerType)),
    --- End diff --
    
    removed it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-191575891
  
    **[Test build #52365 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52365/consoleFull)** for PR 11283 at commit [`640e45c`](https://github.com/apache/spark/commit/640e45cdbbe0d08acb26af715181dc4d16dd7893).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-191125185
  
    **[Test build #52298 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52298/consoleFull)** for PR 11283 at commit [`1aef849`](https://github.com/apache/spark/commit/1aef8494f9e131a584db5a93d9b20f78b5abe44c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-191643564
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-191600191
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r54714370
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -211,6 +215,71 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def groupingSetToSQL(
    +      plan: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    // In cube/rollup/groupingsets, Analyzer creates new aliases for all group by expressions.
    +    // Since conversion from attribute back SQL ignore expression IDs, the alias of attribute
    +    // references are ignored in aliasMap
    +    val aliasMap = AttributeMap(project.projectList.collect {
    +      case a @ Alias(child, name) if !child.isInstanceOf[AttributeReference] => (a.toAttribute, a)
    +    })
    +
    +    val groupingExprs = plan.groupingExpressions.filterNot {
    +      // VirtualColumn.groupingIdName is added by Analyzer, and thus remove it.
    +      case a: NamedExpression => a.name == VirtualColumn.groupingIdName
    +      case o => false
    +    }.map {
    +      case a: AttributeReference if aliasMap.contains(a) => aliasMap(a).child
    +      case o => o
    +    }
    +
    +    val groupingSQL = groupingExprs.map(_.sql).mkString(", ")
    +
    +    val groupingSet = expand.projections.map(_.filter {
    +      case _: Literal => false
    +      case e: Expression if plan.groupingExpressions.exists(_.semanticEquals(e)) => true
    +      case _ => false
    +    }.map {
    +      case a: AttributeReference if aliasMap.contains(a) => aliasMap(a).child
    +      case o => o
    +    })
    +
    +    val aggExprs = plan.aggregateExpressions.map {
    +      case a @ Alias(child: AttributeReference, name)
    +          if child.name == VirtualColumn.groupingIdName =>
    --- End diff --
    
    instead of checking the name, how about we get the `gid` first, i.e. `val gid = expand.output.last`, then we can simplify this case: `case a @ Alias(child, name) if child eq gid =>`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-186659788
  
    cc @liancheng Could you take a look at this? : )


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-191891865
  
    **[Test build #52400 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52400/consoleFull)** for PR 11283 at commit [`5d8923c`](https://github.com/apache/spark/commit/5d8923ceb94ae5ae5b2319fc897341db1a574c2a).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-192615328
  
    Thanks for running the test. Could you please name these regressions? I couldn't find them. Seems that currently Jenkins PR builder looks good.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-191943864
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r54839093
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -211,6 +215,76 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def groupingSetToSQL(
    +      plan: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    require(plan.groupingExpressions.length > 1)
    +
    +    // The last column of Expand is always grouping ID
    +    val gid = expand.output.last
    +
    +    // In cube/rollup/groupingsets, Analyzer creates new aliases for all group by expressions.
    +    // Since conversion from attribute back SQL ignore expression IDs, the alias of attribute
    +    // references are ignored in aliasMap
    +    val aliasMap = AttributeMap(project.projectList.collect {
    +      case a @ Alias(child, name) => (a.toAttribute, a)
    +    })
    +
    +    val groupByAttributes = plan.groupingExpressions.dropRight(1).map(_.asInstanceOf[Attribute])
    +    val groupByExprs =
    +      project.projectList.drop(project.child.output.length).map(_.asInstanceOf[Alias].child)
    +    val groupByAttrMap = AttributeMap(groupByAttributes.zip(groupByExprs))
    +    val groupingExprs = groupByAttrMap.values.toArray
    --- End diff --
    
    : ) nvm. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r54887559
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -211,6 +215,65 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def groupingSetToSQL(
    +      plan: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    require(plan.groupingExpressions.length > 1)
    +
    +    // The last column of Expand is always grouping ID
    +    val gid = expand.output.last
    +
    +    val groupByAttributes = plan.groupingExpressions.dropRight(1).map(_.asInstanceOf[Attribute])
    +    val groupByExprs =
    +      project.projectList.drop(project.child.output.length).map(_.asInstanceOf[Alias].child)
    +    val groupByAttrMap = AttributeMap(groupByAttributes.zip(groupByExprs))
    +    val groupingSQL = groupByExprs.map(_.sql).mkString(", ")
    +
    +    val groupingSet = expand.projections.map { project =>
    +      project.dropRight(1).collect {
    +        case attr: Attribute if groupByAttrMap.contains(attr) => groupByAttrMap(attr)
    +      }
    +    }
    +    val groupingSetSQL =
    +      "GROUPING SETS(" +
    +        groupingSet.map(e => s"(${e.map(_.sql).mkString(", ")})").mkString(", ") + ")"
    +
    +    val aggExprs = plan.aggregateExpressions.map { case expr =>
    +      expr.transformDown {
    +        case a @ Alias(child: AttributeReference, name) if child eq gid =>
    --- End diff --
    
    uh, I see. Will remove it. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r54982167
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -233,6 +242,72 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def groupingSetToSQL(
    +      agg: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    assert(agg.groupingExpressions.length > 1)
    +
    +    // The last column of Expand is always grouping ID
    +    val gid = expand.output.last
    +
    +    val numOriginalOutput = project.child.output.length
    +    // Assumption: Aggregate's groupingExpressions is composed of
    +    // 1) the group by attributes' aliases
    +    // 2) gid, which is always the last one
    +    val groupByAttributes = agg.groupingExpressions.dropRight(1).map(_.asInstanceOf[Attribute])
    +    // Assumption: Project's projectList is composed of
    +    // 1) the original output (Project's child.output),
    +    // 2) the aliases of the original group by attributes, which could be expressions
    +    val groupByExprs = project.projectList.drop(numOriginalOutput).map(_.asInstanceOf[Alias].child)
    +    val groupingSQL = groupByExprs.map(_.sql).mkString(", ")
    +
    +    // a map from the alias name to the original group by expresions/attributes
    --- End diff --
    
    update this comment: `a map from group by attributes to the original group by expressions.`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-186657710
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51597/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-186631459
  
    **[Test build #51597 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51597/consoleFull)** for PR 11283 at commit [`bc0c030`](https://github.com/apache/spark/commit/bc0c0309b1ec0d5303b744025032070f00c2bc9c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r54842401
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -85,6 +86,9 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         case p: Project =>
           projectToSQL(p, isDistinct = false)
     
    +    case a @ Aggregate(_, _, e @ Expand(_, _, p: Project)) =>
    --- End diff --
    
    we can add a safe guard here, only handle it as grouping set when `p.output == p.child.output ++ a.groupingExpressions`. (`==` is wrong here as nullability may differ, we can add a `sameOutput` method that use `semanticEquals`)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-186657625
  
    **[Test build #51597 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51597/consoleFull)** for PR 11283 at commit [`bc0c030`](https://github.com/apache/spark/commit/bc0c0309b1ec0d5303b744025032070f00c2bc9c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r54769041
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -211,6 +215,71 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def groupingSetToSQL(
    +      plan: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    // In cube/rollup/groupingsets, Analyzer creates new aliases for all group by expressions.
    +    // Since conversion from attribute back SQL ignore expression IDs, the alias of attribute
    +    // references are ignored in aliasMap
    +    val aliasMap = AttributeMap(project.projectList.collect {
    +      case a @ Alias(child, name) if !child.isInstanceOf[AttributeReference] => (a.toAttribute, a)
    +    })
    +
    +    val groupingExprs = plan.groupingExpressions.filterNot {
    +      // VirtualColumn.groupingIdName is added by Analyzer, and thus remove it.
    +      case a: NamedExpression => a.name == VirtualColumn.groupingIdName
    +      case o => false
    +    }.map {
    +      case a: AttributeReference if aliasMap.contains(a) => aliasMap(a).child
    +      case o => o
    +    }
    +
    +    val groupingSQL = groupingExprs.map(_.sql).mkString(", ")
    +
    +    val groupingSet = expand.projections.map(_.filter {
    +      case _: Literal => false
    +      case e: Expression if plan.groupingExpressions.exists(_.semanticEquals(e)) => true
    +      case _ => false
    +    }.map {
    +      case a: AttributeReference if aliasMap.contains(a) => aliasMap(a).child
    +      case o => o
    +    })
    +
    +    val aggExprs = plan.aggregateExpressions.map {
    +      case a @ Alias(child: AttributeReference, name)
    +          if child.name == VirtualColumn.groupingIdName =>
    +        // grouping_id() is converted to VirtualColumn.groupingIdName by Analyzer. Revert it back.
    +        Alias(GroupingID(Nil), name)()
    +      case a @ Alias(_ @ Cast(BitwiseAnd(
    +        ShiftRight(ar: AttributeReference, _ @ Literal(value: Any, IntegerType)),
    +        Literal(1, IntegerType)), ByteType), name)
    +          if ar.name == VirtualColumn.groupingIdName =>
    +        // for converting an expression to its original SQL format grouping(col)
    +        val idx = groupingExprs.length - 1 - value.asInstanceOf[Int]
    +        val groupingCol = groupingExprs.lift(idx)
    +        if (groupingCol.isDefined) {
    +          Grouping(groupingCol.get)
    +        } else {
    +          throw new UnsupportedOperationException(s"unsupported operator $a")
    +        }
    +      case a @ Alias(child: AttributeReference, name) if aliasMap.contains(child) =>
    +        aliasMap(child).child
    +      case o => o
    +    }
    +
    +    build(
    +      "SELECT",
    +      aggExprs.map(_.sql).mkString(", "),
    +      if (plan.child == OneRowRelation) "" else "FROM",
    +      toSQL(project.child),
    +      if (groupingSQL.isEmpty) "" else "GROUP BY",
    +      groupingSQL,
    +      "GROUPING SETS",
    +      "(" + groupingSet.map(e => s"(${e.map(_.sql).mkString(", ")})").mkString(", ") + ")"
    --- End diff --
    
    Will do. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-191600194
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52365/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-187688699
  
    **[Test build #51754 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51754/consoleFull)** for PR 11283 at commit [`1aef849`](https://github.com/apache/spark/commit/1aef8494f9e131a584db5a93d9b20f78b5abe44c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r54834172
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -211,6 +215,73 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def groupingSetToSQL(
    +      plan: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    require(plan.groupingExpressions.length > 1)
    +
    +    // The last column of Expand is always grouping ID
    +    val gid = expand.output.last
    +
    +    // In cube/rollup/groupingsets, Analyzer creates new aliases for all group by expressions.
    +    // Since conversion from attribute back SQL ignore expression IDs, the alias of attribute
    +    // references are ignored in aliasMap
    +    val aliasMap = AttributeMap(project.projectList.collect {
    +      case a @ Alias(child, name) => (a.toAttribute, a)
    +    })
    +
    +    val groupByAttributes = plan.groupingExpressions.dropRight(1).map(_.asInstanceOf[Attribute])
    +    val groupByAttrMap = AttributeMap(groupByAttributes.zip(
    +      project.projectList.drop(project.child.output.length).map(_.asInstanceOf[Alias].child)))
    --- End diff --
    
    nit: we can get the groupByAlias first, i.e.
    ```
    val groupByExprs = project.projectList.drop(project.child.output.length).map(_.asInstanceOf[Alias].child)
    val groupByAttrMap = AttributeMap(groupByAttributes.zip(groupByExprs))
    ```
    
    seems cleaner


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r54981806
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -107,6 +107,11 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         case p: Project =>
           projectToSQL(p, isDistinct = false)
     
    +    case a @ Aggregate(_, _, e @ Expand(_, _, p: Project))
    +      if sameOutput(e.output,
    +        p.child.output ++ a.groupingExpressions.map(_.asInstanceOf[Attribute])) =>
    --- End diff --
    
    Sorry misses this: should check `isInstanceOf` before calling `asInstanceOf ` directly.
    We can put all of it in one method and use it as if condition.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-192662222
  
    Yeah, definitely, it helps a lot. Otherwise, toSQL needs to identify the pattern and convert it back using a few assumptions. The pattern and assumptions we made depends on the implementation of our analyzer rules.
    
    Before we finalize the design, I will first stop working on the SQL generation of `multi-distinct aggregation`. Thanks! @liancheng 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-191421765
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52331/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r54842920
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -85,6 +86,9 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         case p: Project =>
           projectToSQL(p, isDistinct = false)
     
    +    case a @ Aggregate(_, _, e @ Expand(_, _, p: Project)) =>
    --- End diff --
    
    sure, will do it. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r54842947
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -211,6 +215,65 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def groupingSetToSQL(
    +      plan: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    require(plan.groupingExpressions.length > 1)
    +
    +    // The last column of Expand is always grouping ID
    +    val gid = expand.output.last
    +
    +    val groupByAttributes = plan.groupingExpressions.dropRight(1).map(_.asInstanceOf[Attribute])
    +    val groupByExprs =
    +      project.projectList.drop(project.child.output.length).map(_.asInstanceOf[Alias].child)
    +    val groupByAttrMap = AttributeMap(groupByAttributes.zip(groupByExprs))
    +    val groupingSQL = groupByExprs.map(_.sql).mkString(", ")
    +
    +    val groupingSet = expand.projections.map { project =>
    +      project.dropRight(1).collect {
    +        case attr: Attribute if groupByAttrMap.contains(attr) => groupByAttrMap(attr)
    +      }
    +    }
    +    val groupingSetSQL =
    +      "GROUPING SETS(" +
    +        groupingSet.map(e => s"(${e.map(_.sql).mkString(", ")})").mkString(", ") + ")"
    +
    +    val aggExprs = plan.aggregateExpressions.map { case expr =>
    +      expr.transformDown {
    +        case a @ Alias(child: AttributeReference, name) if child eq gid =>
    --- End diff --
    
    do we still need to match `Alias` under `transformDown`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-191946239
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r54821660
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -211,6 +215,76 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def groupingSetToSQL(
    +      plan: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    // The last column of Expand is always grouping ID
    +    val gid = expand.output.last
    +
    +    // In cube/rollup/groupingsets, Analyzer creates new aliases for all group by expressions.
    +    // Since conversion from attribute back SQL ignore expression IDs, the alias of attribute
    +    // references are ignored in aliasMap
    +    val aliasMap = AttributeMap(project.projectList.collect {
    +      case a @ Alias(child, name) if !child.isInstanceOf[AttributeReference] => (a.toAttribute, a)
    +    })
    +
    +    val groupingExprs = plan.groupingExpressions.filterNot {
    +      // VirtualColumn.groupingIdName is added by Analyzer, and thus remove it.
    +      case a: AttributeReference => a == gid
    +      case o => false
    +    }.map {
    +      case a: AttributeReference if aliasMap.contains(a) => aliasMap(a).child
    +      case o => o
    +    }
    +
    +    val groupingSQL = groupingExprs.map(_.sql).mkString(", ")
    +
    +    val groupingSet = expand.projections.map(_.dropRight(1).filter {
    +      case e: Expression if plan.groupingExpressions.exists(_.semanticEquals(e)) => true
    +      case _ => false
    +    }.map {
    +      case a: AttributeReference if aliasMap.contains(a) => aliasMap(a).child
    +      case o => o
    +    })
    +
    +    val aggExprs = plan.aggregateExpressions.map { case expr =>
    +      expr.transformDown {
    +        case a @ Alias(child: AttributeReference, name) if child eq gid =>
    +          // grouping_id() is converted to VirtualColumn.groupingIdName by Analyzer. Revert it back.
    +          Alias(GroupingID(Nil), name)()
    +        case a @ Alias(_ @ Cast(BitwiseAnd(
    +            ShiftRight(ar: AttributeReference, _ @ Literal(value: Any, IntegerType)),
    +            Literal(1, IntegerType)), ByteType), name) if ar == gid =>
    +          // for converting an expression to its original SQL format grouping(col)
    +          val idx = groupingExprs.length - 1 - value.asInstanceOf[Int]
    +          val groupingCol = groupingExprs.lift(idx)
    +          if (groupingCol.isDefined) {
    +            Grouping(groupingCol.get)
    +          } else {
    +            throw new UnsupportedOperationException(s"unsupported operator $a")
    +          }
    +        case a @ Alias(child: AttributeReference, name) if aliasMap.contains(child) =>
    +          aliasMap(child).child
    +        case o => o
    +      }
    +    }
    +
    +    val groupingSetSQL =
    +      "GROUPING SETS(" +
    +        groupingSet.map(e => s"(${e.map(_.sql).mkString(", ")})").mkString(", ") + ")"
    --- End diff --
    
    if a grouping set is empty, should we generate `()` for it?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r54843075
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -211,6 +215,65 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def groupingSetToSQL(
    +      plan: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    require(plan.groupingExpressions.length > 1)
    +
    +    // The last column of Expand is always grouping ID
    +    val gid = expand.output.last
    +
    +    val groupByAttributes = plan.groupingExpressions.dropRight(1).map(_.asInstanceOf[Attribute])
    +    val groupByExprs =
    +      project.projectList.drop(project.child.output.length).map(_.asInstanceOf[Alias].child)
    +    val groupByAttrMap = AttributeMap(groupByAttributes.zip(groupByExprs))
    +    val groupingSQL = groupByExprs.map(_.sql).mkString(", ")
    +
    +    val groupingSet = expand.projections.map { project =>
    +      project.dropRight(1).collect {
    +        case attr: Attribute if groupByAttrMap.contains(attr) => groupByAttrMap(attr)
    +      }
    +    }
    +    val groupingSetSQL =
    +      "GROUPING SETS(" +
    +        groupingSet.map(e => s"(${e.map(_.sql).mkString(", ")})").mkString(", ") + ")"
    +
    +    val aggExprs = plan.aggregateExpressions.map { case expr =>
    +      expr.transformDown {
    +        case a @ Alias(child: AttributeReference, name) if child eq gid =>
    --- End diff --
    
    Yeah, for getting the alias name.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r55016431
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -233,6 +236,82 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def sameOutput(output1: Seq[Attribute], output2: Seq[Attribute]): Boolean =
    +    output1.size == output2.size &&
    +      output1.zip(output2).forall(pair => pair._1.semanticEquals(pair._2))
    +
    +  private def isGroupingSet(a: Aggregate, e: Expand, p: Project): Boolean = {
    +    assert(a.child == e && e.child == p)
    +    a.groupingExpressions.forall(_.isInstanceOf[Attribute]) &&
    +      sameOutput(e.output, p.child.output ++ a.groupingExpressions.map(_.asInstanceOf[Attribute]))
    +  }
    +
    +  private def groupingSetToSQL(
    +      agg: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    assert(agg.groupingExpressions.length > 1)
    +
    +    // The last column of Expand is always grouping ID
    +    val gid = expand.output.last
    +
    +    val numOriginalOutput = project.child.output.length
    +    // Assumption: Aggregate's groupingExpressions is composed of
    +    // 1) the attributes of aliased group by expressions
    +    // 2) gid, which is always the last one
    +    val groupByAttributes = agg.groupingExpressions.dropRight(1).map(_.asInstanceOf[Attribute])
    +    // Assumption: Project's projectList is composed of
    +    // 1) the original output (Project's child.output),
    +    // 2) the aliased group by expressions.
    +    val groupByExprs = project.projectList.drop(numOriginalOutput).map(_.asInstanceOf[Alias].child)
    +    val groupingSQL = groupByExprs.map(_.sql).mkString(", ")
    +
    +    // a map from group by attributes to the original group by expressions.
    +    val groupByAttrMap = AttributeMap(groupByAttributes.zip(groupByExprs))
    +
    +    val groupingSet = expand.projections.map { project =>
    +      // Assumption: expand.projections is composed of
    +      // 1) the original output (Project's child.output),
    +      // 2) group by attributes(or null literal)
    +      // 3) gid, which is always the last one in each project in Expand
    +      project.drop(numOriginalOutput).dropRight(1).collect {
    +        case attr: Attribute if groupByAttrMap.contains(attr) => groupByAttrMap(attr)
    +      }
    +    }
    +    val groupingSetSQL =
    +      "GROUPING SETS(" +
    +        groupingSet.map(e => s"(${e.map(_.sql).mkString(", ")})").mkString(", ") + ")"
    +
    +    val aggExprs = agg.aggregateExpressions.map { case expr =>
    +      expr.transformDown {
    +        // grouping_id() is converted to VirtualColumn.groupingIdName by Analyzer. Revert it back.
    +        case ar: AttributeReference if ar == gid => GroupingID(Nil)
    +        case ar: AttributeReference if groupByAttrMap.contains(ar) => groupByAttrMap(ar)
    +        case a @ Cast(BitwiseAnd(
    +            ShiftRight(ar: AttributeReference, _ @ Literal(value: Any, IntegerType)),
    +            Literal(1, IntegerType)), ByteType) if ar == gid =>
    +          // for converting an expression to its original SQL format grouping(col)
    +          val idx = groupByExprs.length - 1 - value.asInstanceOf[Int]
    +          val groupingCol = groupByExprs.lift(idx)
    +          if (groupingCol.isDefined) {
    +            Grouping(groupingCol.get)
    +          } else {
    +            throw new UnsupportedOperationException(s"unsupported operator $a")
    +          }
    +      }
    +    }
    +
    +    build(
    +      "SELECT",
    +      aggExprs.map(_.sql).mkString(", "),
    +      if (agg.child == OneRowRelation) "" else "FROM",
    +      toSQL(project.child),
    --- End diff --
    
    Let's add some test cases where `project.child` itself is a more complicated plan.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r54829420
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -211,6 +215,71 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def groupingSetToSQL(
    +      plan: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    // In cube/rollup/groupingsets, Analyzer creates new aliases for all group by expressions.
    +    // Since conversion from attribute back SQL ignore expression IDs, the alias of attribute
    +    // references are ignored in aliasMap
    +    val aliasMap = AttributeMap(project.projectList.collect {
    +      case a @ Alias(child, name) if !child.isInstanceOf[AttributeReference] => (a.toAttribute, a)
    +    })
    +
    +    val groupingExprs = plan.groupingExpressions.filterNot {
    +      // VirtualColumn.groupingIdName is added by Analyzer, and thus remove it.
    +      case a: NamedExpression => a.name == VirtualColumn.groupingIdName
    +      case o => false
    +    }.map {
    +      case a: AttributeReference if aliasMap.contains(a) => aliasMap(a).child
    +      case o => o
    +    }
    +
    +    val groupingSQL = groupingExprs.map(_.sql).mkString(", ")
    +
    +    val groupingSet = expand.projections.map(_.filter {
    +      case _: Literal => false
    +      case e: Expression if plan.groupingExpressions.exists(_.semanticEquals(e)) => true
    +      case _ => false
    +    }.map {
    +      case a: AttributeReference if aliasMap.contains(a) => aliasMap(a).child
    +      case o => o
    +    })
    +
    +    val aggExprs = plan.aggregateExpressions.map {
    +      case a @ Alias(child: AttributeReference, name)
    +          if child.name == VirtualColumn.groupingIdName =>
    +        // grouping_id() is converted to VirtualColumn.groupingIdName by Analyzer. Revert it back.
    +        Alias(GroupingID(Nil), name)()
    +      case a @ Alias(_ @ Cast(BitwiseAnd(
    +        ShiftRight(ar: AttributeReference, _ @ Literal(value: Any, IntegerType)),
    +        Literal(1, IntegerType)), ByteType), name)
    +          if ar.name == VirtualColumn.groupingIdName =>
    +        // for converting an expression to its original SQL format grouping(col)
    +        val idx = groupingExprs.length - 1 - value.asInstanceOf[Int]
    +        val groupingCol = groupingExprs.lift(idx)
    +        if (groupingCol.isDefined) {
    +          Grouping(groupingCol.get)
    +        } else {
    +          throw new UnsupportedOperationException(s"unsupported operator $a")
    +        }
    +      case a @ Alias(child: AttributeReference, name) if aliasMap.contains(child) =>
    +        aliasMap(child).child
    +      case o => o
    +    }
    +
    +    build(
    +      "SELECT",
    +      aggExprs.map(_.sql).mkString(", "),
    +      if (plan.child == OneRowRelation) "" else "FROM",
    +      toSQL(project.child),
    +      if (groupingSQL.isEmpty) "" else "GROUP BY",
    --- End diff --
    
    True. Let me remove it. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-192090048
  
    **[Test build #52433 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52433/consoleFull)** for PR 11283 at commit [`6f609fb`](https://github.com/apache/spark/commit/6f609fb2d844e2aaf4c809ef8c0fcd9e6eca38bb).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-187611272
  
    **[Test build #51754 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51754/consoleFull)** for PR 11283 at commit [`1aef849`](https://github.com/apache/spark/commit/1aef8494f9e131a584db5a93d9b20f78b5abe44c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r54715165
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -211,6 +215,71 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def groupingSetToSQL(
    +      plan: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    // In cube/rollup/groupingsets, Analyzer creates new aliases for all group by expressions.
    +    // Since conversion from attribute back SQL ignore expression IDs, the alias of attribute
    +    // references are ignored in aliasMap
    +    val aliasMap = AttributeMap(project.projectList.collect {
    +      case a @ Alias(child, name) if !child.isInstanceOf[AttributeReference] => (a.toAttribute, a)
    +    })
    +
    +    val groupingExprs = plan.groupingExpressions.filterNot {
    +      // VirtualColumn.groupingIdName is added by Analyzer, and thus remove it.
    +      case a: NamedExpression => a.name == VirtualColumn.groupingIdName
    +      case o => false
    +    }.map {
    +      case a: AttributeReference if aliasMap.contains(a) => aliasMap(a).child
    +      case o => o
    +    }
    +
    +    val groupingSQL = groupingExprs.map(_.sql).mkString(", ")
    +
    +    val groupingSet = expand.projections.map(_.filter {
    +      case _: Literal => false
    +      case e: Expression if plan.groupingExpressions.exists(_.semanticEquals(e)) => true
    +      case _ => false
    +    }.map {
    +      case a: AttributeReference if aliasMap.contains(a) => aliasMap(a).child
    +      case o => o
    +    })
    +
    +    val aggExprs = plan.aggregateExpressions.map {
    +      case a @ Alias(child: AttributeReference, name)
    +          if child.name == VirtualColumn.groupingIdName =>
    +        // grouping_id() is converted to VirtualColumn.groupingIdName by Analyzer. Revert it back.
    +        Alias(GroupingID(Nil), name)()
    +      case a @ Alias(_ @ Cast(BitwiseAnd(
    +        ShiftRight(ar: AttributeReference, _ @ Literal(value: Any, IntegerType)),
    +        Literal(1, IntegerType)), ByteType), name)
    +          if ar.name == VirtualColumn.groupingIdName =>
    +        // for converting an expression to its original SQL format grouping(col)
    +        val idx = groupingExprs.length - 1 - value.asInstanceOf[Int]
    +        val groupingCol = groupingExprs.lift(idx)
    +        if (groupingCol.isDefined) {
    +          Grouping(groupingCol.get)
    +        } else {
    +          throw new UnsupportedOperationException(s"unsupported operator $a")
    +        }
    +      case a @ Alias(child: AttributeReference, name) if aliasMap.contains(child) =>
    +        aliasMap(child).child
    +      case o => o
    +    }
    +
    +    build(
    +      "SELECT",
    +      aggExprs.map(_.sql).mkString(", "),
    +      if (plan.child == OneRowRelation) "" else "FROM",
    +      toSQL(project.child),
    +      if (groupingSQL.isEmpty) "" else "GROUP BY",
    +      groupingSQL,
    +      "GROUPING SETS",
    +      "(" + groupingSet.map(e => s"(${e.map(_.sql).mkString(", ")})").mkString(", ") + ")"
    --- End diff --
    
    this should be together with `"GROUPING SETS"`, or there will be a space between them.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r54978894
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -203,6 +208,10 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
           throw new UnsupportedOperationException(s"unsupported plan $node")
       }
     
    +  private def sameOutput(left: Seq[Attribute], right: Seq[Attribute]): Boolean =
    --- End diff --
    
    Thank you!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r54833646
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -211,6 +215,71 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def groupingSetToSQL(
    +      plan: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    // In cube/rollup/groupingsets, Analyzer creates new aliases for all group by expressions.
    +    // Since conversion from attribute back SQL ignore expression IDs, the alias of attribute
    +    // references are ignored in aliasMap
    +    val aliasMap = AttributeMap(project.projectList.collect {
    +      case a @ Alias(child, name) if !child.isInstanceOf[AttributeReference] => (a.toAttribute, a)
    +    })
    +
    +    val groupingExprs = plan.groupingExpressions.filterNot {
    +      // VirtualColumn.groupingIdName is added by Analyzer, and thus remove it.
    +      case a: NamedExpression => a.name == VirtualColumn.groupingIdName
    +      case o => false
    +    }.map {
    +      case a: AttributeReference if aliasMap.contains(a) => aliasMap(a).child
    +      case o => o
    +    }
    +
    --- End diff --
    
    uh, you are right. Let me change it. : )


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r55119845
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -233,6 +236,82 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def sameOutput(output1: Seq[Attribute], output2: Seq[Attribute]): Boolean =
    +    output1.size == output2.size &&
    +      output1.zip(output2).forall(pair => pair._1.semanticEquals(pair._2))
    +
    +  private def isGroupingSet(a: Aggregate, e: Expand, p: Project): Boolean = {
    +    assert(a.child == e && e.child == p)
    +    a.groupingExpressions.forall(_.isInstanceOf[Attribute]) &&
    +      sameOutput(e.output, p.child.output ++ a.groupingExpressions.map(_.asInstanceOf[Attribute]))
    +  }
    +
    +  private def groupingSetToSQL(
    +      agg: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    assert(agg.groupingExpressions.length > 1)
    +
    +    // The last column of Expand is always grouping ID
    +    val gid = expand.output.last
    +
    +    val numOriginalOutput = project.child.output.length
    +    // Assumption: Aggregate's groupingExpressions is composed of
    +    // 1) the attributes of aliased group by expressions
    +    // 2) gid, which is always the last one
    +    val groupByAttributes = agg.groupingExpressions.dropRight(1).map(_.asInstanceOf[Attribute])
    +    // Assumption: Project's projectList is composed of
    +    // 1) the original output (Project's child.output),
    +    // 2) the aliased group by expressions.
    +    val groupByExprs = project.projectList.drop(numOriginalOutput).map(_.asInstanceOf[Alias].child)
    +    val groupingSQL = groupByExprs.map(_.sql).mkString(", ")
    +
    +    // a map from group by attributes to the original group by expressions.
    +    val groupByAttrMap = AttributeMap(groupByAttributes.zip(groupByExprs))
    +
    +    val groupingSet = expand.projections.map { project =>
    +      // Assumption: expand.projections is composed of
    +      // 1) the original output (Project's child.output),
    +      // 2) group by attributes(or null literal)
    +      // 3) gid, which is always the last one in each project in Expand
    +      project.drop(numOriginalOutput).dropRight(1).collect {
    +        case attr: Attribute if groupByAttrMap.contains(attr) => groupByAttrMap(attr)
    +      }
    +    }
    +    val groupingSetSQL =
    +      "GROUPING SETS(" +
    +        groupingSet.map(e => s"(${e.map(_.sql).mkString(", ")})").mkString(", ") + ")"
    +
    +    val aggExprs = agg.aggregateExpressions.map { case expr =>
    +      expr.transformDown {
    +        // grouping_id() is converted to VirtualColumn.groupingIdName by Analyzer. Revert it back.
    +        case ar: AttributeReference if ar == gid => GroupingID(Nil)
    +        case ar: AttributeReference if groupByAttrMap.contains(ar) => groupByAttrMap(ar)
    +        case a @ Cast(BitwiseAnd(
    +            ShiftRight(ar: AttributeReference, _ @ Literal(value: Any, IntegerType)),
    +            Literal(1, IntegerType)), ByteType) if ar == gid =>
    +          // for converting an expression to its original SQL format grouping(col)
    +          val idx = groupByExprs.length - 1 - value.asInstanceOf[Int]
    +          val groupingCol = groupByExprs.lift(idx)
    +          if (groupingCol.isDefined) {
    +            Grouping(groupingCol.get)
    +          } else {
    +            throw new UnsupportedOperationException(s"unsupported operator $a")
    +          }
    --- End diff --
    
    Makes sense.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-191643566
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52373/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r54713624
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -211,6 +215,71 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def groupingSetToSQL(
    +      plan: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    // In cube/rollup/groupingsets, Analyzer creates new aliases for all group by expressions.
    +    // Since conversion from attribute back SQL ignore expression IDs, the alias of attribute
    +    // references are ignored in aliasMap
    +    val aliasMap = AttributeMap(project.projectList.collect {
    +      case a @ Alias(child, name) if !child.isInstanceOf[AttributeReference] => (a.toAttribute, a)
    +    })
    +
    +    val groupingExprs = plan.groupingExpressions.filterNot {
    +      // VirtualColumn.groupingIdName is added by Analyzer, and thus remove it.
    +      case a: NamedExpression => a.name == VirtualColumn.groupingIdName
    +      case o => false
    +    }.map {
    +      case a: AttributeReference if aliasMap.contains(a) => aliasMap(a).child
    +      case o => o
    +    }
    +
    --- End diff --
    
    Can the above be simplified to:
    ```
    val groupByAliasMap = AttributeMap(project.projectList.drop(project.child.output.length).map {
      case a @ Alias(child, _) => (a.toAttribute, child)
    }) // the `project` is `Project(x.child.output ++ groupByAliases, x.child)`
    val groupingExprs = groupByAliasMap.values.toArray
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-191081621
  
    **[Test build #52298 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52298/consoleFull)** for PR 11283 at commit [`1aef849`](https://github.com/apache/spark/commit/1aef8494f9e131a584db5a93d9b20f78b5abe44c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-191578269
  
    overall LGTM, but we can leverage more information to make the code simpler. For example, `Aggregate.groupingExpressions` are guaranteed to be `Attribute`s, each projection list in `expand.projections` are compose by `original output` + `group by attributes` + `gid`, etc.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-192107596
  
    **[Test build #52440 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52440/consoleFull)** for PR 11283 at commit [`9eaca51`](https://github.com/apache/spark/commit/9eaca515a3a86f07ed4ca85ba6da080ad605d1c0).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-192049746
  
    LGTM except some minor comments, thanks for working on it!
    
    cc @liancheng 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r55116232
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -233,6 +236,82 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def sameOutput(output1: Seq[Attribute], output2: Seq[Attribute]): Boolean =
    +    output1.size == output2.size &&
    +      output1.zip(output2).forall(pair => pair._1.semanticEquals(pair._2))
    +
    +  private def isGroupingSet(a: Aggregate, e: Expand, p: Project): Boolean = {
    +    assert(a.child == e && e.child == p)
    +    a.groupingExpressions.forall(_.isInstanceOf[Attribute]) &&
    +      sameOutput(e.output, p.child.output ++ a.groupingExpressions.map(_.asInstanceOf[Attribute]))
    +  }
    +
    +  private def groupingSetToSQL(
    +      agg: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    assert(agg.groupingExpressions.length > 1)
    +
    +    // The last column of Expand is always grouping ID
    +    val gid = expand.output.last
    +
    +    val numOriginalOutput = project.child.output.length
    +    // Assumption: Aggregate's groupingExpressions is composed of
    +    // 1) the attributes of aliased group by expressions
    +    // 2) gid, which is always the last one
    +    val groupByAttributes = agg.groupingExpressions.dropRight(1).map(_.asInstanceOf[Attribute])
    +    // Assumption: Project's projectList is composed of
    +    // 1) the original output (Project's child.output),
    +    // 2) the aliased group by expressions.
    +    val groupByExprs = project.projectList.drop(numOriginalOutput).map(_.asInstanceOf[Alias].child)
    +    val groupingSQL = groupByExprs.map(_.sql).mkString(", ")
    +
    +    // a map from group by attributes to the original group by expressions.
    +    val groupByAttrMap = AttributeMap(groupByAttributes.zip(groupByExprs))
    +
    +    val groupingSet = expand.projections.map { project =>
    --- End diff --
    
    Will do.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r54836870
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -211,6 +215,73 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def groupingSetToSQL(
    +      plan: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    require(plan.groupingExpressions.length > 1)
    +
    +    // The last column of Expand is always grouping ID
    +    val gid = expand.output.last
    +
    +    // In cube/rollup/groupingsets, Analyzer creates new aliases for all group by expressions.
    +    // Since conversion from attribute back SQL ignore expression IDs, the alias of attribute
    +    // references are ignored in aliasMap
    +    val aliasMap = AttributeMap(project.projectList.collect {
    +      case a @ Alias(child, name) => (a.toAttribute, a)
    +    })
    +
    +    val groupByAttributes = plan.groupingExpressions.dropRight(1).map(_.asInstanceOf[Attribute])
    +    val groupByAttrMap = AttributeMap(groupByAttributes.zip(
    +      project.projectList.drop(project.child.output.length).map(_.asInstanceOf[Alias].child)))
    +    val groupingExprs = groupByAttrMap.values.toArray
    +    val groupingSQL = groupingExprs.map(_.sql).mkString(", ")
    +
    +    val groupingSet = expand.projections.map(_.dropRight(1).filter {
    +      case e: Expression if plan.groupingExpressions.exists(_.semanticEquals(e)) => true
    +      case _ => false
    +    }.map {
    +      case a: AttributeReference if aliasMap.contains(a) => aliasMap(a).child
    +      case o => o
    +    })
    +
    +    val aggExprs = plan.aggregateExpressions.map { case expr =>
    +      expr.transformDown {
    +        case a @ Alias(child: AttributeReference, name) if child eq gid =>
    +          // grouping_id() is converted to VirtualColumn.groupingIdName by Analyzer. Revert it back.
    +          Alias(GroupingID(Nil), name)()
    +        case a @ Alias(_ @ Cast(BitwiseAnd(
    +            ShiftRight(ar: AttributeReference, _ @ Literal(value: Any, IntegerType)),
    +            Literal(1, IntegerType)), ByteType), name) if ar == gid =>
    +          // for converting an expression to its original SQL format grouping(col)
    +          val idx = groupingExprs.length - 1 - value.asInstanceOf[Int]
    +          val groupingCol = groupingExprs.lift(idx)
    +          if (groupingCol.isDefined) {
    +            Grouping(groupingCol.get)
    +          } else {
    +            throw new UnsupportedOperationException(s"unsupported operator $a")
    +          }
    +        case a @ Alias(child: AttributeReference, name) if aliasMap.contains(child) =>
    --- End diff --
    
    True, let me remove it. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-192600989
  
    **[Test build #52508 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52508/consoleFull)** for PR 11283 at commit [`385c0d9`](https://github.com/apache/spark/commit/385c0d9c0acce936b8f567613f747ccc7fcdcaff).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r54979458
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -233,6 +242,67 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def groupingSetToSQL(
    +      plan: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    require(plan.groupingExpressions.length > 1)
    +
    +    // The last column of Expand is always grouping ID
    +    val gid = expand.output.last
    +
    +    val groupByAttributes = plan.groupingExpressions.dropRight(1).map(_.asInstanceOf[Attribute])
    +    val groupByExprs =
    +      project.projectList.drop(project.child.output.length).map(_.asInstanceOf[Alias].child)
    +    val groupByAttrMap = AttributeMap(groupByAttributes.zip(groupByExprs))
    +    val groupingSQL = groupByExprs.map(_.sql).mkString(", ")
    +
    +    val groupingSet = expand.projections.map { project =>
    +      // Assumption: expand.projections are composed of
    +      // 1) the original output (Project's child.output),
    +      // 2) group by attributes(or null literal)
    +      // 3) gid, which is always the last one in each project in Expand
    +      project.dropRight(1).collect {
    +        case attr: Attribute if groupByAttrMap.contains(attr) => groupByAttrMap(attr)
    +      }
    +    }
    +    val groupingSetSQL =
    +      "GROUPING SETS(" +
    +        groupingSet.map(e => s"(${e.map(_.sql).mkString(", ")})").mkString(", ") + ")"
    +
    +    val aggExprs = plan.aggregateExpressions.map { case expr =>
    +      expr.transformDown {
    +        // grouping_id() is converted to VirtualColumn.groupingIdName by Analyzer. Revert it back.
    +        case ar: AttributeReference if ar eq gid => GroupingID(Nil)
    +        case a @ Alias(_ @ Cast(BitwiseAnd(
    --- End diff --
    
    Ok, let me remove it. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r55016954
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -233,6 +236,82 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def sameOutput(output1: Seq[Attribute], output2: Seq[Attribute]): Boolean =
    +    output1.size == output2.size &&
    +      output1.zip(output2).forall(pair => pair._1.semanticEquals(pair._2))
    +
    +  private def isGroupingSet(a: Aggregate, e: Expand, p: Project): Boolean = {
    +    assert(a.child == e && e.child == p)
    +    a.groupingExpressions.forall(_.isInstanceOf[Attribute]) &&
    +      sameOutput(e.output, p.child.output ++ a.groupingExpressions.map(_.asInstanceOf[Attribute]))
    +  }
    +
    +  private def groupingSetToSQL(
    +      agg: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    assert(agg.groupingExpressions.length > 1)
    +
    +    // The last column of Expand is always grouping ID
    +    val gid = expand.output.last
    +
    +    val numOriginalOutput = project.child.output.length
    +    // Assumption: Aggregate's groupingExpressions is composed of
    +    // 1) the attributes of aliased group by expressions
    +    // 2) gid, which is always the last one
    +    val groupByAttributes = agg.groupingExpressions.dropRight(1).map(_.asInstanceOf[Attribute])
    +    // Assumption: Project's projectList is composed of
    +    // 1) the original output (Project's child.output),
    +    // 2) the aliased group by expressions.
    +    val groupByExprs = project.projectList.drop(numOriginalOutput).map(_.asInstanceOf[Alias].child)
    +    val groupingSQL = groupByExprs.map(_.sql).mkString(", ")
    +
    +    // a map from group by attributes to the original group by expressions.
    +    val groupByAttrMap = AttributeMap(groupByAttributes.zip(groupByExprs))
    +
    +    val groupingSet = expand.projections.map { project =>
    +      // Assumption: expand.projections is composed of
    +      // 1) the original output (Project's child.output),
    +      // 2) group by attributes(or null literal)
    +      // 3) gid, which is always the last one in each project in Expand
    +      project.drop(numOriginalOutput).dropRight(1).collect {
    +        case attr: Attribute if groupByAttrMap.contains(attr) => groupByAttrMap(attr)
    +      }
    +    }
    +    val groupingSetSQL =
    +      "GROUPING SETS(" +
    +        groupingSet.map(e => s"(${e.map(_.sql).mkString(", ")})").mkString(", ") + ")"
    +
    +    val aggExprs = agg.aggregateExpressions.map { case expr =>
    +      expr.transformDown {
    +        // grouping_id() is converted to VirtualColumn.groupingIdName by Analyzer. Revert it back.
    +        case ar: AttributeReference if ar == gid => GroupingID(Nil)
    +        case ar: AttributeReference if groupByAttrMap.contains(ar) => groupByAttrMap(ar)
    +        case a @ Cast(BitwiseAnd(
    +            ShiftRight(ar: AttributeReference, _ @ Literal(value: Any, IntegerType)),
    +            Literal(1, IntegerType)), ByteType) if ar == gid =>
    +          // for converting an expression to its original SQL format grouping(col)
    +          val idx = groupByExprs.length - 1 - value.asInstanceOf[Int]
    +          val groupingCol = groupByExprs.lift(idx)
    +          if (groupingCol.isDefined) {
    +            Grouping(groupingCol.get)
    +          } else {
    +            throw new UnsupportedOperationException(s"unsupported operator $a")
    +          }
    --- End diff --
    
    The following version might be clearer:
    
    ```scala
    val groupingCol = groupByExprs.applyOrElse(
      idx, throw new UnsupportedOperationException(s"unsupported operator $a")
    Grouping(groupingCol)
    ```
    
    And I don't quite get the meaning of the exception error message...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r54836666
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -211,6 +215,76 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def groupingSetToSQL(
    +      plan: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    require(plan.groupingExpressions.length > 1)
    +
    +    // The last column of Expand is always grouping ID
    +    val gid = expand.output.last
    +
    +    // In cube/rollup/groupingsets, Analyzer creates new aliases for all group by expressions.
    +    // Since conversion from attribute back SQL ignore expression IDs, the alias of attribute
    +    // references are ignored in aliasMap
    +    val aliasMap = AttributeMap(project.projectList.collect {
    +      case a @ Alias(child, name) => (a.toAttribute, a)
    +    })
    +
    +    val groupByAttributes = plan.groupingExpressions.dropRight(1).map(_.asInstanceOf[Attribute])
    +    val groupByExprs =
    +      project.projectList.drop(project.child.output.length).map(_.asInstanceOf[Alias].child)
    +    val groupByAttrMap = AttributeMap(groupByAttributes.zip(groupByExprs))
    +    val groupingExprs = groupByAttrMap.values.toArray
    --- End diff --
    
    `groupingExprs` is needed for generating `grouping(col)`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-191368381
  
    **[Test build #52331 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52331/consoleFull)** for PR 11283 at commit [`6f79df1`](https://github.com/apache/spark/commit/6f79df19b57df66b810067156d271b5140df6543).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r55116225
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -233,6 +236,82 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def sameOutput(output1: Seq[Attribute], output2: Seq[Attribute]): Boolean =
    +    output1.size == output2.size &&
    +      output1.zip(output2).forall(pair => pair._1.semanticEquals(pair._2))
    +
    +  private def isGroupingSet(a: Aggregate, e: Expand, p: Project): Boolean = {
    +    assert(a.child == e && e.child == p)
    +    a.groupingExpressions.forall(_.isInstanceOf[Attribute]) &&
    +      sameOutput(e.output, p.child.output ++ a.groupingExpressions.map(_.asInstanceOf[Attribute]))
    +  }
    +
    +  private def groupingSetToSQL(
    +      agg: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    assert(agg.groupingExpressions.length > 1)
    +
    +    // The last column of Expand is always grouping ID
    +    val gid = expand.output.last
    +
    +    val numOriginalOutput = project.child.output.length
    +    // Assumption: Aggregate's groupingExpressions is composed of
    +    // 1) the attributes of aliased group by expressions
    +    // 2) gid, which is always the last one
    +    val groupByAttributes = agg.groupingExpressions.dropRight(1).map(_.asInstanceOf[Attribute])
    +    // Assumption: Project's projectList is composed of
    +    // 1) the original output (Project's child.output),
    +    // 2) the aliased group by expressions.
    +    val groupByExprs = project.projectList.drop(numOriginalOutput).map(_.asInstanceOf[Alias].child)
    +    val groupingSQL = groupByExprs.map(_.sql).mkString(", ")
    +
    +    // a map from group by attributes to the original group by expressions.
    +    val groupByAttrMap = AttributeMap(groupByAttributes.zip(groupByExprs))
    +
    +    val groupingSet = expand.projections.map { project =>
    +      // Assumption: expand.projections is composed of
    +      // 1) the original output (Project's child.output),
    +      // 2) group by attributes(or null literal)
    +      // 3) gid, which is always the last one in each project in Expand
    +      project.drop(numOriginalOutput).dropRight(1).collect {
    +        case attr: Attribute if groupByAttrMap.contains(attr) => groupByAttrMap(attr)
    +      }
    +    }
    +    val groupingSetSQL =
    +      "GROUPING SETS(" +
    +        groupingSet.map(e => s"(${e.map(_.sql).mkString(", ")})").mkString(", ") + ")"
    +
    +    val aggExprs = agg.aggregateExpressions.map { case expr =>
    +      expr.transformDown {
    +        // grouping_id() is converted to VirtualColumn.groupingIdName by Analyzer. Revert it back.
    +        case ar: AttributeReference if ar == gid => GroupingID(Nil)
    +        case ar: AttributeReference if groupByAttrMap.contains(ar) => groupByAttrMap(ar)
    +        case a @ Cast(BitwiseAnd(
    +            ShiftRight(ar: AttributeReference, _ @ Literal(value: Any, IntegerType)),
    +            Literal(1, IntegerType)), ByteType) if ar == gid =>
    +          // for converting an expression to its original SQL format grouping(col)
    +          val idx = groupByExprs.length - 1 - value.asInstanceOf[Int]
    +          val groupingCol = groupByExprs.lift(idx)
    +          if (groupingCol.isDefined) {
    +            Grouping(groupingCol.get)
    +          } else {
    +            throw new UnsupportedOperationException(s"unsupported operator $a")
    +          }
    --- End diff --
    
    Here, if the value is out of boundary, I thought we should not continue the conversion. After rethinking this, users might call grouping_id() inside such a function. Maybe we should not throw any exception. How about changing it to 
    ```
              groupByExprs.lift(idx).map(Grouping).getOrElse(a)
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-191421128
  
    **[Test build #52331 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52331/consoleFull)** for PR 11283 at commit [`6f79df1`](https://github.com/apache/spark/commit/6f79df19b57df66b810067156d271b5140df6543).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r54767979
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -211,6 +215,71 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def groupingSetToSQL(
    +      plan: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    // In cube/rollup/groupingsets, Analyzer creates new aliases for all group by expressions.
    +    // Since conversion from attribute back SQL ignore expression IDs, the alias of attribute
    +    // references are ignored in aliasMap
    +    val aliasMap = AttributeMap(project.projectList.collect {
    +      case a @ Alias(child, name) if !child.isInstanceOf[AttributeReference] => (a.toAttribute, a)
    +    })
    +
    +    val groupingExprs = plan.groupingExpressions.filterNot {
    +      // VirtualColumn.groupingIdName is added by Analyzer, and thus remove it.
    +      case a: NamedExpression => a.name == VirtualColumn.groupingIdName
    +      case o => false
    +    }.map {
    +      case a: AttributeReference if aliasMap.contains(a) => aliasMap(a).child
    +      case o => o
    +    }
    +
    --- End diff --
    
    The groupingExpressions could be expressions. For example, `GROUP BY key % 5, key - 5 WITH ROLLUP`. Thus, we are unable to do it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-187045768
  
    ok to test


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-187689371
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51754/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-191641253
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r54834856
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -211,6 +215,73 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def groupingSetToSQL(
    +      plan: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    require(plan.groupingExpressions.length > 1)
    +
    +    // The last column of Expand is always grouping ID
    +    val gid = expand.output.last
    +
    +    // In cube/rollup/groupingsets, Analyzer creates new aliases for all group by expressions.
    +    // Since conversion from attribute back SQL ignore expression IDs, the alias of attribute
    +    // references are ignored in aliasMap
    +    val aliasMap = AttributeMap(project.projectList.collect {
    +      case a @ Alias(child, name) => (a.toAttribute, a)
    +    })
    +
    +    val groupByAttributes = plan.groupingExpressions.dropRight(1).map(_.asInstanceOf[Attribute])
    +    val groupByAttrMap = AttributeMap(groupByAttributes.zip(
    +      project.projectList.drop(project.child.output.length).map(_.asInstanceOf[Alias].child)))
    +    val groupingExprs = groupByAttrMap.values.toArray
    +    val groupingSQL = groupingExprs.map(_.sql).mkString(", ")
    +
    +    val groupingSet = expand.projections.map(_.dropRight(1).filter {
    --- End diff --
    
    we can also simplify this:
    ```
    val groupingAttrSet = AttributeSet(groupByAttributes)
    val groupingSet = expand.projections.map { project =>
      project.dropRight(1).collect {
        case attr: Attribute if groupingAttrSet.contains(attr) => attr
      }.map(groupByAttrMap.get)
    }
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-191599943
  
    **[Test build #52365 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52365/consoleFull)** for PR 11283 at commit [`640e45c`](https://github.com/apache/spark/commit/640e45cdbbe0d08acb26af715181dc4d16dd7893).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r54834446
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -211,6 +215,73 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def groupingSetToSQL(
    +      plan: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    require(plan.groupingExpressions.length > 1)
    +
    +    // The last column of Expand is always grouping ID
    +    val gid = expand.output.last
    +
    +    // In cube/rollup/groupingsets, Analyzer creates new aliases for all group by expressions.
    +    // Since conversion from attribute back SQL ignore expression IDs, the alias of attribute
    +    // references are ignored in aliasMap
    +    val aliasMap = AttributeMap(project.projectList.collect {
    +      case a @ Alias(child, name) => (a.toAttribute, a)
    +    })
    +
    +    val groupByAttributes = plan.groupingExpressions.dropRight(1).map(_.asInstanceOf[Attribute])
    +    val groupByAttrMap = AttributeMap(groupByAttributes.zip(
    +      project.projectList.drop(project.child.output.length).map(_.asInstanceOf[Alias].child)))
    --- End diff --
    
    Let me clean it now. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r54979015
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -233,6 +242,67 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def groupingSetToSQL(
    +      plan: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    require(plan.groupingExpressions.length > 1)
    +
    +    // The last column of Expand is always grouping ID
    +    val gid = expand.output.last
    +
    +    val groupByAttributes = plan.groupingExpressions.dropRight(1).map(_.asInstanceOf[Attribute])
    +    val groupByExprs =
    +      project.projectList.drop(project.child.output.length).map(_.asInstanceOf[Alias].child)
    --- End diff --
    
    nit: we can define `val numOriginalOutput = project.child.output.length` first, and make `val groupByExprs = ...` fit it one line. Also add comment to explain the assumption here please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-192078689
  
    **[Test build #52440 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52440/consoleFull)** for PR 11283 at commit [`9eaca51`](https://github.com/apache/spark/commit/9eaca515a3a86f07ed4ca85ba6da080ad605d1c0).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r54834927
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -211,6 +215,73 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def groupingSetToSQL(
    +      plan: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    require(plan.groupingExpressions.length > 1)
    +
    +    // The last column of Expand is always grouping ID
    +    val gid = expand.output.last
    +
    +    // In cube/rollup/groupingsets, Analyzer creates new aliases for all group by expressions.
    +    // Since conversion from attribute back SQL ignore expression IDs, the alias of attribute
    +    // references are ignored in aliasMap
    +    val aliasMap = AttributeMap(project.projectList.collect {
    +      case a @ Alias(child, name) => (a.toAttribute, a)
    +    })
    +
    +    val groupByAttributes = plan.groupingExpressions.dropRight(1).map(_.asInstanceOf[Attribute])
    +    val groupByAttrMap = AttributeMap(groupByAttributes.zip(
    +      project.projectList.drop(project.child.output.length).map(_.asInstanceOf[Alias].child)))
    +    val groupingExprs = groupByAttrMap.values.toArray
    +    val groupingSQL = groupingExprs.map(_.sql).mkString(", ")
    +
    +    val groupingSet = expand.projections.map(_.dropRight(1).filter {
    +      case e: Expression if plan.groupingExpressions.exists(_.semanticEquals(e)) => true
    +      case _ => false
    +    }.map {
    +      case a: AttributeReference if aliasMap.contains(a) => aliasMap(a).child
    +      case o => o
    +    })
    +
    +    val aggExprs = plan.aggregateExpressions.map { case expr =>
    +      expr.transformDown {
    +        case a @ Alias(child: AttributeReference, name) if child eq gid =>
    --- End diff --
    
    now we are using `transformDown`, I think we don't need to pattern match `Alias` any more.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r54979119
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -233,6 +242,67 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def groupingSetToSQL(
    +      plan: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    require(plan.groupingExpressions.length > 1)
    +
    +    // The last column of Expand is always grouping ID
    +    val gid = expand.output.last
    +
    +    val groupByAttributes = plan.groupingExpressions.dropRight(1).map(_.asInstanceOf[Attribute])
    +    val groupByExprs =
    +      project.projectList.drop(project.child.output.length).map(_.asInstanceOf[Alias].child)
    +    val groupByAttrMap = AttributeMap(groupByAttributes.zip(groupByExprs))
    +    val groupingSQL = groupByExprs.map(_.sql).mkString(", ")
    +
    +    val groupingSet = expand.projections.map { project =>
    +      // Assumption: expand.projections are composed of
    +      // 1) the original output (Project's child.output),
    +      // 2) group by attributes(or null literal)
    +      // 3) gid, which is always the last one in each project in Expand
    +      project.dropRight(1).collect {
    --- End diff --
    
    nit: `project.drop(numOriginalOutput).dropRight(1)` matches our assumption better, although correctness doesn't change.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r54834888
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -211,6 +215,73 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def groupingSetToSQL(
    +      plan: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    require(plan.groupingExpressions.length > 1)
    +
    +    // The last column of Expand is always grouping ID
    +    val gid = expand.output.last
    +
    +    // In cube/rollup/groupingsets, Analyzer creates new aliases for all group by expressions.
    +    // Since conversion from attribute back SQL ignore expression IDs, the alias of attribute
    +    // references are ignored in aliasMap
    +    val aliasMap = AttributeMap(project.projectList.collect {
    +      case a @ Alias(child, name) => (a.toAttribute, a)
    +    })
    +
    +    val groupByAttributes = plan.groupingExpressions.dropRight(1).map(_.asInstanceOf[Attribute])
    +    val groupByAttrMap = AttributeMap(groupByAttributes.zip(
    +      project.projectList.drop(project.child.output.length).map(_.asInstanceOf[Alias].child)))
    +    val groupingExprs = groupByAttrMap.values.toArray
    +    val groupingSQL = groupingExprs.map(_.sql).mkString(", ")
    +
    +    val groupingSet = expand.projections.map(_.dropRight(1).filter {
    +      case e: Expression if plan.groupingExpressions.exists(_.semanticEquals(e)) => true
    +      case _ => false
    +    }.map {
    +      case a: AttributeReference if aliasMap.contains(a) => aliasMap(a).child
    +      case o => o
    +    })
    +
    +    val aggExprs = plan.aggregateExpressions.map { case expr =>
    +      expr.transformDown {
    +        case a @ Alias(child: AttributeReference, name) if child eq gid =>
    +          // grouping_id() is converted to VirtualColumn.groupingIdName by Analyzer. Revert it back.
    +          Alias(GroupingID(Nil), name)()
    +        case a @ Alias(_ @ Cast(BitwiseAnd(
    +            ShiftRight(ar: AttributeReference, _ @ Literal(value: Any, IntegerType)),
    +            Literal(1, IntegerType)), ByteType), name) if ar == gid =>
    +          // for converting an expression to its original SQL format grouping(col)
    +          val idx = groupingExprs.length - 1 - value.asInstanceOf[Int]
    +          val groupingCol = groupingExprs.lift(idx)
    +          if (groupingCol.isDefined) {
    +            Grouping(groupingCol.get)
    +          } else {
    +            throw new UnsupportedOperationException(s"unsupported operator $a")
    +          }
    +        case a @ Alias(child: AttributeReference, name) if aliasMap.contains(child) =>
    --- End diff --
    
    we should use `groupByAttrMap` instead of `aliasMap`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r54695920
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -211,6 +215,71 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def groupingSetToSQL(
    +      plan: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    // In cube/rollup/groupingsets, Analyzer creates new aliases for all group by expressions.
    +    // Since conversion from attribute back SQL ignore expression IDs, the alias of attribute
    +    // references are ignored in aliasMap
    +    val aliasMap = AttributeMap(project.projectList.collect {
    +      case a @ Alias(child, name) if !child.isInstanceOf[AttributeReference] => (a.toAttribute, a)
    +    })
    +
    +    val groupingExprs = plan.groupingExpressions.filterNot {
    +      // VirtualColumn.groupingIdName is added by Analyzer, and thus remove it.
    +      case a: NamedExpression => a.name == VirtualColumn.groupingIdName
    --- End diff --
    
    we can be more strict here, use `AttributeReference`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-191125502
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r54981980
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -233,6 +242,72 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def groupingSetToSQL(
    +      agg: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    assert(agg.groupingExpressions.length > 1)
    +
    +    // The last column of Expand is always grouping ID
    +    val gid = expand.output.last
    +
    +    val numOriginalOutput = project.child.output.length
    +    // Assumption: Aggregate's groupingExpressions is composed of
    +    // 1) the group by attributes' aliases
    --- End diff --
    
    nit: `the attributes of aliased group by expressions`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-192107849
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52440/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r54715072
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -211,6 +215,71 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def groupingSetToSQL(
    +      plan: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    // In cube/rollup/groupingsets, Analyzer creates new aliases for all group by expressions.
    +    // Since conversion from attribute back SQL ignore expression IDs, the alias of attribute
    +    // references are ignored in aliasMap
    +    val aliasMap = AttributeMap(project.projectList.collect {
    +      case a @ Alias(child, name) if !child.isInstanceOf[AttributeReference] => (a.toAttribute, a)
    +    })
    +
    +    val groupingExprs = plan.groupingExpressions.filterNot {
    +      // VirtualColumn.groupingIdName is added by Analyzer, and thus remove it.
    +      case a: NamedExpression => a.name == VirtualColumn.groupingIdName
    +      case o => false
    +    }.map {
    +      case a: AttributeReference if aliasMap.contains(a) => aliasMap(a).child
    +      case o => o
    +    }
    +
    +    val groupingSQL = groupingExprs.map(_.sql).mkString(", ")
    +
    +    val groupingSet = expand.projections.map(_.filter {
    +      case _: Literal => false
    +      case e: Expression if plan.groupingExpressions.exists(_.semanticEquals(e)) => true
    +      case _ => false
    +    }.map {
    +      case a: AttributeReference if aliasMap.contains(a) => aliasMap(a).child
    +      case o => o
    +    })
    +
    +    val aggExprs = plan.aggregateExpressions.map {
    +      case a @ Alias(child: AttributeReference, name)
    +          if child.name == VirtualColumn.groupingIdName =>
    +        // grouping_id() is converted to VirtualColumn.groupingIdName by Analyzer. Revert it back.
    +        Alias(GroupingID(Nil), name)()
    +      case a @ Alias(_ @ Cast(BitwiseAnd(
    +        ShiftRight(ar: AttributeReference, _ @ Literal(value: Any, IntegerType)),
    +        Literal(1, IntegerType)), ByteType), name)
    +          if ar.name == VirtualColumn.groupingIdName =>
    +        // for converting an expression to its original SQL format grouping(col)
    +        val idx = groupingExprs.length - 1 - value.asInstanceOf[Int]
    +        val groupingCol = groupingExprs.lift(idx)
    +        if (groupingCol.isDefined) {
    +          Grouping(groupingCol.get)
    +        } else {
    +          throw new UnsupportedOperationException(s"unsupported operator $a")
    +        }
    +      case a @ Alias(child: AttributeReference, name) if aliasMap.contains(child) =>
    +        aliasMap(child).child
    +      case o => o
    +    }
    +
    +    build(
    +      "SELECT",
    +      aggExprs.map(_.sql).mkString(", "),
    +      if (plan.child == OneRowRelation) "" else "FROM",
    +      toSQL(project.child),
    +      if (groupingSQL.isEmpty) "" else "GROUP BY",
    --- End diff --
    
    Will `groupingExprs` be empty for grouping set cases?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r54767805
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -211,6 +215,71 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def groupingSetToSQL(
    +      plan: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    // In cube/rollup/groupingsets, Analyzer creates new aliases for all group by expressions.
    +    // Since conversion from attribute back SQL ignore expression IDs, the alias of attribute
    +    // references are ignored in aliasMap
    +    val aliasMap = AttributeMap(project.projectList.collect {
    +      case a @ Alias(child, name) if !child.isInstanceOf[AttributeReference] => (a.toAttribute, a)
    +    })
    +
    +    val groupingExprs = plan.groupingExpressions.filterNot {
    +      // VirtualColumn.groupingIdName is added by Analyzer, and thus remove it.
    +      case a: NamedExpression => a.name == VirtualColumn.groupingIdName
    --- End diff --
    
    will do. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r54978008
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -203,6 +208,10 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
           throw new UnsupportedOperationException(s"unsupported plan $node")
       }
     
    +  private def sameOutput(left: Seq[Attribute], right: Seq[Attribute]): Boolean =
    --- End diff --
    
    you can copy the code here: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L316-L318


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-191893199
  
    **[Test build #52401 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52401/consoleFull)** for PR 11283 at commit [`b1925e5`](https://github.com/apache/spark/commit/b1925e5feb83228db8ed8502306368a2d979e56b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r54835038
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -211,6 +215,76 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def groupingSetToSQL(
    +      plan: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    require(plan.groupingExpressions.length > 1)
    +
    +    // The last column of Expand is always grouping ID
    +    val gid = expand.output.last
    +
    +    // In cube/rollup/groupingsets, Analyzer creates new aliases for all group by expressions.
    +    // Since conversion from attribute back SQL ignore expression IDs, the alias of attribute
    +    // references are ignored in aliasMap
    +    val aliasMap = AttributeMap(project.projectList.collect {
    +      case a @ Alias(child, name) => (a.toAttribute, a)
    +    })
    +
    +    val groupByAttributes = plan.groupingExpressions.dropRight(1).map(_.asInstanceOf[Attribute])
    +    val groupByExprs =
    +      project.projectList.drop(project.child.output.length).map(_.asInstanceOf[Alias].child)
    +    val groupByAttrMap = AttributeMap(groupByAttributes.zip(groupByExprs))
    +    val groupingExprs = groupByAttrMap.values.toArray
    --- End diff --
    
    this is redundant to `groupByExprs`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r54769070
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -211,6 +215,71 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def groupingSetToSQL(
    +      plan: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    // In cube/rollup/groupingsets, Analyzer creates new aliases for all group by expressions.
    +    // Since conversion from attribute back SQL ignore expression IDs, the alias of attribute
    +    // references are ignored in aliasMap
    +    val aliasMap = AttributeMap(project.projectList.collect {
    +      case a @ Alias(child, name) if !child.isInstanceOf[AttributeReference] => (a.toAttribute, a)
    +    })
    +
    +    val groupingExprs = plan.groupingExpressions.filterNot {
    +      // VirtualColumn.groupingIdName is added by Analyzer, and thus remove it.
    +      case a: NamedExpression => a.name == VirtualColumn.groupingIdName
    +      case o => false
    +    }.map {
    +      case a: AttributeReference if aliasMap.contains(a) => aliasMap(a).child
    +      case o => o
    +    }
    +
    +    val groupingSQL = groupingExprs.map(_.sql).mkString(", ")
    +
    +    val groupingSet = expand.projections.map(_.filter {
    --- End diff --
    
    Will do. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r54714755
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -211,6 +215,71 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def groupingSetToSQL(
    +      plan: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    // In cube/rollup/groupingsets, Analyzer creates new aliases for all group by expressions.
    +    // Since conversion from attribute back SQL ignore expression IDs, the alias of attribute
    +    // references are ignored in aliasMap
    +    val aliasMap = AttributeMap(project.projectList.collect {
    +      case a @ Alias(child, name) if !child.isInstanceOf[AttributeReference] => (a.toAttribute, a)
    +    })
    +
    +    val groupingExprs = plan.groupingExpressions.filterNot {
    +      // VirtualColumn.groupingIdName is added by Analyzer, and thus remove it.
    +      case a: NamedExpression => a.name == VirtualColumn.groupingIdName
    +      case o => false
    +    }.map {
    +      case a: AttributeReference if aliasMap.contains(a) => aliasMap(a).child
    +      case o => o
    +    }
    +
    +    val groupingSQL = groupingExprs.map(_.sql).mkString(", ")
    +
    +    val groupingSet = expand.projections.map(_.filter {
    +      case _: Literal => false
    +      case e: Expression if plan.groupingExpressions.exists(_.semanticEquals(e)) => true
    +      case _ => false
    +    }.map {
    +      case a: AttributeReference if aliasMap.contains(a) => aliasMap(a).child
    +      case o => o
    +    })
    +
    +    val aggExprs = plan.aggregateExpressions.map {
    +      case a @ Alias(child: AttributeReference, name)
    +          if child.name == VirtualColumn.groupingIdName =>
    --- End diff --
    
    BTW should we follow the way we [generate](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L261-L297) these aggregate expressions? i.e.
    ```
    plan.aggregateExpressions.map { case expr =>
      expr.transformDown {
        ...
      }
    }
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-192232983
  
    LGTM except a few comments. Thanks for donging this!
    
    Would you please run `HiveCompatibilitySuite` locally and check `sql/hive/target/unit-tests.log` to confirm that all queries with grouping set are correctly generated? It takes about 10~15 min to run the suite. I found that Jenkins skipped this suite.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-191943266
  
    **[Test build #52400 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52400/consoleFull)** for PR 11283 at commit [`5d8923c`](https://github.com/apache/spark/commit/5d8923ceb94ae5ae5b2319fc897341db1a574c2a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-191595583
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r54842932
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -211,6 +215,65 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def groupingSetToSQL(
    +      plan: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    require(plan.groupingExpressions.length > 1)
    +
    +    // The last column of Expand is always grouping ID
    +    val gid = expand.output.last
    +
    +    val groupByAttributes = plan.groupingExpressions.dropRight(1).map(_.asInstanceOf[Attribute])
    +    val groupByExprs =
    +      project.projectList.drop(project.child.output.length).map(_.asInstanceOf[Alias].child)
    +    val groupByAttrMap = AttributeMap(groupByAttributes.zip(groupByExprs))
    +    val groupingSQL = groupByExprs.map(_.sql).mkString(", ")
    +
    +    val groupingSet = expand.projections.map { project =>
    +      project.dropRight(1).collect {
    --- End diff --
    
    Will do. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-192624981
  
    A final thought about this PR:
    
    For SQL generation for grouping sets, we have to depend on a set of implicit assumptions related to implementation details of specific analysis rules, which makes the implementation tend to be fragile. I think maybe we can add an auxiliary logical plan operator `AnnotatePlan` and a expression `AnnotateExpression`, which can be used to annotate a sub-tree of a logical plan / expression to indicate their original form. These nodes should be erased by the optimizer.
    
    Haven't thought thoroughly about this, but with the help of these annotations, I'd expect it to be easier to recognize patterns corresponding to plans / expressions like grouping set, grouping, multi-distinct aggregation etc..


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r54821287
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -211,6 +215,71 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def groupingSetToSQL(
    +      plan: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    // In cube/rollup/groupingsets, Analyzer creates new aliases for all group by expressions.
    +    // Since conversion from attribute back SQL ignore expression IDs, the alias of attribute
    +    // references are ignored in aliasMap
    +    val aliasMap = AttributeMap(project.projectList.collect {
    +      case a @ Alias(child, name) if !child.isInstanceOf[AttributeReference] => (a.toAttribute, a)
    +    })
    +
    +    val groupingExprs = plan.groupingExpressions.filterNot {
    +      // VirtualColumn.groupingIdName is added by Analyzer, and thus remove it.
    +      case a: NamedExpression => a.name == VirtualColumn.groupingIdName
    +      case o => false
    +    }.map {
    +      case a: AttributeReference if aliasMap.contains(a) => aliasMap(a).child
    +      case o => o
    +    }
    +
    --- End diff --
    
    I think the `groupingExpressions` must be all `Attribute`, see https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L302-L305 . Did I miss something here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r54822113
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -211,6 +215,71 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def groupingSetToSQL(
    +      plan: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    // In cube/rollup/groupingsets, Analyzer creates new aliases for all group by expressions.
    +    // Since conversion from attribute back SQL ignore expression IDs, the alias of attribute
    +    // references are ignored in aliasMap
    +    val aliasMap = AttributeMap(project.projectList.collect {
    +      case a @ Alias(child, name) if !child.isInstanceOf[AttributeReference] => (a.toAttribute, a)
    +    })
    +
    +    val groupingExprs = plan.groupingExpressions.filterNot {
    +      // VirtualColumn.groupingIdName is added by Analyzer, and thus remove it.
    +      case a: NamedExpression => a.name == VirtualColumn.groupingIdName
    +      case o => false
    +    }.map {
    +      case a: AttributeReference if aliasMap.contains(a) => aliasMap(a).child
    +      case o => o
    +    }
    +
    --- End diff --
    
    Ah, do some optimizer rules affect this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-191581961
  
    @cloud-fan Thank you for your detailed reviews! 
    
    Actually, I have a general question about `toSQL`. When writing the prototype of this PR, I used a different way, similar to what you said above. Later, I am afraid any assumption I made is dangerous. In the future, if we introduce another external SQL/DataFrame API, which could generate similar plans. Then, a simplified solution might generate a wrong SQL. That is the reason why the commit of this PR is trying to reverse the rule of `ResolveGroupingAnalytics` step by step. 
    
    You know, multi-distinct and windows are more complex when doing SQL generation. Which way is preferred in your opinion? Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-192054859
  
    @cloud-fan Really thank you for your time and your detailed reviews!!! : )


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r54978277
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -233,6 +242,67 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def groupingSetToSQL(
    +      plan: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    require(plan.groupingExpressions.length > 1)
    --- End diff --
    
    nit: I think `assert` is better here, as if it breaks, it means something goes wrong in our system.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-187441400
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/11283


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-191595032
  
    I prefer to make corrected assumptions and write simple code. If the assumption is broken at future, we can fix it at that time. While making assumptions, we should write more `assert` so that if the assumptions are broken, we can know it at the first place.
    
    cc @liancheng 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11283#issuecomment-191421762
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r54842622
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -211,6 +215,65 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def groupingSetToSQL(
    +      plan: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    require(plan.groupingExpressions.length > 1)
    +
    +    // The last column of Expand is always grouping ID
    +    val gid = expand.output.last
    +
    +    val groupByAttributes = plan.groupingExpressions.dropRight(1).map(_.asInstanceOf[Attribute])
    +    val groupByExprs =
    +      project.projectList.drop(project.child.output.length).map(_.asInstanceOf[Alias].child)
    +    val groupByAttrMap = AttributeMap(groupByAttributes.zip(groupByExprs))
    +    val groupingSQL = groupByExprs.map(_.sql).mkString(", ")
    +
    +    val groupingSet = expand.projections.map { project =>
    +      project.dropRight(1).collect {
    --- End diff --
    
    add comments to explain our assumption here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r55116229
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -233,6 +236,82 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def sameOutput(output1: Seq[Attribute], output2: Seq[Attribute]): Boolean =
    +    output1.size == output2.size &&
    +      output1.zip(output2).forall(pair => pair._1.semanticEquals(pair._2))
    +
    +  private def isGroupingSet(a: Aggregate, e: Expand, p: Project): Boolean = {
    +    assert(a.child == e && e.child == p)
    +    a.groupingExpressions.forall(_.isInstanceOf[Attribute]) &&
    +      sameOutput(e.output, p.child.output ++ a.groupingExpressions.map(_.asInstanceOf[Attribute]))
    +  }
    +
    +  private def groupingSetToSQL(
    +      agg: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    assert(agg.groupingExpressions.length > 1)
    +
    +    // The last column of Expand is always grouping ID
    +    val gid = expand.output.last
    +
    +    val numOriginalOutput = project.child.output.length
    +    // Assumption: Aggregate's groupingExpressions is composed of
    +    // 1) the attributes of aliased group by expressions
    +    // 2) gid, which is always the last one
    +    val groupByAttributes = agg.groupingExpressions.dropRight(1).map(_.asInstanceOf[Attribute])
    +    // Assumption: Project's projectList is composed of
    +    // 1) the original output (Project's child.output),
    +    // 2) the aliased group by expressions.
    +    val groupByExprs = project.projectList.drop(numOriginalOutput).map(_.asInstanceOf[Alias].child)
    +    val groupingSQL = groupByExprs.map(_.sql).mkString(", ")
    +
    +    // a map from group by attributes to the original group by expressions.
    +    val groupByAttrMap = AttributeMap(groupByAttributes.zip(groupByExprs))
    +
    +    val groupingSet = expand.projections.map { project =>
    +      // Assumption: expand.projections is composed of
    +      // 1) the original output (Project's child.output),
    +      // 2) group by attributes(or null literal)
    +      // 3) gid, which is always the last one in each project in Expand
    +      project.drop(numOriginalOutput).dropRight(1).collect {
    +        case attr: Attribute if groupByAttrMap.contains(attr) => groupByAttrMap(attr)
    +      }
    +    }
    +    val groupingSetSQL =
    +      "GROUPING SETS(" +
    +        groupingSet.map(e => s"(${e.map(_.sql).mkString(", ")})").mkString(", ") + ")"
    +
    +    val aggExprs = agg.aggregateExpressions.map { case expr =>
    +      expr.transformDown {
    +        // grouping_id() is converted to VirtualColumn.groupingIdName by Analyzer. Revert it back.
    +        case ar: AttributeReference if ar == gid => GroupingID(Nil)
    +        case ar: AttributeReference if groupByAttrMap.contains(ar) => groupByAttrMap(ar)
    +        case a @ Cast(BitwiseAnd(
    +            ShiftRight(ar: AttributeReference, _ @ Literal(value: Any, IntegerType)),
    +            Literal(1, IntegerType)), ByteType) if ar == gid =>
    +          // for converting an expression to its original SQL format grouping(col)
    +          val idx = groupByExprs.length - 1 - value.asInstanceOf[Int]
    +          val groupingCol = groupByExprs.lift(idx)
    +          if (groupingCol.isDefined) {
    +            Grouping(groupingCol.get)
    +          } else {
    +            throw new UnsupportedOperationException(s"unsupported operator $a")
    +          }
    +      }
    +    }
    +
    +    build(
    +      "SELECT",
    +      aggExprs.map(_.sql).mkString(", "),
    +      if (agg.child == OneRowRelation) "" else "FROM",
    +      toSQL(project.child),
    --- End diff --
    
    Will do. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r54821968
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -211,6 +215,76 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def groupingSetToSQL(
    +      plan: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    // The last column of Expand is always grouping ID
    +    val gid = expand.output.last
    +
    +    // In cube/rollup/groupingsets, Analyzer creates new aliases for all group by expressions.
    +    // Since conversion from attribute back SQL ignore expression IDs, the alias of attribute
    +    // references are ignored in aliasMap
    +    val aliasMap = AttributeMap(project.projectList.collect {
    +      case a @ Alias(child, name) if !child.isInstanceOf[AttributeReference] => (a.toAttribute, a)
    +    })
    +
    +    val groupingExprs = plan.groupingExpressions.filterNot {
    +      // VirtualColumn.groupingIdName is added by Analyzer, and thus remove it.
    +      case a: AttributeReference => a == gid
    +      case o => false
    +    }.map {
    +      case a: AttributeReference if aliasMap.contains(a) => aliasMap(a).child
    +      case o => o
    +    }
    +
    +    val groupingSQL = groupingExprs.map(_.sql).mkString(", ")
    +
    +    val groupingSet = expand.projections.map(_.dropRight(1).filter {
    +      case e: Expression if plan.groupingExpressions.exists(_.semanticEquals(e)) => true
    +      case _ => false
    +    }.map {
    +      case a: AttributeReference if aliasMap.contains(a) => aliasMap(a).child
    +      case o => o
    +    })
    +
    +    val aggExprs = plan.aggregateExpressions.map { case expr =>
    +      expr.transformDown {
    +        case a @ Alias(child: AttributeReference, name) if child eq gid =>
    +          // grouping_id() is converted to VirtualColumn.groupingIdName by Analyzer. Revert it back.
    +          Alias(GroupingID(Nil), name)()
    +        case a @ Alias(_ @ Cast(BitwiseAnd(
    +            ShiftRight(ar: AttributeReference, _ @ Literal(value: Any, IntegerType)),
    +            Literal(1, IntegerType)), ByteType), name) if ar == gid =>
    +          // for converting an expression to its original SQL format grouping(col)
    +          val idx = groupingExprs.length - 1 - value.asInstanceOf[Int]
    +          val groupingCol = groupingExprs.lift(idx)
    +          if (groupingCol.isDefined) {
    +            Grouping(groupingCol.get)
    +          } else {
    +            throw new UnsupportedOperationException(s"unsupported operator $a")
    +          }
    +        case a @ Alias(child: AttributeReference, name) if aliasMap.contains(child) =>
    +          aliasMap(child).child
    +        case o => o
    +      }
    +    }
    +
    +    val groupingSetSQL =
    +      "GROUPING SETS(" +
    +        groupingSet.map(e => s"(${e.map(_.sql).mkString(", ")})").mkString(", ") + ")"
    --- End diff --
    
    Since `GROUPING SETS((year, month), (year), ())`  is valid SQL


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12720] [SQL] SQL Generation Support for...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11283#discussion_r54982112
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala ---
    @@ -233,6 +242,72 @@ class SQLBuilder(logicalPlan: LogicalPlan, sqlContext: SQLContext) extends Loggi
         )
       }
     
    +  private def groupingSetToSQL(
    +      agg: Aggregate,
    +      expand: Expand,
    +      project: Project): String = {
    +    assert(agg.groupingExpressions.length > 1)
    +
    +    // The last column of Expand is always grouping ID
    +    val gid = expand.output.last
    +
    +    val numOriginalOutput = project.child.output.length
    +    // Assumption: Aggregate's groupingExpressions is composed of
    +    // 1) the group by attributes' aliases
    +    // 2) gid, which is always the last one
    +    val groupByAttributes = agg.groupingExpressions.dropRight(1).map(_.asInstanceOf[Attribute])
    +    // Assumption: Project's projectList is composed of
    +    // 1) the original output (Project's child.output),
    +    // 2) the aliases of the original group by attributes, which could be expressions
    --- End diff --
    
    nit: aliased group by expressions


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org