You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by chenghao-intel <gi...@git.apache.org> on 2014/07/24 07:20:40 UTC

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

GitHub user chenghao-intel opened a pull request:

    https://github.com/apache/spark/pull/1567

    [SPARK-2663] [SQL] Support the Grouping Set

    Add support for `GROUPING SETS`, `ROLLUP`, `CUBE` and the the virtual column `GROUPING__ID`.
    
    More details on how to the `GROUPING SETS" can be found at: https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation,+Cube,+Grouping+and+Rollup

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/chenghao-intel/spark grouping_sets

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1567.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1567
    
----
commit 018e7b357d2d7336a87155aa496cb5f675385cf1
Author: Cheng Hao <ha...@intel.com>
Date:   2014-07-24T00:38:37Z

    Add GroupingSet Support

commit d894f1073a8819ffbc169ede5dc4d3040e0986dd
Author: Cheng Hao <ha...@intel.com>
Date:   2014-07-24T00:58:46Z

    Update the ResolveReferences

commit 64e341f351c9fa4f0c9df1f349a89a07d9105615
Author: Cheng Hao <ha...@intel.com>
Date:   2014-07-24T03:08:02Z

    Add more comments & Unit Tests

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/1567#issuecomment-67427000
  
    Thank you @marmbrus , I've finished the updating, will add "WIP" next time. :)
    Can you review the code again?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/1567#issuecomment-67581073
  
    @marmbrus , any more comment on this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/1567


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1567#discussion_r21948441
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -107,6 +110,93 @@ class Analyzer(catalog: Catalog, registry: FunctionRegistry, caseSensitive: Bool
         }
       }
     
    +  object ResolveGroupingSet extends Rule[LogicalPlan] {
    +    /**
    +     * Extract attribute set according to the grouping id
    +     * @param bitmask bitmask to represent the validity of the attribute sequence
    --- End diff --
    
    I'm not sure what `valid` means here and elsewhere.  Do you mean the bitmask indicates which attributes are `selected` perhaps?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1567#issuecomment-60266002
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22073/consoleFull) for   PR 1567 at commit [`76f474e`](https://github.com/apache/spark/commit/76f474e41a172d5128f99c9ae71c7b802b9114fa).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class GroupingSet(bitmasks: Seq[Int], `
      * `case class Cube(groupByExprs: Seq[Expression],`
      * `case class Rollup(groupByExprs: Seq[Expression],`
      * `case class VirtualColumn(name: String, dataType: DataType = StringType, nullable: Boolean = false)`
      * `case class GroupingSetExpansion(`
      * `case class GroupingSetExpansion(`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1567#issuecomment-61599139
  
      [Test build #22866 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22866/consoleFull) for   PR 1567 at commit [`dbded19`](https://github.com/apache/spark/commit/dbded19e4aa42327d2becf7d311c90d0bf73e7ef).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/1567#issuecomment-67289363
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24535/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1567#discussion_r21947696
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Explosive.scala ---
    @@ -0,0 +1,68 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution
    +
    +import org.apache.spark.annotation.DeveloperApi
    +import org.apache.spark.sql.SQLContext
    +import org.apache.spark.sql.catalyst.errors._
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.catalyst.plans.physical.{UnknownPartitioning, Partitioning}
    +
    +@DeveloperApi
    +case class Explosive(
    +    projections: Seq[GroupExpression],
    +    output: Seq[Attribute],
    +    child: SparkPlan)(@transient sqlContext: SQLContext)
    +  extends UnaryNode {
    +
    +  override def outputPartitioning: Partitioning = UnknownPartitioning(0)
    --- End diff --
    
    This is the default right? I don't think you need this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1567#issuecomment-63002602
  
      [Test build #23348 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23348/consoleFull) for   PR 1567 at commit [`89e37d8`](https://github.com/apache/spark/commit/89e37d82d72ac614af0efcd810ac4b9b034d4253).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by harakiro <gi...@git.apache.org>.

Github user harakiro commented on the pull request:

    https://github.com/apache/spark/pull/1567#issuecomment-75885484
  
    What version of Spark will this be released under? Is it in 1.2? Is there a Jira to track this functionality that I could reference. Thanks so much for the work on this feature!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1567#discussion_r21948358
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Explosive.scala ---
    @@ -0,0 +1,68 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution
    +
    +import org.apache.spark.annotation.DeveloperApi
    +import org.apache.spark.sql.SQLContext
    +import org.apache.spark.sql.catalyst.errors._
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.catalyst.plans.physical.{UnknownPartitioning, Partitioning}
    +
    +@DeveloperApi
    +case class Explosive(
    +    projections: Seq[GroupExpression],
    +    output: Seq[Attribute],
    +    child: SparkPlan)(@transient sqlContext: SQLContext)
    +  extends UnaryNode {
    +
    +  override def outputPartitioning: Partitioning = UnknownPartitioning(0)
    +
    +  override def otherCopyArgs = sqlContext :: Nil
    +
    +  override def execute() = attachTree(this, "execute") {
    +    child.execute().mapPartitions { iter =>
    +      // TODO InterpretProjection is not Serializable
    +      val projs = projections.map(ee => newProjection(ee.children, output)).toArray
    +
    +      new Iterator[Row] {
    +        private[this] var result: Row = _
    +        private[this] var idx = -1  // initial value is set as -1
    --- End diff --
    
    Its clear that its set to `-1`. Instead maybe describe what `-1` means.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/1567#issuecomment-67268014
  
    Okay, this is looking really good / clean.  Most of my comments are about documentation since this is a very complicated feature.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1567#issuecomment-61606402
  
      [Test build #22870 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22870/consoleFull) for   PR 1567 at commit [`f2fae76`](https://github.com/apache/spark/commit/f2fae76d22d4446178b43ee5b7c673adb7cbb2d1).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/1567#issuecomment-61610209
  
    @marmbrus @rxin I've updated the code and the PR description, can you review the code for me?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/1567#issuecomment-60343575

@rxin @marmbrus , I've uploaded an draft design doc in jira. https://issues.apache.org/jira/secure/attachment/12676811/grouping_set.pdf, sorry it maybe not cover every detail, let me know if you have any confusion.

@marmbrus
>The creation of bit vectors seems like a very implementation focused physical concern. I'm curious if this could be restricted to the actual physical operator.

Yeah, It's very reasonable, I was thinking of this either.
However, the bit vectors stuff don't rely on physical execution engine, and it's slightly different with the Aggregate, which has the optimization of mapside aggregation for spark execution.

Besides, the attribute reference pass down to the parent logical operator need to be correctly set in logical plan analyzing.

Anyway, I will consider your suggestion, after all, we should keep the Logical Plan for "describing what to do", not "how to do".

>Adding a new type of attribute reference for virtual columns might be a lot of overhead. Is this really necessary?

A concrete `VirtualColumn` instance is very helpful in attribute referencing, and pattern matching, probably better than a name convention. Sorry, maybe I didn't understand your mean, we can discuss that in the code review.

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1567#discussion_r21948729
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -156,6 +246,10 @@ class Analyzer(catalog: Catalog, registry: FunctionRegistry, caseSensitive: Bool
           case q: LogicalPlan if q.childrenResolved =>
             logTrace(s"Attempting to resolve ${q.simpleString}")
             q transformExpressions {
    +          case u @ UnresolvedAttribute(name)
    --- End diff --
    
    `GROUPING__ID` is a virtual column which is not an output attribute of its child operator, we need to intercept it here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1567#discussion_r21949265
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Explosive.scala ---
    @@ -0,0 +1,68 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution
    +
    +import org.apache.spark.annotation.DeveloperApi
    +import org.apache.spark.sql.SQLContext
    +import org.apache.spark.sql.catalyst.errors._
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.catalyst.plans.physical.{UnknownPartitioning, Partitioning}
    +
    +@DeveloperApi
    +case class Explosive(
    +    projections: Seq[GroupExpression],
    +    output: Seq[Attribute],
    +    child: SparkPlan)(@transient sqlContext: SQLContext)
    +  extends UnaryNode {
    +
    +  override def outputPartitioning: Partitioning = UnknownPartitioning(0)
    +
    +  override def otherCopyArgs = sqlContext :: Nil
    +
    +  override def execute() = attachTree(this, "execute") {
    +    child.execute().mapPartitions { iter =>
    +      // TODO InterpretProjection is not Serializable
    +      val projs = projections.map(ee => newProjection(ee.children, output)).toArray
    --- End diff --
    
    Yes, you're right! 
    The `output` schema is not constructed within the `Explosive`, but passed in as case class argument (with the `grouping__id` attribute appended) in `Analyzer`, it should be more reasonable if we compute the `output` schema within `Explosive`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1567#issuecomment-59714664
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21911/consoleFull) for   PR 1567 at commit [`88b939e`](https://github.com/apache/spark/commit/88b939e3deb15f4ed16a727b33af879fa103c913).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class GroupingSet(bitmasks: Seq[Int], `
      * `case class Cube(groupByExprs: Seq[Expression],`
      * `case class Rollup(groupByExprs: Seq[Expression],`
      * `case class VirtualColumn(name: String, dataType: DataType = StringType, nullable: Boolean = false)`
      * `case class GroupingSetExpansion(`
      * `case class GroupingSetExpansion(`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1567#discussion_r21948133
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
    @@ -243,6 +243,16 @@ abstract class LeafExpression extends Expression with trees.LeafNode[Expression]
     
     abstract class UnaryExpression extends Expression with trees.UnaryNode[Expression] {
       self: Product =>
    +}
     
    -
    +// TODO Semantically we probably not need GroupExpression
    +// It is supposed to be replaced by Projection, all we need is wrapping the
    +// Seq[Expression]
    +case class GroupExpression(children: Seq[Expression]) extends Expression {
    --- End diff --
    
    Yeah, actually I changed this for couple of times, this implementation seems the minimum impact.  There are some existing code in `transformExpression` etc., I don't want to change the code to matching `Seq[Seq[Expression]]`, which requires some of dummy change.
    
    See:
    https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala#L80


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1567#discussion_r25312252
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Expand.scala ---
    @@ -0,0 +1,79 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution
    +
    +import org.apache.spark.annotation.DeveloperApi
    +import org.apache.spark.sql.SQLContext
    +import org.apache.spark.sql.catalyst.errors._
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.catalyst.plans.physical.{UnknownPartitioning, Partitioning}
    +
    +/**
    + * Apply the all of the GroupExpressions to every input row, hence we will get
    + * multiple output rows for a input row.
    + * @param projections The group of expressions, all of the group expressions should
    + *                    output the same schema specified bye the parameter `output`
    + * @param output      The output Schema
    + * @param child       Child operator
    + */
    +@DeveloperApi
    +case class Expand(
    +    projections: Seq[GroupExpression],
    +    output: Seq[Attribute],
    +    child: SparkPlan)
    +  extends UnaryNode {
    +
    +  // The GroupExpressions can output data with arbitrary partitioning, so set it
    +  // as UNKNOWN partitioning
    +  override def outputPartitioning: Partitioning = UnknownPartitioning(0)
    +
    +  override def execute() = attachTree(this, "execute") {
    +    child.execute().mapPartitions { iter =>
    +      // TODO Move out projection objects creation and transfer to
    +      // workers via closure. However we can't assume the Projection
    +      // is serializable because of the code gen, so we have to
    +      // create the projections within each of the partition processing.
    +      val groups = projections.map(ee => newProjection(ee.children, child.output)).toArray
    +
    +      new Iterator[Row] {
    +        private[this] var result: Row = _
    +        private[this] var idx = -1  // -1 means the initial state
    +        private[this] var input: Row = _
    +
    +        override final def hasNext = (-1 < idx && idx < groups.length) || iter.hasNext
    --- End diff --
    
    i.e.
    ```scala
    private[this] val groupLength = groups.length
    ```
    and then just reference groupLength


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/1567#issuecomment-59706916
  
    test this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1567#discussion_r25317186
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Expand.scala ---
    @@ -0,0 +1,79 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution
    +
    +import org.apache.spark.annotation.DeveloperApi
    +import org.apache.spark.sql.SQLContext
    +import org.apache.spark.sql.catalyst.errors._
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.catalyst.plans.physical.{UnknownPartitioning, Partitioning}
    +
    +/**
    + * Apply the all of the GroupExpressions to every input row, hence we will get
    + * multiple output rows for a input row.
    + * @param projections The group of expressions, all of the group expressions should
    + *                    output the same schema specified bye the parameter `output`
    + * @param output      The output Schema
    + * @param child       Child operator
    + */
    +@DeveloperApi
    +case class Expand(
    +    projections: Seq[GroupExpression],
    +    output: Seq[Attribute],
    +    child: SparkPlan)
    +  extends UnaryNode {
    +
    +  // The GroupExpressions can output data with arbitrary partitioning, so set it
    +  // as UNKNOWN partitioning
    +  override def outputPartitioning: Partitioning = UnknownPartitioning(0)
    +
    +  override def execute() = attachTree(this, "execute") {
    +    child.execute().mapPartitions { iter =>
    +      // TODO Move out projection objects creation and transfer to
    +      // workers via closure. However we can't assume the Projection
    +      // is serializable because of the code gen, so we have to
    +      // create the projections within each of the partition processing.
    +      val groups = projections.map(ee => newProjection(ee.children, child.output)).toArray
    +
    +      new Iterator[Row] {
    +        private[this] var result: Row = _
    +        private[this] var idx = -1  // -1 means the initial state
    +        private[this] var input: Row = _
    +
    +        override final def hasNext = (-1 < idx && idx < groups.length) || iter.hasNext
    --- End diff --
    
    @rxin, the `groups` is in the type of `Array`, not `Seq`, probably it does not impact the performance a lot. Anyway, thank you for pointing this out, I can update that along with some other PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1567#issuecomment-67289359
  
      [Test build #24535 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24535/consoleFull) for   PR 1567 at commit [`3c1df19`](https://github.com/apache/spark/commit/3c1df1987febe2aeddae7c9618baf770b4164c06).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class GroupExpression(children: Seq[Expression]) extends Expression `
      * `case class Expand(`
      * `trait GroupingAnalytics extends UnaryNode `
      * `case class GroupingSets(`
      * `case class Cube(`
      * `case class Rollup(`
      * `case class Expand(`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1567#issuecomment-61599770
  
      [Test build #22866 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22866/consoleFull) for   PR 1567 at commit [`dbded19`](https://github.com/apache/spark/commit/dbded19e4aa42327d2becf7d311c90d0bf73e7ef).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class ExecutorActor(executorId: String) extends Actor with ActorLogReceive with Logging `
      * `  case class GetActorSystemHostPortForExecutor(executorId: String) extends ToBlockManagerMaster`
      * `class UserDefinedType(DataType):`
      * `case class Explosive(`
      * `trait GroupingSets extends UnaryNode `
      * `case class GroupingSet(`
      * `case class Cube(`
      * `case class Rollup(`
      * `case class Explosive(`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1567#issuecomment-60257141
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22073/consoleFull) for   PR 1567 at commit [`76f474e`](https://github.com/apache/spark/commit/76f474e41a172d5128f99c9ae71c7b802b9114fa).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1567#discussion_r21948861
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -107,6 +110,93 @@ class Analyzer(catalog: Catalog, registry: FunctionRegistry, caseSensitive: Bool
         }
       }
     
    +  object ResolveGroupingSet extends Rule[LogicalPlan] {
    +    /**
    +     * Extract attribute set according to the grouping id
    +     * @param bitmask bitmask to represent the validity of the attribute sequence
    --- End diff --
    
    I think it would be clearer to change the wording then.  `invalid` sounds like something is broken.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1567#discussion_r21947710
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Explosive.scala ---
    @@ -0,0 +1,68 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution
    +
    +import org.apache.spark.annotation.DeveloperApi
    +import org.apache.spark.sql.SQLContext
    +import org.apache.spark.sql.catalyst.errors._
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.catalyst.plans.physical.{UnknownPartitioning, Partitioning}
    +
    +@DeveloperApi
    +case class Explosive(
    +    projections: Seq[GroupExpression],
    +    output: Seq[Attribute],
    +    child: SparkPlan)(@transient sqlContext: SQLContext)
    +  extends UnaryNode {
    +
    +  override def outputPartitioning: Partitioning = UnknownPartitioning(0)
    +
    +  override def otherCopyArgs = sqlContext :: Nil
    --- End diff --
    
    All SparkPlan nodes have a `sqlContext` automatically now, so I think you can remove this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1567#issuecomment-49979679
  
    QA tests have started for PR 1567. This patch merges cleanly. <br>View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17114/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/1567#issuecomment-67432296
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24563/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/1567#issuecomment-75885663
  
    @harakiro the jira ticket is in the title of the pull request: https://issues.apache.org/jira/browse/SPARK-2663


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/1567#issuecomment-59708199
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21910/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1567#issuecomment-54434939
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19781/consoleFull) for   PR 1567 at commit [`0325be5`](https://github.com/apache/spark/commit/0325be5e94501620eecbd16ba4b0d42dc7ac3a8e).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1567#issuecomment-54293724
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19660/consoleFull) for   PR 1567 at commit [`0325be5`](https://github.com/apache/spark/commit/0325be5e94501620eecbd16ba4b0d42dc7ac3a8e).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/1567#issuecomment-67592527
  
    Thanks!  Merged to master.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/1567#issuecomment-61613020
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22870/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1567#discussion_r25312206
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Expand.scala ---
    @@ -0,0 +1,79 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution
    +
    +import org.apache.spark.annotation.DeveloperApi
    +import org.apache.spark.sql.SQLContext
    +import org.apache.spark.sql.catalyst.errors._
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.catalyst.plans.physical.{UnknownPartitioning, Partitioning}
    +
    +/**
    + * Apply the all of the GroupExpressions to every input row, hence we will get
    + * multiple output rows for a input row.
    + * @param projections The group of expressions, all of the group expressions should
    + *                    output the same schema specified bye the parameter `output`
    + * @param output      The output Schema
    + * @param child       Child operator
    + */
    +@DeveloperApi
    +case class Expand(
    +    projections: Seq[GroupExpression],
    +    output: Seq[Attribute],
    +    child: SparkPlan)
    +  extends UnaryNode {
    +
    +  // The GroupExpressions can output data with arbitrary partitioning, so set it
    +  // as UNKNOWN partitioning
    +  override def outputPartitioning: Partitioning = UnknownPartitioning(0)
    +
    +  override def execute() = attachTree(this, "execute") {
    +    child.execute().mapPartitions { iter =>
    +      // TODO Move out projection objects creation and transfer to
    +      // workers via closure. However we can't assume the Projection
    +      // is serializable because of the code gen, so we have to
    +      // create the projections within each of the partition processing.
    +      val groups = projections.map(ee => newProjection(ee.children, child.output)).toArray
    +
    +      new Iterator[Row] {
    +        private[this] var result: Row = _
    +        private[this] var idx = -1  // -1 means the initial state
    +        private[this] var input: Row = _
    +
    +        override final def hasNext = (-1 < idx && idx < groups.length) || iter.hasNext
    --- End diff --
    
    fyi you probably want to move groups.length into a variable to avoid running this everytime.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1567#issuecomment-67284488
  
      [Test build #24535 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24535/consoleFull) for   PR 1567 at commit [`3c1df19`](https://github.com/apache/spark/commit/3c1df1987febe2aeddae7c9618baf770b4164c06).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1567#issuecomment-54307760
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19660/consoleFull) for   PR 1567 at commit [`0325be5`](https://github.com/apache/spark/commit/0325be5e94501620eecbd16ba4b0d42dc7ac3a8e).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class GroupingSet(bitmasks: Seq[Int], `
      * `case class Cube(groupByExprs: Seq[Expression],`
      * `case class Rollup(groupByExprs: Seq[Expression],`
      * `protected case class AttributeEquals(val a: Attribute) `
      * `case class VirtualColumn(name: String, dataType: DataType = StringType, nullable: Boolean = false)`
      * `case class GroupingSetExpansion(`
      * `case class GroupingSetExpansion(`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1567#discussion_r21947841
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
    @@ -243,6 +243,16 @@ abstract class LeafExpression extends Expression with trees.LeafNode[Expression]
     
     abstract class UnaryExpression extends Expression with trees.UnaryNode[Expression] {
       self: Product =>
    +}
     
    -
    +// TODO Semantically we probably not need GroupExpression
    +// It is supposed to be replaced by Projection, all we need is wrapping the
    +// Seq[Expression]
    +case class GroupExpression(children: Seq[Expression]) extends Expression {
    --- End diff --
    
    Actually I think I'm confused about how this works.  Is the goal here essentially to have a `Seq[Seq[Expression]]` that gets transformed correctly?  If so maybe we can just make the transform function more general?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1567#discussion_r21947975
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala ---
    @@ -143,6 +143,40 @@ case class Aggregate(
       override def output = aggregateExpressions.map(_.toAttribute)
     }
     
    +case class Explosive(
    +    projections: Seq[GroupExpression],
    +    output: Seq[Attribute],
    +    child: LogicalPlan) extends UnaryNode {
    +}
    +
    +trait GroupingSets extends UnaryNode {
    +  self: Product =>
    +  def gid: AttributeReference
    +  def groupByExprs: Seq[Expression]
    +  def aggregations: Seq[NamedExpression]
    +
    +  override def output = aggregations.map(_.toAttribute)
    +}
    +
    +case class GroupingSet(
    +    bitmasks: Seq[Int],
    +    groupByExprs: Seq[Expression],
    +    child: LogicalPlan,
    +    aggregations: Seq[NamedExpression],
    +    gid: AttributeReference = VirtualColumn.newGroupingId) extends GroupingSets
    +
    +case class Cube(
    +    groupByExprs: Seq[Expression],
    +    child: LogicalPlan,
    +    aggregations: Seq[NamedExpression],
    +    gid: AttributeReference = VirtualColumn.newGroupingId) extends GroupingSets
    +
    +case class Rollup(
    --- End diff --
    
    Add some scala doc here to explain what these are for and that they are transformed into `GroupingSet` by the analyzer.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/1567#issuecomment-61693964
  
    Thanks for working on this.  I will review after 1.2.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1567#issuecomment-59707432
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21911/consoleFull) for   PR 1567 at commit [`88b939e`](https://github.com/apache/spark/commit/88b939e3deb15f4ed16a727b33af879fa103c913).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1567#discussion_r21948460
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -156,6 +246,10 @@ class Analyzer(catalog: Catalog, registry: FunctionRegistry, caseSensitive: Bool
           case q: LogicalPlan if q.childrenResolved =>
             logTrace(s"Attempting to resolve ${q.simpleString}")
             q transformExpressions {
    +          case u @ UnresolvedAttribute(name)
    --- End diff --
    
    Can you explain why this needs a special case?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1567#discussion_r21947456
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
    @@ -243,6 +243,16 @@ abstract class LeafExpression extends Expression with trees.LeafNode[Expression]
     
     abstract class UnaryExpression extends Expression with trees.UnaryNode[Expression] {
       self: Product =>
    +}
     
    -
    +// TODO Semantically we probably not need GroupExpression
    +// It is supposed to be replaced by Projection, all we need is wrapping the
    +// Seq[Expression]
    +case class GroupExpression(children: Seq[Expression]) extends Expression {
    --- End diff --
    
    If this is only used in analysis then lets move it to `analysis.scala` and model it after the other unresolved expression types.
    
    Can you also clarify how it is used in the scala doc.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/1567#issuecomment-60266012
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22073/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1567#issuecomment-49968786
  
    QA tests have started for PR 1567. This patch merges cleanly. <br>View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17097/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/1567#issuecomment-59969282
  
    Yeah, please do post a design doc.  Also, sorry for not reviewing this earlier.  This will be a good feature to have.
    
    I did a quick pass and I have two high level concerns (though I did not look in much detail):
     - The creation of bit vectors seems like a very implementation focused physical concern.  I'm curious if this could be restricted to the actual physical operator.
     - Adding a new type of attribute reference for virtual columns might be a lot of overhead.  Is this really necessary?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/1567#issuecomment-59714668
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21911/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1567#issuecomment-49989086
  
    QA results for PR 1567:<br>- This patch PASSES unit tests.<br>- This patch merges cleanly<br>- This patch adds the following public classes (experimental):<br>case class GroupingSet(bitmasks: Seq[Int], <br>case class Cube(groupByExprs: Seq[Expression],<br>case class Rollup(groupByExprs: Seq[Expression],<br>case class VirtualColumn(name: String, dataType: DataType = StringType, nullable: Boolean = false)<br>case class GroupingSetExpansion(<br>case class GroupingSetExpansion(<br><br>For more information see test ouptut:<br>https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17114/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/1567#issuecomment-61599772
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22866/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1567#issuecomment-67426864
  
      [Test build #24563 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24563/consoleFull) for   PR 1567 at commit [`fe65fcc`](https://github.com/apache/spark/commit/fe65fcc7f91ff9ef5b275c5f822ea659b3a54b23).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1567#discussion_r21947525
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Explosive.scala ---
    @@ -0,0 +1,68 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution
    +
    +import org.apache.spark.annotation.DeveloperApi
    +import org.apache.spark.sql.SQLContext
    +import org.apache.spark.sql.catalyst.errors._
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.catalyst.plans.physical.{UnknownPartitioning, Partitioning}
    +
    +@DeveloperApi
    +case class Explosive(
    +    projections: Seq[GroupExpression],
    +    output: Seq[Attribute],
    +    child: SparkPlan)(@transient sqlContext: SQLContext)
    +  extends UnaryNode {
    +
    +  override def outputPartitioning: Partitioning = UnknownPartitioning(0)
    +
    +  override def otherCopyArgs = sqlContext :: Nil
    +
    +  override def execute() = attachTree(this, "execute") {
    +    child.execute().mapPartitions { iter =>
    +      // TODO InterpretProjection is not Serializable
    --- End diff --
    
    Because of codegen, Projections can't be assumed to be serializable.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/1567#issuecomment-54389330
  
    retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1567#discussion_r21948249
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -156,6 +246,10 @@ class Analyzer(catalog: Catalog, registry: FunctionRegistry, caseSensitive: Bool
           case q: LogicalPlan if q.childrenResolved =>
             logTrace(s"Attempting to resolve ${q.simpleString}")
             q transformExpressions {
    +          case u @ UnresolvedAttribute(name)
    +            if resolver(name, VirtualColumn.groupingIdName) && q.isInstanceOf[GroupingSets] =>
    --- End diff --
    
    intent 4 from `case`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1567#issuecomment-59862873
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21945/consoleFull) for   PR 1567 at commit [`49b4955`](https://github.com/apache/spark/commit/49b495558c04b5a95852c843173bea0f5c8ec433).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/1567#issuecomment-59863019
  
    Rebased, failed in `CliSuite`.
    
    @marmbrus @rxin , not sure if you got time to review this. Sorry, it's big PR. I can provide a rough design  doc if you think that will be more helpful.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1567#issuecomment-54449841
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19781/consoleFull) for   PR 1567 at commit [`0325be5`](https://github.com/apache/spark/commit/0325be5e94501620eecbd16ba4b0d42dc7ac3a8e).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class GroupingSet(bitmasks: Seq[Int], `
      * `case class Cube(groupByExprs: Seq[Expression],`
      * `case class Rollup(groupByExprs: Seq[Expression],`
      * `protected case class AttributeEquals(val a: Attribute) `
      * `case class VirtualColumn(name: String, dataType: DataType = StringType, nullable: Boolean = false)`
      * `case class GroupingSetExpansion(`
      * `case class GroupingSetExpansion(`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/1567#issuecomment-59866547
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21945/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1567#discussion_r21947805
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala ---
    @@ -143,6 +143,40 @@ case class Aggregate(
       override def output = aggregateExpressions.map(_.toAttribute)
     }
     
    +case class Explosive(
    --- End diff --
    
    I'd make this a verb like the other operators, maybe `Expand`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1567#discussion_r21948905
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -156,6 +246,10 @@ class Analyzer(catalog: Catalog, registry: FunctionRegistry, caseSensitive: Bool
           case q: LogicalPlan if q.childrenResolved =>
             logTrace(s"Attempting to resolve ${q.simpleString}")
             q transformExpressions {
    +          case u @ UnresolvedAttribute(name)
    --- End diff --
    
    Add that in a comment.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/1567#issuecomment-67393046
  
    Cool, can you through `WIP` in the title while its being worked on?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/1567#issuecomment-67323755
  
    @marmbrus I still have something need to be updated, I will let you know when it's ready.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1567#discussion_r21948206
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
    @@ -243,6 +243,16 @@ abstract class LeafExpression extends Expression with trees.LeafNode[Expression]
     
     abstract class UnaryExpression extends Expression with trees.UnaryNode[Expression] {
       self: Product =>
    +}
     
    -
    +// TODO Semantically we probably not need GroupExpression
    +// It is supposed to be replaced by Projection, all we need is wrapping the
    +// Seq[Expression]
    +case class GroupExpression(children: Seq[Expression]) extends Expression {
    --- End diff --
    
    Yeah, okay, lets call this `ExpressionGroup` and explain that it is for holding `Seq[Seq[Expression]]` while doing transformations correctly.  We can try  to remove it later if we want to.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1567#issuecomment-59866542
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21945/consoleFull) for   PR 1567 at commit [`49b4955`](https://github.com/apache/spark/commit/49b495558c04b5a95852c843173bea0f5c8ec433).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  case class ReconnectWorker(masterUrl: String) extends DeployMessage`
      * `case class GroupingSet(bitmasks: Seq[Int], `
      * `case class Cube(groupByExprs: Seq[Expression],`
      * `case class Rollup(groupByExprs: Seq[Expression],`
      * `case class VirtualColumn(name: String, dataType: DataType = StringType, nullable: Boolean = false)`
      * `case class GroupingSetExpansion(`
      * `case class GroupingSetExpansion(`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/1567#issuecomment-54434357
  
    test this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1567#issuecomment-63007406
  
      [Test build #23348 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23348/consoleFull) for   PR 1567 at commit [`89e37d8`](https://github.com/apache/spark/commit/89e37d82d72ac614af0efcd810ac4b9b034d4253).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class GroupExpression(children: Seq[Expression]) extends Expression `
      * `case class Explosive(`
      * `trait GroupingSets extends UnaryNode `
      * `case class GroupingSet(`
      * `case class Cube(`
      * `case class Rollup(`
      * `case class Explosive(`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1567#issuecomment-67432292
  
      [Test build #24563 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24563/consoleFull) for   PR 1567 at commit [`fe65fcc`](https://github.com/apache/spark/commit/fe65fcc7f91ff9ef5b275c5f822ea659b3a54b23).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class GroupExpression(children: Seq[Expression]) extends Expression `
      * `case class Expand(`
      * `trait GroupingAnalytics extends UnaryNode `
      * `case class GroupingSets(`
      * `case class Cube(`
      * `case class Rollup(`
      * `case class Expand(`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1567#issuecomment-61613015
  
      [Test build #22870 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22870/consoleFull) for   PR 1567 at commit [`f2fae76`](https://github.com/apache/spark/commit/f2fae76d22d4446178b43ee5b7c673adb7cbb2d1).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class Explosive(`
      * `trait GroupingSets extends UnaryNode `
      * `case class GroupingSet(`
      * `case class Cube(`
      * `case class Rollup(`
      * `case class Explosive(`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1567#issuecomment-49968819
  
    QA results for PR 1567:<br>- This patch FAILED unit tests.<br>- This patch merges cleanly<br>- This patch adds the following public classes (experimental):<br>case class GroupingSet(bitmasks: Seq[Int], <br>case class Cube(groupByExprs: Seq[Expression],<br>case class Rollup(groupByExprs: Seq[Expression],<br>case class VirtualColumn(name: String, dataType: DataType = StringType, nullable: Boolean = false)<br>case class GroupingSetExpansion(<br>case class GroupingSetExpansion(<br><br>For more information see test ouptut:<br>https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17097/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1567#discussion_r21948388
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Explosive.scala ---
    @@ -0,0 +1,68 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution
    +
    +import org.apache.spark.annotation.DeveloperApi
    +import org.apache.spark.sql.SQLContext
    +import org.apache.spark.sql.catalyst.errors._
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.catalyst.plans.physical.{UnknownPartitioning, Partitioning}
    +
    +@DeveloperApi
    +case class Explosive(
    +    projections: Seq[GroupExpression],
    +    output: Seq[Attribute],
    +    child: SparkPlan)(@transient sqlContext: SQLContext)
    +  extends UnaryNode {
    +
    +  override def outputPartitioning: Partitioning = UnknownPartitioning(0)
    +
    +  override def otherCopyArgs = sqlContext :: Nil
    +
    +  override def execute() = attachTree(this, "execute") {
    +    child.execute().mapPartitions { iter =>
    +      // TODO InterpretProjection is not Serializable
    +      val projs = projections.map(ee => newProjection(ee.children, output)).toArray
    --- End diff --
    
    I'm a little confused here.  Why are you binding to the output schema of this node and not the input schema?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/1567#issuecomment-63007410
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23348/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/1567#issuecomment-59967008
  
    A short design doc would be nice. Just talk about the high level design and how it is implemented. Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/1567#issuecomment-54406904
  
    retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1567#discussion_r21948569
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -107,6 +110,93 @@ class Analyzer(catalog: Catalog, registry: FunctionRegistry, caseSensitive: Bool
         }
       }
     
    +  object ResolveGroupingSet extends Rule[LogicalPlan] {
    +    /**
    +     * Extract attribute set according to the grouping id
    +     * @param bitmask bitmask to represent the validity of the attribute sequence
    --- End diff --
    
    Yes, exactly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org