You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by maryannxue <gi...@git.apache.org> on 2018/04/28 02:40:53 UTC

[GitHub] spark pull request #21187: [SPARK-24035][SQL] SQL syntax for Pivot

GitHub user maryannxue opened a pull request:

    https://github.com/apache/spark/pull/21187

    [SPARK-24035][SQL] SQL syntax for Pivot

    ## What changes were proposed in this pull request?
    
    Add SQL support for Pivot according to Pivot grammar defined by Oracle (https://docs.oracle.com/database/121/SQLRF/img_text/pivot_clause.htm) with some simplifications, based on our existing functionality and limitations for Pivot at the backend:
    1. For pivot_for_clause (https://docs.oracle.com/database/121/SQLRF/img_text/pivot_for_clause.htm), the column list form is not supported, which means the pivot column can only be one single column.
    2. For pivot_in_clause (https://docs.oracle.com/database/121/SQLRF/img_text/pivot_in_clause.htm), the sub-query form and "ANY" is not supported (this is only supported by Oracle for XML anyway).
    3. For pivot_in_clause, aliases for the constant values are not supported.
    
    The code changes are:
    1. Add parser support for Pivot. Note that according to https://docs.oracle.com/database/121/SQLRF/statements_10002.htm#i2076542, Pivot cannot be used together with lateral views in the from clause. This restriction has been implemented in the Parser rule.
    2. Infer group-by expressions: group-by expressions are not explicitly specified in SQL Pivot clause and need to be deduced based on this rule: https://docs.oracle.com/database/121/SQLRF/statements_10002.htm#CHDFAFIE, so we have to post-fix it at query analysis stage.
    3. Override Pivot.resolved as "false": for the reason mentioned in [2] and the fact that output attributes change after Pivot being replaced by Project or Aggregate, we avoid resolving references until after Pivot has been resolved and replaced.
    4. Verify aggregate expressions: only aggregate expressions with or without aliases can appear in the first part of the Pivot clause, and this check is performed as analysis stage.
    
    ## How was this patch tested?
    
    A new test suite PivotSuite is added.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maryannxue/spark spark-24035

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21187.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21187
    
----
commit c486c6b15de49a519c728d037a8979791ea37e74
Author: maryannxue <ma...@...>
Date:   2018-04-28T01:17:52Z

    [SPARK-24035] SQL syntax for Pivot

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21187
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21187
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90024/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21187
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2735/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21187#discussion_r185260075
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
    @@ -691,7 +691,9 @@ case class Pivot(
         pivotColumn: Expression,
         pivotValues: Seq[Literal],
         aggregates: Seq[Expression],
    -    child: LogicalPlan) extends UnaryNode {
    +    child: LogicalPlan,
    +    groupByExprsImplicit: Boolean = false) extends UnaryNode {
    +  override lazy val resolved = false // Pivot will be replaced after being resolved.
    --- End diff --
    
    Could you add a negative test case to show which errors we could issue when Pivot can't be resolved? If the error does not make sense to the end users, we can improve the function `checkAnalysis` in the trait `CheckAnalysis`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21187
  
    **[Test build #90094 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90094/testReport)** for PR 21187 at commit [`c7eacf5`](https://github.com/apache/spark/commit/c7eacf50ee3ea0e7da34ceb521aeb13ed5f5f84c).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21187#discussion_r185258526
  
    --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
    @@ -805,6 +810,7 @@ RIGHT: 'RIGHT';
     FULL: 'FULL';
     NATURAL: 'NATURAL';
     ON: 'ON';
    +PIVOT: 'PIVOT';
    --- End diff --
    
    Could you add the keywords you added here to `nonReserved` (line 723)? Also update the suite `TableIdentifierParserSuite`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21187
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21187
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by maryannxue <gi...@git.apache.org>.
Github user maryannxue commented on the issue:

    https://github.com/apache/spark/pull/21187
  
    @gatorsmile "In-any-subquery in Pivot can be implemented like what we did in the other parts", can you make this clearer? The Pivot's "IN" values are special coz they will later become the columns of the relation.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by maryannxue <gi...@git.apache.org>.
Github user maryannxue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21187#discussion_r185273161
  
    --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
    @@ -805,6 +810,7 @@ RIGHT: 'RIGHT';
     FULL: 'FULL';
     NATURAL: 'NATURAL';
     ON: 'ON';
    +PIVOT: 'PIVOT';
    --- End diff --
    
    Sure, I'll update `TableIndentifierParserSuite`. I believe I've added them to `nonReserved` already. Did I miss something?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by maryannxue <gi...@git.apache.org>.
Github user maryannxue commented on the issue:

    https://github.com/apache/spark/pull/21187
  
    Thank you, @aray!
    Thank you, @rxin, for the nice suggestion! Changes made accordingly in my latest commit.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21187
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2840/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/21187
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21187
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90144/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21187
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21187
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21187
  
    **[Test build #90144 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90144/testReport)** for PR 21187 at commit [`c7eacf5`](https://github.com/apache/spark/commit/c7eacf50ee3ea0e7da34ceb521aeb13ed5f5f84c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class ArrayJoin(`
      * `case class Flatten(child: Expression) extends UnaryExpression `
      * `case class MonthsBetween(`
      * `case class CachedRDDBuilder(`
      * `case class InMemoryRelation(`
      * `case class WriteToContinuousDataSource(`
      * `case class WriteToContinuousDataSourceExec(writer: StreamWriter, query: SparkPlan)`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21187
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2787/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by Tagar <gi...@git.apache.org>.
Github user Tagar commented on the issue:

    https://github.com/apache/spark/pull/21187
  
    Would be great to make `FOR` section optional.
    E.g. - make `FOR year IN (2012, 2013)` optional in one of your examples.
    Currently `pivot()` when called programmatically, doesn't require to have list of values 
    defined up-front. 
    Otherwise it'll be a big inconvenience/limitation in `PIVOT` when called from SQL.
    
    Notice that Oracle's PIVOT when used on XML doesn't require list of values either -
    https://community.oracle.com/thread/2183084 



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by maryannxue <gi...@git.apache.org>.
Github user maryannxue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21187#discussion_r185278754
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
    @@ -691,7 +691,9 @@ case class Pivot(
         pivotColumn: Expression,
         pivotValues: Seq[Literal],
         aggregates: Seq[Expression],
    -    child: LogicalPlan) extends UnaryNode {
    +    child: LogicalPlan,
    +    groupByExprsImplicit: Boolean = false) extends UnaryNode {
    --- End diff --
    
    Yes, I hesitated for a while... trying to figure out which way it would be clearer, using `Option` or an extra flag. I did not have a preference, so I'll do `Option` if you think it's better.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by maryannxue <gi...@git.apache.org>.
Github user maryannxue commented on the issue:

    https://github.com/apache/spark/pull/21187
  
    Thank you, @gatorsmile, for the review and comments! I have opened SPARK-24162, SPARK-24163 and SPARK-24164 as follow-up improvements for this issue. Please feel free to assign them to me.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by maryannxue <gi...@git.apache.org>.
Github user maryannxue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21187#discussion_r185278179
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
    @@ -691,7 +691,9 @@ case class Pivot(
         pivotColumn: Expression,
         pivotValues: Seq[Literal],
         aggregates: Seq[Expression],
    -    child: LogicalPlan) extends UnaryNode {
    +    child: LogicalPlan,
    +    groupByExprsImplicit: Boolean = false) extends UnaryNode {
    +  override lazy val resolved = false // Pivot will be replaced after being resolved.
    --- End diff --
    
    In `ResolvePivot` rule, the `Pivot` node is guaranteed to be resolved and replaced, either with a `Project` or an `Aggregate`, except when either of the two errors occur:
    1. Child expressions or child node of `Pivot` cannot be resolved, which is a general Analyzer error and not specific to Pivot. I'll add a test case for this anyway.
    2. Pivot's `aggregates` are not really aggregate expressions (not guaranteed by the Parser). I have added this check in `ResolvePivot` rule, and the last test in "pivot.sql" is dedicated for this check.
    
    So here, marking `Pivot`'s `resolved` field as false is to stop its parent from reference-resolving or star-expansion and wait till it has been replaced, otherwise the star-expansion will be incorrect.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21187
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90084/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21187
  
    **[Test build #90084 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90084/testReport)** for PR 21187 at commit [`edd0eb1`](https://github.com/apache/spark/commit/edd0eb1f879423a436c33164e8777252a9c0e1da).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21187
  
    **[Test build #90144 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90144/testReport)** for PR 21187 at commit [`c7eacf5`](https://github.com/apache/spark/commit/c7eacf50ee3ea0e7da34ceb521aeb13ed5f5f84c).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by aray <gi...@git.apache.org>.
Github user aray commented on the issue:

    https://github.com/apache/spark/pull/21187
  
    LGTM thanks for doing this!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21187
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2833/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21187
  
    **[Test build #90094 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90094/testReport)** for PR 21187 at commit [`c7eacf5`](https://github.com/apache/spark/commit/c7eacf50ee3ea0e7da34ceb521aeb13ed5f5f84c).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class ArrayJoin(`
      * `case class Flatten(child: Expression) extends UnaryExpression `
      * `case class MonthsBetween(`
      * `case class CachedRDDBuilder(`
      * `case class InMemoryRelation(`
      * `case class WriteToContinuousDataSource(`
      * `case class WriteToContinuousDataSourceExec(writer: StreamWriter, query: SparkPlan)`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/21187
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21187
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21187
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89950/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by Tagar <gi...@git.apache.org>.
Github user Tagar commented on the issue:

    https://github.com/apache/spark/pull/21187
  
    @maryannxue, yep, "IN ANY" is what I meant. I missed it was already in the description - thanks for clarifying this.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21187
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21187
  
    Thanks for your fast and great work! Merged to master.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21187
  
    In-any-subquery in Pivot can be implemented like what we did in the other parts, but let us first leave it as the future item. @maryannxue Maybe you can create a JIRA. 
    
    The Oracle-compatible syntax in this PR is good enough. Thanks for your work! @maryannxue 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21187
  
    > "In-any-subquery in Pivot can be implemented like what we did in the other parts", can you make this clearer? The Pivot's "IN" values are special coz they will later become the columns of the relation.
    
    Yes. The In subquery needs to be executed before/during query analysis stage. Thus, it is different.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21187#discussion_r185259275
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
    @@ -691,7 +691,9 @@ case class Pivot(
         pivotColumn: Expression,
         pivotValues: Seq[Literal],
         aggregates: Seq[Expression],
    -    child: LogicalPlan) extends UnaryNode {
    +    child: LogicalPlan,
    +    groupByExprsImplicit: Boolean = false) extends UnaryNode {
    --- End diff --
    
    Could you add add the parameter descriptions like what we did in `GroupingSets ` (line 663-675)? 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21187
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2881/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21187
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21187
  
    **[Test build #90084 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90084/testReport)** for PR 21187 at commit [`edd0eb1`](https://github.com/apache/spark/commit/edd0eb1f879423a436c33164e8777252a9c0e1da).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21187
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21187#discussion_r185084802
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/PivotSuite.scala ---
    @@ -0,0 +1,197 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql
    --- End diff --
    
    can we use the infra for SQLQueryTestSuite?



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21187
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21187
  
    **[Test build #90012 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90012/testReport)** for PR 21187 at commit [`171c0c2`](https://github.com/apache/spark/commit/171c0c27d1ed79c7df7fe32c5ac0262096315273).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/21187


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21187
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21187
  
    **[Test build #90024 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90024/testReport)** for PR 21187 at commit [`171c0c2`](https://github.com/apache/spark/commit/171c0c27d1ed79c7df7fe32c5ac0262096315273).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21187
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2777/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21187
  
    Since the original Pivot support was added by @aray , could @aray please take a look at this PR too?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21187#discussion_r185264657
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
    @@ -691,7 +691,9 @@ case class Pivot(
         pivotColumn: Expression,
         pivotValues: Seq[Literal],
         aggregates: Seq[Expression],
    -    child: LogicalPlan) extends UnaryNode {
    +    child: LogicalPlan,
    +    groupByExprsImplicit: Boolean = false) extends UnaryNode {
    --- End diff --
    
    Could we just change `groupByExprs: Seq[NamedExpression]` -> `groupByExprs: Option[Seq[NamedExpression]]`? In the pattern matching, we just need to do something like Some, or None. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21187
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90094/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21187
  
    **[Test build #89950 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89950/testReport)** for PR 21187 at commit [`c486c6b`](https://github.com/apache/spark/commit/c486c6b15de49a519c728d037a8979791ea37e74).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21187
  
    **[Test build #90012 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90012/testReport)** for PR 21187 at commit [`171c0c2`](https://github.com/apache/spark/commit/171c0c27d1ed79c7df7fe32c5ac0262096315273).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21187
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90012/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21187
  
    **[Test build #90024 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90024/testReport)** for PR 21187 at commit [`171c0c2`](https://github.com/apache/spark/commit/171c0c27d1ed79c7df7fe32c5ac0262096315273).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21187
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21187
  
    **[Test build #89950 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89950/testReport)** for PR 21187 at commit [`c486c6b`](https://github.com/apache/spark/commit/c486c6b15de49a519c728d037a8979791ea37e74).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by maryannxue <gi...@git.apache.org>.
Github user maryannxue commented on the issue:

    https://github.com/apache/spark/pull/21187
  
    Thank you, @Tagar, for you comment! I think by saying "making FOR section optional", you actually mean to support "IN ANY".
    As you said and as I have pointed in my PR description, Oracle's "IN ANY" is only supported on XML. Besides, this version of Pivot SQL support is targeted to enable the front end that can leverage the existing runtime Pivot support, in which the values in "IN" clause are only allowed to be literals.
    That said, there's a few improvements we can make in the backend/runtime Pivot support. Some of them are easy, like "IN" value aliases; others are less so, like supporting "IN ANY". To be able to allow unspecified values, we might need to run another query at an early stage of this query's compilation, to get a list of all possible values, similar to what's been done in `RelationalGroupedDataset#pivot(String)`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21187: [SPARK-24035][SQL] SQL syntax for Pivot

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21187#discussion_r185641299
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
    @@ -686,13 +686,26 @@ case class GroupingSets(
       override lazy val resolved: Boolean = false
     }
     
    +/**
    + * A constructor for creating a pivot, which will later be converted to a [[Project]]
    + * or an [[Aggregate]] during the query analysis.
    + *
    + * @param groupByExprsOpt A sequence of group by expressions. This field should be None if coming
    + *                        from SQL, in which group by expressions are not explicitly specified.
    + * @param pivotColumn     The pivot column.
    + * @param pivotValues     A sequence of values for the pivot column.
    + * @param aggregates      The aggregation expressions, each with or without an alias.
    + * @param child           Child operator
    + */
     case class Pivot(
    -    groupByExprs: Seq[NamedExpression],
    +    groupByExprsOpt: Option[Seq[NamedExpression]],
         pivotColumn: Expression,
         pivotValues: Seq[Literal],
         aggregates: Seq[Expression],
         child: LogicalPlan) extends UnaryNode {
    -  override def output: Seq[Attribute] = groupByExprs.map(_.toAttribute) ++ aggregates match {
    +  override lazy val resolved = false // Pivot will be replaced after being resolved.
    +  override def output: Seq[Attribute] =
    +    groupByExprsOpt.getOrElse(Seq.empty).map(_.toAttribute) ++ aggregates match {
         case agg :: Nil => pivotValues.map(value => AttributeReference(value.toString, agg.dataType)())
    --- End diff --
    
    Nit: indent issues. Let us just clean the code at the same time.
    
    ```Scala
      override def output: Seq[Attribute] = {
        val pivotAgg = aggregates match {
          case agg :: Nil =>
            pivotValues.map(value => AttributeReference(value.toString, agg.dataType)())
          case _ =>
            pivotValues.flatMap { value =>
              aggregates.map(agg => AttributeReference(value + "_" + agg.sql, agg.dataType)())
            }
        }
        groupByExprsOpt.getOrElse(Seq.empty).map(_.toAttribute) ++ pivotAgg
      }
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org