You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by maryannxue <gi...@git.apache.org> on 2018/07/05 22:24:18 UTC

[GitHub] spark pull request #21720: [SPARK-24163][SPARK-24164][SQL] Support column li...

GitHub user maryannxue opened a pull request:

    https://github.com/apache/spark/pull/21720

    [SPARK-24163][SPARK-24164][SQL] Support column list as the pivot column in Pivot

    ## What changes were proposed in this pull request?
    
    1. Bug fixing in Analyzer for Pivot node.
    2. Extend the Parser to enable parsing a column list as the pivot column.
    3. Extend the Parser and the Pivot node to enable parsing complex expressions with aliases as the pivot value.
    4. Add type check and constant check in Analyzer for Pivot node.
    
    ## How was this patch tested?
    
    Add tests in pivot.sql


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maryannxue/spark spark-24164

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21720.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21720
    
----
commit fd235025a2099aa1d239f4b84014301ec41c815d
Author: maryannxue <ma...@...>
Date:   2018-07-03T20:54:46Z

    spark-24164

commit 942a30dfc0fd070c067aa8d157075909610d3aaa
Author: maryannxue <ma...@...>
Date:   2018-07-05T22:23:59Z

    revert accidental changes

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21720
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #21720: [SPARK-24163][SPARK-24164][SQL] Support column li...

Posted by MaxGekk <gi...@git.apache.org>.

Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21720#discussion_r200837706
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ---
    @@ -630,11 +630,29 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
         val aggregates = Option(ctx.aggregates).toSeq
           .flatMap(_.namedExpression.asScala)
           .map(typedVisit[Expression])
    -    val pivotColumn = UnresolvedAttribute.quoted(ctx.pivotColumn.getText)
    -    val pivotValues = ctx.pivotValues.asScala.map(typedVisit[Expression]).map(Literal.apply)
    +    val pivotColumn = if (ctx.pivotColumn.identifiers.size == 1) {
    --- End diff --
    
    Are there any reasons to handle one pivot column separately? And what happens if `size == 0`? 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21720
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21720
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92993/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21720
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21720
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21720
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/824/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21720
  
    **[Test build #92823 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92823/testReport)** for PR 21720 at commit [`d468821`](https://github.com/apache/spark/commit/d468821db03644c1535aa3aece55c7bcb1b211c2).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21720
  
    **[Test build #92823 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92823/testReport)** for PR 21720 at commit [`d468821`](https://github.com/apache/spark/commit/d468821db03644c1535aa3aece55c7bcb1b211c2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #21720: [SPARK-24163][SPARK-24164][SQL] Support column li...

Posted by maryannxue <gi...@git.apache.org>.

Github user maryannxue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21720#discussion_r201137433
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ---
    @@ -630,11 +630,29 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
         val aggregates = Option(ctx.aggregates).toSeq
           .flatMap(_.namedExpression.asScala)
           .map(typedVisit[Expression])
    -    val pivotColumn = UnresolvedAttribute.quoted(ctx.pivotColumn.getText)
    -    val pivotValues = ctx.pivotValues.asScala.map(typedVisit[Expression]).map(Literal.apply)
    +    val pivotColumn = if (ctx.pivotColumn.identifiers.size == 1) {
    --- End diff --
    
    Cannot be "0" as required by the parser rule. if size == 1, then it's single column as before, otherwise it's a construct.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #21720: [SPARK-24163][SPARK-24164][SQL] Support column li...

Posted by maryannxue <gi...@git.apache.org>.

Github user maryannxue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21720#discussion_r201164582
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
    @@ -700,7 +700,7 @@ case class GroupingSets(
     case class Pivot(
         groupByExprsOpt: Option[Seq[NamedExpression]],
         pivotColumn: Expression,
    --- End diff --
    
    No. Pivot column is one "expression" which can be either 1) a single column reference or 2) a struct of multiple columns. Either way the list of pivot values are many-to-one mapping for the pivot column.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21720
  
    **[Test build #92993 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92993/testReport)** for PR 21720 at commit [`b27245e`](https://github.com/apache/spark/commit/b27245e3e2ca021815e6b353036925e57f665e7a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21720
  
    **[Test build #92792 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92792/testReport)** for PR 21720 at commit [`d468821`](https://github.com/apache/spark/commit/d468821db03644c1535aa3aece55c7bcb1b211c2).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21720
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93152/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21720
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #21720: [SPARK-24163][SPARK-24164][SQL] Support column li...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21720#discussion_r200518399
  
    --- Diff: sql/core/src/test/resources/sql-tests/inputs/pivot.sql ---
    @@ -88,12 +93,12 @@ PIVOT (
     );
     
     -- pivot with aliases and projection
    -SELECT 2012_s, 2013_s, 2012_a, 2013_a, c FROM (
    +SELECT firstYear_s, secondYear_s, firstYear_a, secondYear_a, c FROM (
       SELECT year y, course c, earnings e FROM courseSales
     )
     PIVOT (
       sum(e) s, avg(e) a
    -  FOR y IN (2012, 2013)
    +  FOR y IN (2012 as firstYear, 2013 secondYear)
    --- End diff --
    
    can we keep the original query? add a new one for this?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21720
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #21720: [SPARK-24163][SPARK-24164][SQL] Support column li...

Posted by MaxGekk <gi...@git.apache.org>.

Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21720#discussion_r200837390
  
    --- Diff: sql/core/src/test/resources/sql-tests/results/pivot.sql.out ---
    @@ -144,51 +155,162 @@ PIVOT (
       sum(earnings * s)
       FOR course IN ('dotNET', 'Java')
     )
    --- !query 9 schema
    +-- !query 10 schema
     struct<year:int,dotNET:bigint,Java:bigint>
    --- !query 9 output
    +-- !query 10 output
     2012	15000	20000
     2013	96000	60000
     
     
    --- !query 10
    -SELECT 2012_s, 2013_s, 2012_a, 2013_a, c FROM (
    +-- !query 11
    +SELECT firstYear_s, secondYear_s, firstYear_a, secondYear_a, c FROM (
       SELECT year y, course c, earnings e FROM courseSales
     )
     PIVOT (
       sum(e) s, avg(e) a
    -  FOR y IN (2012, 2013)
    +  FOR y IN (2012 as firstYear, 2013 secondYear)
     )
    --- !query 10 schema
    -struct<2012_s:bigint,2013_s:bigint,2012_a:double,2013_a:double,c:string>
    --- !query 10 output
    +-- !query 11 schema
    +struct<firstYear_s:bigint,secondYear_s:bigint,firstYear_a:double,secondYear_a:double,c:string>
    +-- !query 11 output
     15000	48000	7500.0	48000.0	dotNET
     20000	30000	20000.0	30000.0	Java
     
     
    --- !query 11
    +-- !query 12
     SELECT * FROM courseSales
     PIVOT (
       abs(earnings)
       FOR year IN (2012, 2013)
     )
    --- !query 11 schema
    +-- !query 12 schema
     struct<>
    --- !query 11 output
    +-- !query 12 output
     org.apache.spark.sql.AnalysisException
     Aggregate expression required for pivot, found 'abs(earnings#x)';
     
     
    --- !query 12
    +-- !query 13
     SELECT * FROM (
       SELECT course, earnings FROM courseSales
     )
     PIVOT (
       sum(earnings)
       FOR year IN (2012, 2013)
     )
    --- !query 12 schema
    +-- !query 13 schema
     struct<>
    --- !query 12 output
    +-- !query 13 output
     org.apache.spark.sql.AnalysisException
     cannot resolve '`year`' given input columns: [__auto_generated_subquery_name.course, __auto_generated_subquery_name.earnings]; line 4 pos 0
    +
    +
    +-- !query 14
    +SELECT * FROM (
    +  SELECT course, year, earnings, s
    +  FROM courseSales
    +  JOIN years ON year = y
    +)
    +PIVOT (
    +  sum(earnings)
    +  FOR (course, year) IN (('dotNET', 2012), ('Java', 2013))
    +)
    +-- !query 14 schema
    +struct<s:int,[dotNET, 2012]:bigint,[Java, 2013]:bigint>
    +-- !query 14 output
    +1	15000	NULL
    +2	NULL	30000
    +
    +
    +-- !query 15
    +SELECT * FROM (
    +  SELECT course, year, earnings, s
    +  FROM courseSales
    +  JOIN years ON year = y
    +)
    +PIVOT (
    +  sum(earnings)
    +  FOR (course, s) IN (('dotNET', 2) as c1, ('Java', 1) as c2)
    +)
    +-- !query 15 schema
    +struct<year:int,c1:bigint,c2:bigint>
    +-- !query 15 output
    +2012	NULL	20000
    +2013	48000	NULL
    +
    +
    +-- !query 16
    +SELECT * FROM (
    +  SELECT course, year, earnings, s
    +  FROM courseSales
    +  JOIN years ON year = y
    +)
    +PIVOT (
    +  sum(earnings)
    +  FOR (course, year) IN ('dotNET', 'Java')
    +)
    +-- !query 16 schema
    +struct<>
    +-- !query 16 output
    +org.apache.spark.sql.AnalysisException
    +Invalid pivot value 'dotNET': value data type string does not match pivot column data type struct<course:string,year:int>;
    +
    +
    +-- !query 17
    +SELECT * FROM courseSales
    +PIVOT (
    +  sum(earnings)
    +  FOR year IN (s, 2013)
    +)
    +-- !query 17 schema
    +struct<>
    +-- !query 17 output
    +org.apache.spark.sql.AnalysisException
    +cannot resolve '`s`' given input columns: [coursesales.course, coursesales.year, coursesales.earnings]; line 4 pos 15
    +
    +
    +-- !query 18
    +SELECT * FROM courseSales
    +PIVOT (
    +  sum(earnings)
    +  FOR year IN (course, 2013)
    +)
    +-- !query 18 schema
    +struct<>
    +-- !query 18 output
    +org.apache.spark.sql.AnalysisException
    +Literal expressions required for pivot values, found 'course#x';
    +
    +
    +-- !query 19
    +SELECT * FROM (
    +  SELECT course, year, a
    +  FROM courseSales
    +  JOIN yearsWithArray ON year = y
    +)
    +PIVOT (
    +  min(a)
    +  FOR course IN ('dotNET', 'Java')
    +)
    +-- !query 19 schema
    +struct<>
    +-- !query 19 output
    +org.apache.spark.SparkException
    +Job 17 cancelled because SparkContext was shut down
    --- End diff --
    
    Is it expected output?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #21720: [SPARK-24163][SPARK-24164][SQL] Support column li...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21720#discussion_r200831568
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -515,13 +515,33 @@ class Analyzer(
                     s"Aggregate expression required for pivot, found '$e'")
               }
             }
    +        // Check all pivot values are literal and match pivot column data type.
    +        val evalPivotValues = pivotValues.map { value =>
    +          if (!Cast.canCast(value.dataType, pivotColumn.dataType)) {
    +            throw new AnalysisException(s"Invalid pivot value '$value': " +
    +              s"value data type ${value.dataType.simpleString} does not match " +
    +              s"pivot column data type ${pivotColumn.dataType.simpleString}")
    +          }
    +          try {
    +            Cast(value, pivotColumn.dataType).eval(EmptyRow)
    +          } catch {
    +            case _: UnsupportedOperationException =>
    --- End diff --
    
    Do not use try catch for these cases. 
    
    ```Scala
              if (value.foldable) {
                Cast(value, pivotColumn.dataType).eval(EmptyRow)
              } else {
                throw new AnalysisException(
                  s"Literal expressions required for pivot values, found '$value'")
              }
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21720
  
    **[Test build #92792 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92792/testReport)** for PR 21720 at commit [`d468821`](https://github.com/apache/spark/commit/d468821db03644c1535aa3aece55c7bcb1b211c2).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...

Posted by maryannxue <gi...@git.apache.org>.

Github user maryannxue commented on the issue:

    https://github.com/apache/spark/pull/21720
  
    retest please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21720
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1038/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21720
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1026/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21720
  
    **[Test build #92656 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92656/testReport)** for PR 21720 at commit [`942a30d`](https://github.com/apache/spark/commit/942a30dfc0fd070c067aa8d157075909610d3aaa).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #21720: [SPARK-24163][SPARK-24164][SQL] Support column li...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21720#discussion_r200831720
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -559,8 +574,8 @@ class Analyzer(
               Project(groupByExprsAttr ++ pivotOutputs, secondAgg)
             } else {
               val pivotAggregates: Seq[NamedExpression] = pivotValues.flatMap { value =>
    -            def ifExpr(expr: Expression) = {
    -              If(EqualNullSafe(pivotColumn, value), expr, Literal(null))
    +            def ifExpr(e: Expression) = {
    +              If(EqualNullSafe(pivotColumn, Cast(value, pivotColumn.dataType)), e, Literal(null))
    --- End diff --
    
    need to consider timezone. `Cast(value, pivotColumn.dataType, Some(conf.sessionLocalTimeZone))`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #21720: [SPARK-24163][SPARK-24164][SQL] Support column li...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21720#discussion_r200831755
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -515,13 +515,33 @@ class Analyzer(
                     s"Aggregate expression required for pivot, found '$e'")
               }
             }
    +        // Check all pivot values are literal and match pivot column data type.
    +        val evalPivotValues = pivotValues.map { value =>
    +          if (!Cast.canCast(value.dataType, pivotColumn.dataType)) {
    +            throw new AnalysisException(s"Invalid pivot value '$value': " +
    +              s"value data type ${value.dataType.simpleString} does not match " +
    +              s"pivot column data type ${pivotColumn.dataType.simpleString}")
    +          }
    +          try {
    +            Cast(value, pivotColumn.dataType).eval(EmptyRow)
    +          } catch {
    +            case _: UnsupportedOperationException =>
    --- End diff --
    
    We should check if the value is foldable before the type is castable


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21720
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92792/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21720
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21720
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1057/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21720
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/710/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21720
  
    **[Test build #93138 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93138/testReport)** for PR 21720 at commit [`b27245e`](https://github.com/apache/spark/commit/b27245e3e2ca021815e6b353036925e57f665e7a).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21720
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/806/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21720
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93182/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #21720: [SPARK-24163][SPARK-24164][SQL] Support column li...

Posted by MaxGekk <gi...@git.apache.org>.

Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21720#discussion_r200838038
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -515,13 +515,33 @@ class Analyzer(
                     s"Aggregate expression required for pivot, found '$e'")
               }
             }
    +        // Check all pivot values are literal and match pivot column data type.
    +        val evalPivotValues = pivotValues.map { value =>
    +          if (!Cast.canCast(value.dataType, pivotColumn.dataType)) {
    +            throw new AnalysisException(s"Invalid pivot value '$value': " +
    +              s"value data type ${value.dataType.simpleString} does not match " +
    +              s"pivot column data type ${pivotColumn.dataType.simpleString}")
    +          }
    +          try {
    +            Cast(value, pivotColumn.dataType).eval(EmptyRow)
    +          } catch {
    +            case _: UnsupportedOperationException =>
    +              throw new AnalysisException(
    +                s"Literal expressions required for pivot values, found '$value'")
    --- End diff --
    
    Is `UnsupportedOperationException` raised only in the case if `value` is not a literal. Probably you can check that it is a literal earlier?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21720
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92656/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21720
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92823/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21720
  
    LGTM
    
    Thanks! Merged to master.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21720
  
    **[Test build #93138 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93138/testReport)** for PR 21720 at commit [`b27245e`](https://github.com/apache/spark/commit/b27245e3e2ca021815e6b353036925e57f665e7a).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21720
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/943/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...

Posted by maryannxue <gi...@git.apache.org>.

Github user maryannxue commented on the issue:

    https://github.com/apache/spark/pull/21720
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21720
  
    ping @maryannxue Resolve the conflicts? Will review it again after that.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #21720: [SPARK-24163][SPARK-24164][SQL] Support column li...

Posted by maryannxue <gi...@git.apache.org>.

Github user maryannxue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21720#discussion_r201136969
  
    --- Diff: sql/core/src/test/resources/sql-tests/results/pivot.sql.out ---
    @@ -144,51 +155,162 @@ PIVOT (
       sum(earnings * s)
       FOR course IN ('dotNET', 'Java')
     )
    --- !query 9 schema
    +-- !query 10 schema
     struct<year:int,dotNET:bigint,Java:bigint>
    --- !query 9 output
    +-- !query 10 output
     2012	15000	20000
     2013	96000	60000
     
     
    --- !query 10
    -SELECT 2012_s, 2013_s, 2012_a, 2013_a, c FROM (
    +-- !query 11
    +SELECT firstYear_s, secondYear_s, firstYear_a, secondYear_a, c FROM (
       SELECT year y, course c, earnings e FROM courseSales
     )
     PIVOT (
       sum(e) s, avg(e) a
    -  FOR y IN (2012, 2013)
    +  FOR y IN (2012 as firstYear, 2013 secondYear)
     )
    --- !query 10 schema
    -struct<2012_s:bigint,2013_s:bigint,2012_a:double,2013_a:double,c:string>
    --- !query 10 output
    +-- !query 11 schema
    +struct<firstYear_s:bigint,secondYear_s:bigint,firstYear_a:double,secondYear_a:double,c:string>
    +-- !query 11 output
     15000	48000	7500.0	48000.0	dotNET
     20000	30000	20000.0	30000.0	Java
     
     
    --- !query 11
    +-- !query 12
     SELECT * FROM courseSales
     PIVOT (
       abs(earnings)
       FOR year IN (2012, 2013)
     )
    --- !query 11 schema
    +-- !query 12 schema
     struct<>
    --- !query 11 output
    +-- !query 12 output
     org.apache.spark.sql.AnalysisException
     Aggregate expression required for pivot, found 'abs(earnings#x)';
     
     
    --- !query 12
    +-- !query 13
     SELECT * FROM (
       SELECT course, earnings FROM courseSales
     )
     PIVOT (
       sum(earnings)
       FOR year IN (2012, 2013)
     )
    --- !query 12 schema
    +-- !query 13 schema
     struct<>
    --- !query 12 output
    +-- !query 13 output
     org.apache.spark.sql.AnalysisException
     cannot resolve '`year`' given input columns: [__auto_generated_subquery_name.course, __auto_generated_subquery_name.earnings]; line 4 pos 0
    +
    +
    +-- !query 14
    +SELECT * FROM (
    +  SELECT course, year, earnings, s
    +  FROM courseSales
    +  JOIN years ON year = y
    +)
    +PIVOT (
    +  sum(earnings)
    +  FOR (course, year) IN (('dotNET', 2012), ('Java', 2013))
    +)
    +-- !query 14 schema
    +struct<s:int,[dotNET, 2012]:bigint,[Java, 2013]:bigint>
    +-- !query 14 output
    +1	15000	NULL
    +2	NULL	30000
    +
    +
    +-- !query 15
    +SELECT * FROM (
    +  SELECT course, year, earnings, s
    +  FROM courseSales
    +  JOIN years ON year = y
    +)
    +PIVOT (
    +  sum(earnings)
    +  FOR (course, s) IN (('dotNET', 2) as c1, ('Java', 1) as c2)
    +)
    +-- !query 15 schema
    +struct<year:int,c1:bigint,c2:bigint>
    +-- !query 15 output
    +2012	NULL	20000
    +2013	48000	NULL
    +
    +
    +-- !query 16
    +SELECT * FROM (
    +  SELECT course, year, earnings, s
    +  FROM courseSales
    +  JOIN years ON year = y
    +)
    +PIVOT (
    +  sum(earnings)
    +  FOR (course, year) IN ('dotNET', 'Java')
    +)
    +-- !query 16 schema
    +struct<>
    +-- !query 16 output
    +org.apache.spark.sql.AnalysisException
    +Invalid pivot value 'dotNET': value data type string does not match pivot column data type struct<course:string,year:int>;
    +
    +
    +-- !query 17
    +SELECT * FROM courseSales
    +PIVOT (
    +  sum(earnings)
    +  FOR year IN (s, 2013)
    +)
    +-- !query 17 schema
    +struct<>
    +-- !query 17 output
    +org.apache.spark.sql.AnalysisException
    +cannot resolve '`s`' given input columns: [coursesales.course, coursesales.year, coursesales.earnings]; line 4 pos 15
    +
    +
    +-- !query 18
    +SELECT * FROM courseSales
    +PIVOT (
    +  sum(earnings)
    +  FOR year IN (course, 2013)
    +)
    +-- !query 18 schema
    +struct<>
    +-- !query 18 output
    +org.apache.spark.sql.AnalysisException
    +Literal expressions required for pivot values, found 'course#x';
    +
    +
    +-- !query 19
    +SELECT * FROM (
    +  SELECT course, year, a
    +  FROM courseSales
    +  JOIN yearsWithArray ON year = y
    +)
    +PIVOT (
    +  min(a)
    +  FOR course IN ('dotNET', 'Java')
    +)
    +-- !query 19 schema
    +struct<>
    +-- !query 19 output
    +org.apache.spark.SparkException
    +Job 17 cancelled because SparkContext was shut down
    --- End diff --
    
    No... sorry about this. There must have been a mistake. I'll commit this file again.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21720
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21720
  
    **[Test build #92993 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92993/testReport)** for PR 21720 at commit [`b27245e`](https://github.com/apache/spark/commit/b27245e3e2ca021815e6b353036925e57f665e7a).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21720
  
    **[Test build #93152 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93152/testReport)** for PR 21720 at commit [`b27245e`](https://github.com/apache/spark/commit/b27245e3e2ca021815e6b353036925e57f665e7a).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #21720: [SPARK-24163][SPARK-24164][SQL] Support column li...

Posted by MaxGekk <gi...@git.apache.org>.

Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21720#discussion_r200837087
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -506,7 +506,7 @@ class Analyzer(
         def apply(plan: LogicalPlan): LogicalPlan = plan transform {
           case p: Pivot if !p.childrenResolved || !p.aggregates.forall(_.resolved)
             || (p.groupByExprsOpt.isDefined && !p.groupByExprsOpt.get.forall(_.resolved))
    -        || !p.pivotColumn.resolved => p
    +        || !p.pivotColumn.resolved || !p.pivotValues.forall(_.resolved) => p
    --- End diff --
    
    By which test is the change covered?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21720
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #21720: [SPARK-24163][SPARK-24164][SQL] Support column li...

Posted by maryannxue <gi...@git.apache.org>.

Github user maryannxue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21720#discussion_r201137638
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -515,13 +515,33 @@ class Analyzer(
                     s"Aggregate expression required for pivot, found '$e'")
               }
             }
    +        // Check all pivot values are literal and match pivot column data type.
    +        val evalPivotValues = pivotValues.map { value =>
    +          if (!Cast.canCast(value.dataType, pivotColumn.dataType)) {
    +            throw new AnalysisException(s"Invalid pivot value '$value': " +
    +              s"value data type ${value.dataType.simpleString} does not match " +
    +              s"pivot column data type ${pivotColumn.dataType.simpleString}")
    +          }
    +          try {
    +            Cast(value, pivotColumn.dataType).eval(EmptyRow)
    +          } catch {
    +            case _: UnsupportedOperationException =>
    +              throw new AnalysisException(
    +                s"Literal expressions required for pivot values, found '$value'")
    --- End diff --
    
    Yes, you are right. Please refer to @gatorsmile's comment.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...

Posted by MaxGekk <gi...@git.apache.org>.

Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/21720
  
    @gatorsmile @maryannxue Can we move forward with this PR: https://github.com/apache/spark/pull/21699 ?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #21720: [SPARK-24163][SPARK-24164][SQL] Support column li...

Posted by maryannxue <gi...@git.apache.org>.

Github user maryannxue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21720#discussion_r201136705
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -506,7 +506,7 @@ class Analyzer(
         def apply(plan: LogicalPlan): LogicalPlan = plan transform {
           case p: Pivot if !p.childrenResolved || !p.aggregates.forall(_.resolved)
             || (p.groupByExprsOpt.isDefined && !p.groupByExprsOpt.get.forall(_.resolved))
    -        || !p.pivotColumn.resolved => p
    +        || !p.pivotColumn.resolved || !p.pivotValues.forall(_.resolved) => p
    --- End diff --
    
    Before this PR, pivot values can only be single literals (no struct) so they have been converted to Literals in ASTBuilder. Now they are "expressions" and will be handled in this Analyzer rule.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21720
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93138/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21720
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...

Posted by maryannxue <gi...@git.apache.org>.

Github user maryannxue commented on the issue:

    https://github.com/apache/spark/pull/21720
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...

Posted by maryannxue <gi...@git.apache.org>.

Github user maryannxue commented on the issue:

    https://github.com/apache/spark/pull/21720
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21720
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #21720: [SPARK-24163][SPARK-24164][SQL] Support column li...

Posted by MaxGekk <gi...@git.apache.org>.

Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21720#discussion_r200837002
  
    --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
    @@ -414,7 +414,16 @@ groupingSet
         ;
     
     pivotClause
    -    : PIVOT '(' aggregates=namedExpressionSeq FOR pivotColumn=identifier IN '(' pivotValues+=constant (',' pivotValues+=constant)* ')' ')'
    +    : PIVOT '(' aggregates=namedExpressionSeq FOR pivotColumn IN '(' pivotValues+=pivotValue (',' pivotValues+=pivotValue)* ')' ')'
    +    ;
    +
    +pivotColumn
    +    : identifiers+=identifier
    +    | '(' identifiers+=identifier (',' identifiers+=identifier)* ')'
    --- End diff --
    
    Are there any specific reasons to restrict the `pivotColumn` by `identifier`? Are there any cases when expressions still don't supported properly with your changes?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #21720: [SPARK-24163][SPARK-24164][SQL] Support column li...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/21720


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #21720: [SPARK-24163][SPARK-24164][SQL] Support column li...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21720#discussion_r200829969
  
    --- Diff: sql/core/src/test/resources/sql-tests/results/pivot.sql.out ---
    @@ -144,51 +155,162 @@ PIVOT (
       sum(earnings * s)
       FOR course IN ('dotNET', 'Java')
     )
    --- !query 9 schema
    +-- !query 10 schema
     struct<year:int,dotNET:bigint,Java:bigint>
    --- !query 9 output
    +-- !query 10 output
     2012	15000	20000
     2013	96000	60000
     
     
    --- !query 10
    -SELECT 2012_s, 2013_s, 2012_a, 2013_a, c FROM (
    +-- !query 11
    +SELECT firstYear_s, secondYear_s, firstYear_a, secondYear_a, c FROM (
       SELECT year y, course c, earnings e FROM courseSales
     )
     PIVOT (
       sum(e) s, avg(e) a
    -  FOR y IN (2012, 2013)
    +  FOR y IN (2012 as firstYear, 2013 secondYear)
     )
    --- !query 10 schema
    -struct<2012_s:bigint,2013_s:bigint,2012_a:double,2013_a:double,c:string>
    --- !query 10 output
    +-- !query 11 schema
    +struct<firstYear_s:bigint,secondYear_s:bigint,firstYear_a:double,secondYear_a:double,c:string>
    +-- !query 11 output
     15000	48000	7500.0	48000.0	dotNET
     20000	30000	20000.0	30000.0	Java
     
     
    --- !query 11
    +-- !query 12
     SELECT * FROM courseSales
     PIVOT (
       abs(earnings)
       FOR year IN (2012, 2013)
     )
    --- !query 11 schema
    +-- !query 12 schema
     struct<>
    --- !query 11 output
    +-- !query 12 output
     org.apache.spark.sql.AnalysisException
     Aggregate expression required for pivot, found 'abs(earnings#x)';
     
     
    --- !query 12
    +-- !query 13
     SELECT * FROM (
       SELECT course, earnings FROM courseSales
     )
     PIVOT (
       sum(earnings)
       FOR year IN (2012, 2013)
     )
    --- !query 12 schema
    +-- !query 13 schema
     struct<>
    --- !query 12 output
    +-- !query 13 output
     org.apache.spark.sql.AnalysisException
     cannot resolve '`year`' given input columns: [__auto_generated_subquery_name.course, __auto_generated_subquery_name.earnings]; line 4 pos 0
    +
    +
    +-- !query 14
    +SELECT * FROM (
    +  SELECT course, year, earnings, s
    +  FROM courseSales
    +  JOIN years ON year = y
    +)
    +PIVOT (
    +  sum(earnings)
    +  FOR (course, year) IN (('dotNET', 2012), ('Java', 2013))
    +)
    +-- !query 14 schema
    +struct<s:int,[dotNET, 2012]:bigint,[Java, 2013]:bigint>
    +-- !query 14 output
    +1	15000	NULL
    +2	NULL	30000
    +
    +
    +-- !query 15
    +SELECT * FROM (
    +  SELECT course, year, earnings, s
    +  FROM courseSales
    +  JOIN years ON year = y
    +)
    +PIVOT (
    +  sum(earnings)
    +  FOR (course, s) IN (('dotNET', 2) as c1, ('Java', 1) as c2)
    +)
    +-- !query 15 schema
    +struct<year:int,c1:bigint,c2:bigint>
    +-- !query 15 output
    +2012	NULL	20000
    +2013	48000	NULL
    +
    +
    +-- !query 16
    +SELECT * FROM (
    +  SELECT course, year, earnings, s
    +  FROM courseSales
    +  JOIN years ON year = y
    +)
    +PIVOT (
    +  sum(earnings)
    +  FOR (course, year) IN ('dotNET', 'Java')
    +)
    +-- !query 16 schema
    +struct<>
    +-- !query 16 output
    +org.apache.spark.sql.AnalysisException
    +Invalid pivot value 'dotNET': value data type string does not match pivot column data type struct<course:string,year:int>;
    +
    +
    +-- !query 17
    +SELECT * FROM courseSales
    +PIVOT (
    +  sum(earnings)
    +  FOR year IN (s, 2013)
    +)
    +-- !query 17 schema
    +struct<>
    +-- !query 17 output
    +org.apache.spark.sql.AnalysisException
    +cannot resolve '`s`' given input columns: [coursesales.course, coursesales.year, coursesales.earnings]; line 4 pos 15
    +
    +
    +-- !query 18
    +SELECT * FROM courseSales
    +PIVOT (
    +  sum(earnings)
    +  FOR year IN (course, 2013)
    +)
    +-- !query 18 schema
    +struct<>
    +-- !query 18 output
    +org.apache.spark.sql.AnalysisException
    +Literal expressions required for pivot values, found 'course#x';
    +
    +
    +-- !query 19
    +SELECT * FROM (
    +  SELECT course, year, a
    +  FROM courseSales
    +  JOIN yearsWithArray ON year = y
    +)
    +PIVOT (
    +  min(a)
    +  FOR course IN ('dotNET', 'Java')
    +)
    +-- !query 19 schema
    +struct<>
    +-- !query 19 output
    +org.apache.spark.SparkException
    +Job 17 cancelled because SparkContext was shut down
    +
    +
    +-- !query 20
    +SELECT * FROM (
    +  SELECT course, year, y, a
    +  FROM courseSales
    +  JOIN yearsWithArray ON year = y
    +)
    +PIVOT (
    +  max(a)
    +  FOR (y, course) IN ((2012, 'dotNET'), (2013, 'Java'))
    +)
    +-- !query 20 schema
    +struct<>
    +-- !query 20 output
    +org.apache.spark.SparkException
    +Exception thrown in awaitResult:
    --- End diff --
    
    ?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21720
  
    **[Test build #93182 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93182/testReport)** for PR 21720 at commit [`b27245e`](https://github.com/apache/spark/commit/b27245e3e2ca021815e6b353036925e57f665e7a).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21720
  
    **[Test build #93182 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93182/testReport)** for PR 21720 at commit [`b27245e`](https://github.com/apache/spark/commit/b27245e3e2ca021815e6b353036925e57f665e7a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21720
  
    **[Test build #93152 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93152/testReport)** for PR 21720 at commit [`b27245e`](https://github.com/apache/spark/commit/b27245e3e2ca021815e6b353036925e57f665e7a).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #21720: [SPARK-24163][SPARK-24164][SQL] Support column li...

Posted by maryannxue <gi...@git.apache.org>.

Github user maryannxue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21720#discussion_r201094872
  
    --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
    @@ -414,7 +414,16 @@ groupingSet
         ;
     
     pivotClause
    -    : PIVOT '(' aggregates=namedExpressionSeq FOR pivotColumn=identifier IN '(' pivotValues+=constant (',' pivotValues+=constant)* ')' ')'
    +    : PIVOT '(' aggregates=namedExpressionSeq FOR pivotColumn IN '(' pivotValues+=pivotValue (',' pivotValues+=pivotValue)* ')' ')'
    +    ;
    +
    +pivotColumn
    +    : identifiers+=identifier
    +    | '(' identifiers+=identifier (',' identifiers+=identifier)* ')'
    --- End diff --
    
    The main reason was that I implemented this pivot SQL support based on ORACLE grammar. Please take a look at https://docs.oracle.com/database/121/SQLRF/img_text/pivot_for_clause.htm. Note that the "column" here is different from "expression" (take this for reference: https://docs.oracle.com/cd/B28359_01/server.111/b28286/expressions002.htm#SQLRF52047).
    Another reason was that relaxing it to an "expr" would require a lot more tests and handling of special cases.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21720
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21720
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #21720: [SPARK-24163][SPARK-24164][SQL] Support column li...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21720#discussion_r200830835
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -515,13 +515,33 @@ class Analyzer(
                     s"Aggregate expression required for pivot, found '$e'")
               }
             }
    +        // Check all pivot values are literal and match pivot column data type.
    +        val evalPivotValues = pivotValues.map { value =>
    +          if (!Cast.canCast(value.dataType, pivotColumn.dataType)) {
    +            throw new AnalysisException(s"Invalid pivot value '$value': " +
    +              s"value data type ${value.dataType.simpleString} does not match " +
    --- End diff --
    
    `simpleString` -> `catalogString`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #21720: [SPARK-24163][SPARK-24164][SQL] Support column li...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21720#discussion_r200830672
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -515,13 +515,33 @@ class Analyzer(
                     s"Aggregate expression required for pivot, found '$e'")
    --- End diff --
    
    Add a test case for this exception?
    ```SQL
    SELECT * FROM (
      SELECT year, course, earnings FROM courseSales
    )
    PIVOT (
      sum(earnings), year
      FOR course IN ('dotNET', 'Java')
    )
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21720
  
    **[Test build #92656 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92656/testReport)** for PR 21720 at commit [`942a30d`](https://github.com/apache/spark/commit/942a30dfc0fd070c067aa8d157075909610d3aaa).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #21720: [SPARK-24163][SPARK-24164][SQL] Support column li...

Posted by maryannxue <gi...@git.apache.org>.

Github user maryannxue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21720#discussion_r201194441
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -559,8 +574,8 @@ class Analyzer(
               Project(groupByExprsAttr ++ pivotOutputs, secondAgg)
             } else {
               val pivotAggregates: Seq[NamedExpression] = pivotValues.flatMap { value =>
    -            def ifExpr(expr: Expression) = {
    -              If(EqualNullSafe(pivotColumn, value), expr, Literal(null))
    +            def ifExpr(e: Expression) = {
    +              If(EqualNullSafe(pivotColumn, Cast(value, pivotColumn.dataType)), e, Literal(null))
    --- End diff --
    
    Is it required in the other Cast(value, pivotColumn.dataType) above?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #21720: [SPARK-24163][SPARK-24164][SQL] Support column li...

Posted by MaxGekk <gi...@git.apache.org>.

Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21720#discussion_r200837317
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
    @@ -700,7 +700,7 @@ case class GroupingSets(
     case class Pivot(
         groupByExprsOpt: Option[Seq[NamedExpression]],
         pivotColumn: Expression,
    --- End diff --
    
    I am asking just for my understanding. If you support multiple pivot columns, why it is not declared here explicitly: `pivotColumns: Seq[Expression]` like for `pivotValues`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21720
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21720
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org