
Posted to reviews@spark.apache.org by MaxGekk <gi...@git.apache.org> on 2018/09/02 13:34:36 UTC

[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...

GitHub user MaxGekk opened a pull request:

    https://github.com/apache/spark/pull/22316

    [SPARK-25048][SQL] Pivoting by multiple columns in Scala/Java

    ## What changes were proposed in this pull request?
    
    In this PR, I propose to extend the implementation of the existing method:
    ```
    def pivot(pivotColumn: Column, values: Seq[Any]): RelationalGroupedDataset
    ```
    to support values of the struct type. This allows pivoting by multiple columns combined by `struct`:
    ```
    trainingSales
          .groupBy($"sales.year")
          .pivot(
            pivotColumn = struct(lower($"sales.course"), $"training"),
            values = Seq(
              struct(lit("dotnet"), lit("Experts")),
              struct(lit("java"), lit("Dummies")))
          ).agg(sum($"sales.earnings"))
    ```
    
    ## How was this patch tested?
    
    Added a test for values specified via `struct` in Java and Scala.
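
To illustrate the semantics described in the pull request, here is a minimal plain-Scala sketch (no Spark dependency) of pivoting by a composite key. The `Sale` case class, the `CompositePivotSketch` object, and the earnings of the two rows that the test's expected output does not constrain are assumptions made for illustration. The sketch models the behavior with ordinary collections and is not how Spark implements `pivot`.

```scala
// Plain-Scala model of pivoting by a composite key (no Spark dependency).
// The case class and the sample rows are illustrative assumptions.
final case class Sale(training: String, course: String, year: Int, earnings: Double)

object CompositePivotSketch {
  // Earnings of the rows not constrained by the test's expected output are made up.
  val sales: Seq[Sale] = Seq(
    Sale("Experts", "dotNET", 2012, 10000.0),
    Sale("Experts", "JAVA", 2012, 20000.0),
    Sale("Dummies", "dotNet", 2012, 5000.0),
    Sale("Experts", "dotNET", 2013, 48000.0),
    Sale("Dummies", "Java", 2013, 30000.0)
  )

  // Each pivot value is a (course, training) pair, mirroring struct(lit(...), lit(...)).
  val pivotValues: Seq[(String, String)] =
    Seq(("dotnet", "Experts"), ("java", "Dummies"))

  // For every year, emit one cell per requested pivot value:
  // Some(sum) when matching rows exist, None (the analogue of SQL null) otherwise.
  def pivot(rows: Seq[Sale]): Map[Int, Seq[Option[Double]]] =
    rows.groupBy(_.year).map { case (year, group) =>
      val cells = pivotValues.map { key =>
        val matching = group.filter(s => (s.course.toLowerCase, s.training) == key)
        if (matching.isEmpty) None else Some(matching.map(_.earnings).sum)
      }
      year -> cells
    }
}
```

With these rows, `CompositePivotSketch.pivot(CompositePivotSketch.sales)` yields `Some(10000.0), None` for 2012 and `Some(48000.0), Some(30000.0)` for 2013, which agrees with the expected rows in the Scala test added by this PR.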

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MaxGekk/spark-1 pivoting-by-multiple-columns2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22316.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22316
    
----
commit 058072544fdd606392a57615119bb55dff5345c0
Author: Maxim Gekk <ma...@...>
Date:   2018-09-02T12:24:20Z

    Support columns as values

commit 1221db39b75a9b9bd4fbc6144150283d9c24e9d5
Author: Maxim Gekk <ma...@...>
Date:   2018-09-02T13:14:55Z

    Added a test for the case when values are not specified

commit a097b294854f99ec58ca307d85c19e54cd76d6b8
Author: Maxim Gekk <ma...@...>
Date:   2018-09-02T13:19:14Z

    Added a test for Java

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    Looks good, but I wonder whether everyone involved in the previous PR is happy with it.


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by MaxGekk <gi...@git.apache.org>.

Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    @cloud-fan Thank you for the suggestion. I did it that way.


---


[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...

Posted by MaxGekk <gi...@git.apache.org>.

Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22316#discussion_r214842133
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFramePivotSuite.scala ---
    @@ -308,4 +308,27 @@ class DataFramePivotSuite extends QueryTest with SharedSQLContext {
     
         assert(exception.getMessage.contains("aggregate functions are not allowed"))
       }
    +
    +  test("pivoting column list with values") {
    +    val expected = Row(2012, 10000.0, null) :: Row(2013, 48000.0, 30000.0) :: Nil
    +    val df = trainingSales
    +      .groupBy($"sales.year")
    +      .pivot(struct(lower($"sales.course"), $"training"), Seq(
    +        struct(lit("dotnet"), lit("Experts")),
    +        struct(lit("java"), lit("Dummies")))
    +      ).agg(sum($"sales.earnings"))
    +
    +    checkAnswer(df, expected)
    +  }
    +
    +  test("pivoting column list") {
    +    val exception = intercept[RuntimeException] {
    +      trainingSales
    +        .groupBy($"sales.year")
    +        .pivot(struct(lower($"sales.course"), $"training"))
    +        .agg(sum($"sales.earnings"))
    +        .collect()
    --- End diff --
    
    > Am I missing something?
    
    No, you aren't. The exception is definitely thrown inside `lit`, because `collect()` returns a complex value that `lit` cannot wrap. That is exactly what the test I added checks, to show the existing behavior.
    
    > Btw, IMHO `AnalysisException` would be better than `RuntimeException` in this case.
    
    @maropu Could you please explain why you think `AnalysisException` is better for an error that occurs at run time?
    
    Just in case: in this PR, I don't aim to change the behavior of the existing method `def pivot(pivotColumn: Column): RelationalGroupedDataset`. I believe that any change to user-visible behavior should be discussed separately. The PR aims to improve `def pivot(pivotColumn: Column, values: Seq[Any]): RelationalGroupedDataset`, in particular to allow users to specify `struct` literals. Please see the description.


---


[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22316#discussion_r217251795
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
    @@ -416,7 +426,7 @@ class RelationalGroupedDataset protected[sql](
             new RelationalGroupedDataset(
               df,
               groupingExprs,
    -          RelationalGroupedDataset.PivotType(pivotColumn.expr, values.map(Literal.apply)))
    +          RelationalGroupedDataset.PivotType(pivotColumn.expr, values.map(lit(_).expr)))
    --- End diff --
    
    @MaxGekk, just to be doubly sure, shall we use `Try(...).getOrElse(lit(...).expr)`? It looks like there is at least one case of a potential behaviour change in scale and precision.
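
The `Try(...).getOrElse(...)` pattern suggested here can be sketched in plain Scala as follows. The functions `strict` and `lenient` are hypothetical stand-ins for the old and new conversion paths (they are not Spark APIs), and the string results only mark which path produced the value. The point of the sketch is the ordering: the old path is tried first so that inputs it already handled keep their old representation, and the new path is used only when the old one throws.

```scala
import scala.util.Try

object LiteralFallbackSketch {
  // Hypothetical stand-in for the old conversion path: accepts only a few simple types.
  def strict(v: Any): String = v match {
    case i: Int    => s"int:$i"
    case s: String => s"string:$s"
    case other     => throw new RuntimeException(s"Unsupported literal type ${other.getClass}")
  }

  // Hypothetical stand-in for the new conversion path: also accepts a pair,
  // standing in here for a struct value.
  def lenient(v: Any): String = v match {
    case (a, b) => s"struct:($a,$b)"
    case other  => strict(other)
  }

  // Prefer the old path; fall back to the new one only if the old one throws.
  def convert(v: Any): String = Try(strict(v)).getOrElse(lenient(v))
}
```

In this sketch, `LiteralFallbackSketch.convert(1)` goes through `strict`, while `LiteralFallbackSketch.convert(("java", "Dummies"))` fails in `strict` and is handled by `lenient`.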


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    Merged build finished. Test FAILed.


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    **[Test build #96420 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96420/testReport)** for PR 22316 at commit [`382640b`](https://github.com/apache/spark/commit/382640be9bb9739929daea0bceb3093836d7f78d).


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by dilipbiswal <gi...@git.apache.org>.

Github user dilipbiswal commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    retest this please


---


[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...

Posted by MaxGekk <gi...@git.apache.org>.

Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22316#discussion_r221427994
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
    @@ -330,6 +330,15 @@ class RelationalGroupedDataset protected[sql](
        *   df.groupBy("year").pivot("course").sum("earnings")
        * }}}
        *
    +   * From Spark 2.5.0, values can be literal columns, for instance, struct. For pivoting by
    +   * multiple columns, use the `struct` function to combine the columns and values:
    +   *
    +   * {{{
    +   *   df.groupBy($"year")
    --- End diff --
    
    Why can't we group by the `Column` type here?


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    **[Test build #95592 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95592/testReport)** for PR 22316 at commit [`ef8e22a`](https://github.com/apache/spark/commit/ef8e22abb29da7a9b1e2ed7c1f237347d7c56e50).


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    **[Test build #95590 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95590/testReport)** for PR 22316 at commit [`a097b29`](https://github.com/apache/spark/commit/a097b294854f99ec58ca307d85c19e54cd76d6b8).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    **[Test build #96409 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96409/testReport)** for PR 22316 at commit [`382640b`](https://github.com/apache/spark/commit/382640be9bb9739929daea0bceb3093836d7f78d).


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    **[Test build #96759 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96759/testReport)** for PR 22316 at commit [`d645d06`](https://github.com/apache/spark/commit/d645d06df4520522c6eae9dad2d75c9ed73a3e39).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22316#discussion_r219368791
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
    @@ -416,7 +426,7 @@ class RelationalGroupedDataset protected[sql](
             new RelationalGroupedDataset(
               df,
               groupingExprs,
    -          RelationalGroupedDataset.PivotType(pivotColumn.expr, values.map(Literal.apply)))
    +          RelationalGroupedDataset.PivotType(pivotColumn.expr, values.map(lit(_).expr)))
    --- End diff --
    
    From a quick look, `Literal.create` seems more powerful and should not cause regressions.


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    Merged build finished. Test FAILed.


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by MaxGekk <gi...@git.apache.org>.

Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    > LGTM if the decimal precision concern from @HyukjinKwon is addressed.
    
    @HyukjinKwon Do you expect special tests for decimals? 


---


[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22316#discussion_r214566503
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFramePivotSuite.scala ---
    @@ -308,4 +308,27 @@ class DataFramePivotSuite extends QueryTest with SharedSQLContext {
     
         assert(exception.getMessage.contains("aggregate functions are not allowed"))
       }
    +
    +  test("pivoting column list with values") {
    +    val expected = Row(2012, 10000.0, null) :: Row(2013, 48000.0, 30000.0) :: Nil
    +    val df = trainingSales
    +      .groupBy($"sales.year")
    +      .pivot(struct(lower($"sales.course"), $"training"), Seq(
    +        struct(lit("dotnet"), lit("Experts")),
    +        struct(lit("java"), lit("Dummies")))
    +      ).agg(sum($"sales.earnings"))
    +
    +    checkAnswer(df, expected)
    +  }
    +
    +  test("pivoting column list") {
    +    val exception = intercept[RuntimeException] {
    +      trainingSales
    +        .groupBy($"sales.year")
    +        .pivot(struct(lower($"sales.course"), $"training"))
    +        .agg(sum($"sales.earnings"))
    +        .collect()
    --- End diff --
    
    Isn't this `.collect()` unnecessary for catching the `RuntimeException`? Btw, IMHO `AnalysisException` would be better than `RuntimeException` in this case. Can't we use it?


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    I could check it myself, but it would take some time since I'm kind of busy right now :-( 


---


[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22316#discussion_r214761811
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFramePivotSuite.scala ---
    @@ -308,4 +308,27 @@ class DataFramePivotSuite extends QueryTest with SharedSQLContext {
     
         assert(exception.getMessage.contains("aggregate functions are not allowed"))
       }
    +
    +  test("pivoting column list with values") {
    +    val expected = Row(2012, 10000.0, null) :: Row(2013, 48000.0, 30000.0) :: Nil
    +    val df = trainingSales
    +      .groupBy($"sales.year")
    +      .pivot(struct(lower($"sales.course"), $"training"), Seq(
    +        struct(lit("dotnet"), lit("Experts")),
    +        struct(lit("java"), lit("Dummies")))
    +      ).agg(sum($"sales.earnings"))
    +
    +    checkAnswer(df, expected)
    +  }
    +
    +  test("pivoting column list") {
    +    val exception = intercept[RuntimeException] {
    +      trainingSales
    +        .groupBy($"sales.year")
    +        .pivot(struct(lower($"sales.course"), $"training"))
    +        .agg(sum($"sales.earnings"))
    +        .collect()
    --- End diff --
    
    I tried it in your branch:
    ```
    scala> df.show
    +--------+--------------------+
    |training|               sales|
    +--------+--------------------+
    | Experts|[dotNET, 2012, 10...|
    | Experts|[JAVA, 2012, 2000...|
    | Dummies|[dotNet, 2012, 50...|
    | Experts|[dotNET, 2013, 48...|
    | Dummies|[Java, 2013, 3000...|
    +--------+--------------------+
    
    scala> df.groupBy($"sales.year").pivot(struct(lower($"sales.course"), $"training")).agg(sum($"sales.earnings"))
    java.lang.RuntimeException: Unsupported literal type class org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema [dotnet,Dummies]
      at org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:78)
      at org.apache.spark.sql.catalyst.expressions.Literal$$anonfun$create$2.apply(literals.scala:164)
      at org.apache.spark.sql.catalyst.expressions.Literal$$anonfun$create$2.apply(literals.scala:164)
      at scala.util.Try.getOrElse(Try.scala:79)
      at org.apache.spark.sql.catalyst.expressions.Literal$.create(literals.scala:163)
      at org.apache.spark.sql.functions$.typedLit(functions.scala:127)
    ```
    Am I missing something?


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    **[Test build #96800 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96800/testReport)** for PR 22316 at commit [`43972ef`](https://github.com/apache/spark/commit/43972ef4461451b346a0a4ba7191a8c7ed00afb9).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96759/
    Test PASSed.


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96409/
    Test FAILed.


---


[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22316


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95592/
    Test PASSed.


---


[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22316#discussion_r219368334
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
    @@ -330,6 +331,15 @@ class RelationalGroupedDataset protected[sql](
        *   df.groupBy("year").pivot("course").sum("earnings")
        * }}}
        *
    +   * From Spark 3.0.0, values can be literal columns, for instance, struct. For pivoting by
    --- End diff --
    
    3.0.0 => 2.5.0


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    **[Test build #96520 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96520/testReport)** for PR 22316 at commit [`49b47fb`](https://github.com/apache/spark/commit/49b47fbbe3b78aa2ee3703d89af483052a02e33b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96800/
    Test PASSed.


---


[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22316#discussion_r214544896
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
    @@ -406,6 +407,15 @@ class RelationalGroupedDataset protected[sql](
        *   df.groupBy($"year").pivot($"course", Seq("dotNET", "Java")).sum($"earnings")
        * }}}
        *
    +   * For pivoting by multiple columns, use the `struct` function to combine the columns and values:
    +   *
    +   * {{{
    +   *   df
    +   *     .groupBy($"year")
    --- End diff --
    
    I would move this line up, onto the same line as `df`.


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    **[Test build #96404 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96404/testReport)** for PR 22316 at commit [`382640b`](https://github.com/apache/spark/commit/382640be9bb9739929daea0bceb3093836d7f78d).


---


[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22316#discussion_r221428797
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
    @@ -330,6 +330,15 @@ class RelationalGroupedDataset protected[sql](
        *   df.groupBy("year").pivot("course").sum("earnings")
        * }}}
        *
    +   * From Spark 2.5.0, values can be literal columns, for instance, struct. For pivoting by
    +   * multiple columns, use the `struct` function to combine the columns and values:
    +   *
    +   * {{{
    +   *   df.groupBy($"year")
    --- End diff --
    
    We can. I only meant to keep this example consistent with the ones above, apart from the part that differs. It's really not a big deal at all.


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    **[Test build #96800 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96800/testReport)** for PR 22316 at commit [`43972ef`](https://github.com/apache/spark/commit/43972ef4461451b346a0a4ba7191a8c7ed00afb9).


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    **[Test build #96404 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96404/testReport)** for PR 22316 at commit [`382640b`](https://github.com/apache/spark/commit/382640be9bb9739929daea0bceb3093836d7f78d).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95631/
    Test PASSed.


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by MaxGekk <gi...@git.apache.org>.

Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    @HyukjinKwon @maropu @jaceklaskowski Please take a look at this PR one more time.


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    I'm merging this. The last change was a comment change, and the lint / unidoc checks passed.


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    Can you investigate whether there's a behaviour change in decimal precision? If there is and it's a better behaviour, can you add a simple test? If it's not a better behaviour, let's try-catch for now.


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    Merged to master.


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    One safe change is to not use the `lit` function, but to do a manual pattern match and still use `Literal.apply`. We can investigate `Literal.create` in a follow-up.
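
A minimal sketch of the pattern-match approach suggested above, assuming Spark 2.4/3.x where `Column.expr` is public. The object and helper names (`PivotValues`, `toPivotValueExpr`) are hypothetical and not taken from the PR. A `Column` value (for instance one built with `struct`) contributes its underlying expression, while any other value still goes through `Literal.apply`, so the typing of plain literals is left untouched.

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.catalyst.expressions.{Expression, Literal}

// Hypothetical helper: the name and placement are assumptions for illustration.
object PivotValues {
  /** Converts one pivot value into a Catalyst expression without calling `lit`. */
  def toPivotValueExpr(value: Any): Expression = value match {
    // A Column (e.g. struct(lit("dotnet"), lit("Experts"))) already wraps an expression.
    case c: Column => c.expr
    // Everything else keeps the pre-existing Literal.apply behaviour.
    case v => Literal.apply(v)
  }
}
```

With such a helper, the `values.map(Literal.apply)` call in `pivot` could become `values.map(PivotValues.toPivotValueExpr)`.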


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    Can one of the admins verify this patch?


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96404/
    Test FAILed.


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95829/
    Test PASSed.


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96520/
    Test PASSed.


---


[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...

Posted by jaceklaskowski <gi...@git.apache.org>.

Github user jaceklaskowski commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22316#discussion_r214752855
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
    @@ -416,7 +426,7 @@ class RelationalGroupedDataset protected[sql](
             new RelationalGroupedDataset(
               df,
               groupingExprs,
    -          RelationalGroupedDataset.PivotType(pivotColumn.expr, values.map(Literal.apply)))
    +          RelationalGroupedDataset.PivotType(pivotColumn.expr, values.map(lit(_).expr)))
    --- End diff --
    
    What do you think about `map(lit).map(_.expr)` instead?


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    **[Test build #95631 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95631/testReport)** for PR 22316 at commit [`673ef00`](https://github.com/apache/spark/commit/673ef001adf9b64d644c782eed2aefecc029ed81).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by MaxGekk <gi...@git.apache.org>.

Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    jenkins, retest this, please


---


[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...

Posted by MaxGekk <gi...@git.apache.org>.

Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22316#discussion_r217476618
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
    @@ -416,7 +426,7 @@ class RelationalGroupedDataset protected[sql](
             new RelationalGroupedDataset(
               df,
               groupingExprs,
    -          RelationalGroupedDataset.PivotType(pivotColumn.expr, values.map(Literal.apply)))
    +          RelationalGroupedDataset.PivotType(pivotColumn.expr, values.map(lit(_).expr)))
    --- End diff --
    
    > Looks at least there's one case of a potential behaviour change about scale and precision.
    
    Could you explain, please? Why do you expect a behavior change?


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    **[Test build #96409 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96409/testReport)** for PR 22316 at commit [`382640b`](https://github.com/apache/spark/commit/382640be9bb9739929daea0bceb3093836d7f78d).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    **[Test build #95592 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95592/testReport)** for PR 22316 at commit [`ef8e22a`](https://github.com/apache/spark/commit/ef8e22abb29da7a9b1e2ed7c1f237347d7c56e50).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...

Posted by MaxGekk <gi...@git.apache.org>.

Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22316#discussion_r214722485
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFramePivotSuite.scala ---
    @@ -308,4 +308,27 @@ class DataFramePivotSuite extends QueryTest with SharedSQLContext {
     
         assert(exception.getMessage.contains("aggregate functions are not allowed"))
       }
    +
    +  test("pivoting column list with values") {
    +    val expected = Row(2012, 10000.0, null) :: Row(2013, 48000.0, 30000.0) :: Nil
    +    val df = trainingSales
    +      .groupBy($"sales.year")
    +      .pivot(struct(lower($"sales.course"), $"training"), Seq(
    +        struct(lit("dotnet"), lit("Experts")),
    +        struct(lit("java"), lit("Dummies")))
    +      ).agg(sum($"sales.earnings"))
    +
    +    checkAnswer(df, expected)
    +  }
    +
    +  test("pivoting column list") {
    +    val exception = intercept[RuntimeException] {
    +      trainingSales
    +        .groupBy($"sales.year")
    +        .pivot(struct(lower($"sales.course"), $"training"))
    +        .agg(sum($"sales.earnings"))
    +        .collect()
    --- End diff --
    
    My changes don't throw the exception. It is thrown in `collect()`: https://github.com/apache/spark/blob/41c2227a2318029709553a588e44dee28f106350/sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala#L385
    
    @maropu Do you propose to catch `RuntimeException` and replace it with `AnalysisException`?


---


[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...

Posted by MaxGekk <gi...@git.apache.org>.

Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22316#discussion_r214754379
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
    @@ -416,7 +426,7 @@ class RelationalGroupedDataset protected[sql](
             new RelationalGroupedDataset(
               df,
               groupingExprs,
    -          RelationalGroupedDataset.PivotType(pivotColumn.expr, values.map(Literal.apply)))
    +          RelationalGroupedDataset.PivotType(pivotColumn.expr, values.map(lit(_).expr)))
    --- End diff --
    
    I don't see any advantage in this. It is longer and slower.


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    **[Test build #96420 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96420/testReport)** for PR 22316 at commit [`382640b`](https://github.com/apache/spark/commit/382640be9bb9739929daea0bceb3093836d7f78d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    **[Test build #96759 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96759/testReport)** for PR 22316 at commit [`d645d06`](https://github.com/apache/spark/commit/d645d06df4520522c6eae9dad2d75c9ed73a3e39).


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    **[Test build #95631 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95631/testReport)** for PR 22316 at commit [`673ef00`](https://github.com/apache/spark/commit/673ef001adf9b64d644c782eed2aefecc029ed81).


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    LGTM otherwise


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    At least @gatorsmile and @cloud-fan, WDYT?


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    Seems fine to me.


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by MaxGekk <gi...@git.apache.org>.

Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    @HyukjinKwon May I ask you to look at the PR? Is there anything blocking it for now?


---


[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22316#discussion_r216122957
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
    @@ -330,6 +331,15 @@ class RelationalGroupedDataset protected[sql](
        *   df.groupBy("year").pivot("course").sum("earnings")
        * }}}
        *
    +   * From Spark 2.4.0, values can be literal columns, for instance, struct. For pivoting by
    --- End diff --
    
    Let's target 3.0.0 @MaxGekk.


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    LGTM if the decimal precision concern from @HyukjinKwon is addressed.


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96420/
    Test PASSed.


---


[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22316#discussion_r214894498
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFramePivotSuite.scala ---
    @@ -308,4 +308,27 @@ class DataFramePivotSuite extends QueryTest with SharedSQLContext {
     
         assert(exception.getMessage.contains("aggregate functions are not allowed"))
       }
    +
    +  test("pivoting column list with values") {
    +    val expected = Row(2012, 10000.0, null) :: Row(2013, 48000.0, 30000.0) :: Nil
    +    val df = trainingSales
    +      .groupBy($"sales.year")
    +      .pivot(struct(lower($"sales.course"), $"training"), Seq(
    +        struct(lit("dotnet"), lit("Experts")),
    +        struct(lit("java"), lit("Dummies")))
    +      ).agg(sum($"sales.earnings"))
    +
    +    checkAnswer(df, expected)
    +  }
    +
    +  test("pivoting column list") {
    +    val exception = intercept[RuntimeException] {
    +      trainingSales
    +        .groupBy($"sales.year")
    +        .pivot(struct(lower($"sales.course"), $"training"))
    +        .agg(sum($"sales.earnings"))
    +        .collect()
    --- End diff --
    
    I think invalid queries basically throw `AnalysisException`. But yes, indeed, we'd better keep the current behaviour. Thanks!


---


[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22316#discussion_r219686833
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
    @@ -416,7 +426,7 @@ class RelationalGroupedDataset protected[sql](
             new RelationalGroupedDataset(
               df,
               groupingExprs,
    -          RelationalGroupedDataset.PivotType(pivotColumn.expr, values.map(Literal.apply)))
    +          RelationalGroupedDataset.PivotType(pivotColumn.expr, values.map(lit(_).expr)))
    --- End diff --
    
    That's true in general, but specifically, is the decimal precision more correct?


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    **[Test build #95829 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95829/testReport)** for PR 22316 at commit [`8ccf845`](https://github.com/apache/spark/commit/8ccf8458b577170e3a73d34dcac1074c30db0130).


---


[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22316#discussion_r221427775
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
    @@ -330,6 +330,15 @@ class RelationalGroupedDataset protected[sql](
        *   df.groupBy("year").pivot("course").sum("earnings")
        * }}}
        *
    +   * From Spark 2.5.0, values can be literal columns, for instance, struct. For pivoting by
    +   * multiple columns, use the `struct` function to combine the columns and values:
    +   *
    +   * {{{
    +   *   df.groupBy($"year")
    --- End diff --
    
    nit: `$"year"` -> `"year"`


---


[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22316#discussion_r214566083
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
    @@ -406,6 +407,14 @@ class RelationalGroupedDataset protected[sql](
        *   df.groupBy($"year").pivot($"course", Seq("dotNET", "Java")).sum($"earnings")
        * }}}
        *
    +   * For pivoting by multiple columns, use the `struct` function to combine the columns and values:
    --- End diff --
    
    Since the documentation states it's an overloaded version of ``` the `pivot` method with `pivotColumn` of the `String` type. ```, shall we move this content to that method?
    
    Also, I would document this as, for instance:
    
    From Spark 2.4.0, values can be literal columns, for instance, `struct`. For pivoting by multiple columns, use the `struct` function to combine the columns and values.
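
A minimal, hedged sketch of what the documented usage could look like end to end. It assumes a Spark build that already contains this change (struct-typed pivot values) and a local `SparkSession`. The object name, the helper names, and the sample rows are hypothetical and only loosely modelled on the test fixture quoted in this thread.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, lower, struct, sum}

// Hypothetical example object; none of these names come from the PR.
object MultiColumnPivot {
  /** Builds a small, made-up sales table: training, course, year, earnings. */
  def sampleSales(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      ("Experts", "dotNET", 2012, 10000.0),
      ("Experts", "JAVA", 2012, 20000.0),
      ("Dummies", "dotNet", 2012, 5000.0),
      ("Experts", "dotNET", 2013, 48000.0),
      ("Dummies", "Java", 2013, 30000.0)
    ).toDF("training", "course", "year", "earnings")
  }

  /** Pivots by the pair (lower-cased course, training) using struct-typed values. */
  def pivotByCourseAndTraining(sales: DataFrame): DataFrame =
    sales
      .groupBy(col("year"))
      .pivot(
        // Combine the two pivot columns into a single struct column.
        struct(lower(col("course")), col("training")),
        // Each pivot value is a struct of literals with matching field order.
        Seq(
          struct(lit("dotnet"), lit("Experts")),
          struct(lit("java"), lit("Dummies"))))
      .agg(sum(col("earnings")))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("multi-column-pivot")
      .getOrCreate()
    pivotByCourseAndTraining(sampleSales(spark)).show()
    spark.stop()
  }
}
```

The result has one row per `year` and one column per listed struct value; combinations not listed in `values` are ignored.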


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    Can one of the admins verify this patch?


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    The branch has been cut. Let's target 3.0.0.


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    **[Test build #96520 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96520/testReport)** for PR 22316 at commit [`49b47fb`](https://github.com/apache/spark/commit/49b47fbbe3b78aa2ee3703d89af483052a02e33b).


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    Yup I prefer this way


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by MaxGekk <gi...@git.apache.org>.

Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    @gatorsmile Do you have any objections to this approach?


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    **[Test build #95590 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95590/testReport)** for PR 22316 at commit [`a097b29`](https://github.com/apache/spark/commit/a097b294854f99ec58ca307d85c19e54cd76d6b8).


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95590/
    Test PASSed.


---


[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22316
  
    **[Test build #95829 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95829/testReport)** for PR 22316 at commit [`8ccf845`](https://github.com/apache/spark/commit/8ccf8458b577170e3a73d34dcac1074c30db0130).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22316#discussion_r219368724
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
    @@ -416,7 +426,7 @@ class RelationalGroupedDataset protected[sql](
             new RelationalGroupedDataset(
               df,
               groupingExprs,
    -          RelationalGroupedDataset.PivotType(pivotColumn.expr, values.map(Literal.apply)))
    +          RelationalGroupedDataset.PivotType(pivotColumn.expr, values.map(lit(_).expr)))
    --- End diff --
    
    Now we eventually call `Literal.create` instead of `Literal.apply`. I'm not sure whether there is a behavior change, though.
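
One way to look into this is to compare the data types the two factories pick for the same decimal value. The sketch below is only an investigation aid with hypothetical names (`DecimalLiteralCheck`, `literalTypes`), assuming the Catalyst classes are on the classpath. It compares the factories on a statically typed `BigDecimal`; inside `pivot` the values are typed `Any`, so the real code path may behave differently.

```scala
import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.types.DataType

// Hypothetical helper for manual investigation; not part of the PR.
object DecimalLiteralCheck {
  /** Returns the data types chosen by `Literal.apply` and `Literal.create` for one decimal. */
  def literalTypes(d: BigDecimal): (DataType, DataType) =
    (Literal.apply(d).dataType, Literal.create(d).dataType)

  def main(args: Array[String]): Unit = {
    val (viaApply, viaCreate) = literalTypes(BigDecimal("123.45"))
    // Print both types so any difference in precision or scale is visible.
    println(s"Literal.apply : $viaApply")
    println(s"Literal.create: $viaCreate")
  }
}
```

If the two printed types differ in precision or scale, that is the kind of behaviour change the review is asking about.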


---
