Posted to issues@spark.apache.org by "Enrico Minack (Jira)" <ji...@apache.org> on 2022/06/03 15:53:00 UTC
[jira] [Commented] (SPARK-39292) Make Dataset.melt work with struct fields

    [ https://issues.apache.org/jira/browse/SPARK-39292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17546058#comment-17546058 ] 

Enrico Minack commented on SPARK-39292:
---------------------------------------

This is being fixed as part of https://issues.apache.org/jira/browse/SPARK-39292

> Make Dataset.melt work with struct fields
> -----------------------------------------
>
>                 Key: SPARK-39292
>                 URL: https://issues.apache.org/jira/browse/SPARK-39292
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Enrico Minack
>            Priority: Major
>
> In SPARK-38864, the melt function was added to Dataset.
> It would be nice if nested fields of struct columns could be used as id and value columns. This would allow for the following:
> Given a Dataset with the following schema:
> {code:java}
> root
>  |-- an: struct (nullable = false)
>  |    |-- id: integer (nullable = false)
>  |-- str: struct (nullable = false)
>  |    |-- one: string (nullable = true)
>  |    |-- two: string (nullable = true)
> {code}
> For example, with the following rows:
> {code:java}
> +---+-------------+
> | an|          str|
> +---+-------------+
> |{1}|   {one, One}|
> |{2}|  {two, null}|
> |{3}|{null, three}|
> |{4}| {null, null}|
> +---+-------------+
> {code}
> Melting with id columns {{Seq("an.id")}} and value columns {{Seq("str.one", "str.two")}} would result in:
> {code:java}
> +--+--------+-----+
> |an|variable|value|
> +--+--------+-----+
> | 1| str.one|  one|
> | 1| str.two|  One|
> | 2| str.one|  two|
> | 2| str.two| null|
> | 3| str.one| null|
> | 3| str.two|three|
> | 4| str.one| null|
> | 4| str.two| null|
> +--+--------+-----+
> {code}
> See test in {{org.apache.spark.sql.MeltSuite}}:
> {code:scala}
>   test("SPARK-39292: melt with struct fields") {
>     val df = meltWideDataDs.select(
>       struct($"id").as("an"),
>       struct(
>         $"str1".as("one"),
>         $"str2".as("two")
>       ).as("str")
>     )
>     checkAnswer(
>       Melt.of(df, Seq("an.id"), Seq("str.one", "str.two"), false, "variable", "value"),
>       meltedWideDataRows.map(row => Row(
>         row.getInt(0),
>         row.getString(1) match {
>           case "str1" => "str.one"
>           case "str2" => "str.two"
>         },
>         row.getString(2)
>       ))
>     )
>   }
> {code}
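
For comparison, the melt described in the quoted issue can be approximated with Spark SQL's existing stack generator, which accepts dotted paths into struct columns. The following is only a sketch of such a workaround, not the API proposed in the issue: the object name MeltStructFieldsSketch and the helpers exampleDf and meltStr are hypothetical names made up for illustration, and the id column is named id here rather than an as in the table above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.struct

// Hypothetical helper object, for illustration only.
object MeltStructFieldsSketch {

  // Rebuild the example Dataset from the issue: struct columns `an` and `str`.
  def exampleDf(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq[(Int, String, String)](
      (1, "one", "One"),
      (2, "two", null),
      (3, null, "three"),
      (4, null, null)
    ).toDF("id", "str1", "str2").select(
      struct($"id").as("an"),
      struct($"str1".as("one"), $"str2".as("two")).as("str")
    )
  }

  // Workaround, not the proposed melt API: address the nested fields by
  // dotted path and let stack() emit one (variable, value) row per value field.
  def meltStr(df: DataFrame): DataFrame =
    df.selectExpr(
      "an.id AS id",
      "stack(2, 'str.one', str.one, 'str.two', str.two) AS (variable, value)"
    )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("melt-struct-fields-sketch")
      .getOrCreate()
    meltStr(exampleDf(spark)).show()
    spark.stop()
  }
}
```

The drawback of this approach is that every value field has to be spelled out twice in the stack expression, once as a label and once as a column reference, which is the kind of repetition that passing Seq("str.one", "str.two") to melt would avoid.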


