Posted to issues@spark.apache.org by "Hyukjin Kwon (Jira)" <ji...@apache.org> on 2020/10/07 08:03:00 UTC

[jira] [Commented] (SPARK-33068) Spark 2.3 vs Spark 1.6 collect_list giving different schema

    [ https://issues.apache.org/jira/browse/SPARK-33068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17209367#comment-17209367 ] 

Hyukjin Kwon commented on SPARK-33068:
--------------------------------------

We don't guarantee compatibility of output column names across versions. If you need the output to stay compatible, you should rename the columns explicitly.
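
For the element type difference described in the report below, one possible workaround is to build the string yourself before calling collect_list, rather than relying on how array(...) is rendered. The following is only a sketch under stated assumptions: the sample rows, the object name CollectListAsStrings and the helper asBracketedString are hypothetical, and the bracketed format simply imitates the Spark 1.6 string quoted in the report. It has not been checked against the reporter's data.

{code:scala}
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions._

object CollectListAsStrings {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("collect-list-as-strings")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows; only the column names come from the report.
    val df = Seq(
      ("k1", "2020-09-26", "Ayush", 100.0, 1.0),
      ("k1", "2020-09-26", "Ayush", 3.67, 2.0)
    ).toDF("a", "b", "c", "d", "e")

    // Hypothetical helper: renders b, c and one aggregated column as a single
    // bracketed string, so collect_list yields array<string>, not array<array<string>>.
    def asBracketedString(value: Column): Column =
      concat(lit("["), concat_ws(", ", $"b", $"c", value.cast("string")), lit("]"))

    val dfDateAgg = df
      .groupBy($"a", $"b", $"c")
      .agg(sum($"d").alias("data1"), sum($"e").alias("data2"))
      .groupBy($"a")
      .agg(
        collect_list(asBracketedString($"data1")).alias("final_data1"),
        collect_list(asBracketedString($"data2")).alias("final_data2"))

    // Inspect the resulting element types.
    dfDateAgg.printSchema()
    spark.stop()
  }
}
{code}

Note that concat_ws skips null inputs, so rows with null values in b, c or the aggregated column would render differently from an array printed as a string. If that matters, wrap the inputs in coalesce with a placeholder of your choice.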

> Spark 2.3 vs Spark 1.6 collect_list giving different schema
> -----------------------------------------------------------
>
>                 Key: SPARK-33068
>                 URL: https://issues.apache.org/jira/browse/SPARK-33068
>             Project: Spark
>          Issue Type: IT Help
>          Components: Spark Submit
>    Affects Versions: 2.3.4
>            Reporter: Ayush Goyal
>            Priority: Major
>
> Hi,
> I am migrating from Spark 1.6 to Spark 2.3. However, collect_list gives me a different schema.
>  
> {code:java}
> val df_date_agg = df
>     .groupBy($"a",$"b",$"c")
>     .agg(sum($"d").alias("data1"),sum($"e").alias("data2"))
>     .groupBy($"a")
>     .agg(collect_list(array($"b",$"c",$"data1")).alias("final_data1"),
>          collect_list(array($"b",$"c",$"data2")).alias("final_data2"))
> {code}
> When I run the code above in Spark 1.6, I get the schema below:
>  
>  
> {code:java}
>  |-- final_data1: array (nullable = true)
>  |    |-- element: string (containsNull = true)
>  |-- final_data2: array (nullable = true)
>  |    |-- element: string (containsNull = true)
> {code}
>  
>  
> but in Spark 2.3 the schema changes to
>  
> {code:java}
>  |-- final_data1: array (nullable = true)
>  |    |-- element: array (containsNull = true)
>  |    |    |-- element: string (containsNull = true)
>  |-- final_data2: array (nullable = true)
>  |    |-- element: array (containsNull = true)
>  |    |    |-- element: string (containsNull = true)
> {code}
>  
>  
> In Spark 1.6, array($"b",$"c",$"data1") is converted to a string like this:
> {code:java}
> '[2020-09-26, Ayush, 103.67]'
> {code}
> In Spark 2.3 it is converted to a WrappedArray:
> {code:java}
> WrappedArray(2020-09-26, Ayush, 103.67)
> {code}
> I want to keep my schema as it is; otherwise all the dependent code will have to change.
>  
> Thanks
>  


