Posted to issues@spark.apache.org by "Hyukjin Kwon (Jira)" <ji...@apache.org> on 2020/10/07 08:03:00 UTC
[jira] [Commented] (SPARK-33068) Spark 2.3 vs Spark 1.6 collect_list giving different schema
[ https://issues.apache.org/jira/browse/SPARK-33068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17209367#comment-17209367 ]
Hyukjin Kwon commented on SPARK-33068:
--------------------------------------
We don't keep compatibility of output column names or schemas across versions. If you want the schema to stay compatible, you should rename (or cast) the columns explicitly.
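
For example, one possible workaround (a sketch, not tested against 1.6 output; {{df}} and the column names are taken from the report below) is to cast each array to a string before collecting it, so each element is a single "[b, c, data1]"-style string as in 1.6:

{code:java}
// Cast the array column to string inside collect_list, so the result
// is array<string> instead of array<array<string>>.
val df_date_agg = df
  .groupBy($"a", $"b", $"c")
  .agg(sum($"d").alias("data1"), sum($"e").alias("data2"))
  .groupBy($"a")
  .agg(
    collect_list(array($"b", $"c", $"data1").cast("string")).alias("final_data1"),
    collect_list(array($"b", $"c", $"data2").cast("string")).alias("final_data2"))
{code}

Note the exact string formatting of a cast array may differ from what 1.6 produced, so downstream code that parses these strings should be checked.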
> Spark 2.3 vs Spark 1.6 collect_list giving different schema
> -----------------------------------------------------------
>
> Key: SPARK-33068
> URL: https://issues.apache.org/jira/browse/SPARK-33068
> Project: Spark
> Issue Type: IT Help
> Components: Spark Submit
> Affects Versions: 2.3.4
> Reporter: Ayush Goyal
> Priority: Major
>
> Hi,
> I am migrating from Spark 1.6 to Spark 2.3, and collect_list now gives me a different schema.
>
> {code:java}
> val df_date_agg = df
>   .groupBy($"a", $"b", $"c")
>   .agg(sum($"d").alias("data1"), sum($"e").alias("data2"))
>   .groupBy($"a")
>   .agg(collect_list(array($"b", $"c", $"data1")).alias("final_data1"),
>        collect_list(array($"b", $"c", $"data2")).alias("final_data2"))
> {code}
> Running the code above in Spark 1.6 gives the schema below:
>
> {code:java}
> |-- final_data1: array (nullable = true)
> | |-- element: string (containsNull = true)
> |-- final_data2: array (nullable = true)
> | |-- element: string (containsNull = true)
> {code}
>
>
> but in Spark 2.3 the schema changes to:
>
> {code:java}
> |-- final_data1: array (nullable = true)
> | |-- element: array (containsNull = true)
> | | |-- element: string (containsNull = true)
> |-- final_data2: array (nullable = true)
> | |-- element: array (containsNull = true)
> | | |-- element: string (containsNull = true)
> {code}
>
>
> In Spark 1.6, array($"b", $"c", $"data1") is converted to a string like this:
> {code:java}
> '[2020-09-26, Ayush, 103.67]'
> {code}
> In Spark 2.3 it is converted to a WrappedArray:
> {code:java}
> WrappedArray(2020-09-26, Ayush, 103.67)
> {code}
> I want to keep my schema as it is; otherwise all the dependent code has to change.
>
> Thanks
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org