Posted to issues@spark.apache.org by "Ayush Goyal (Jira)" <ji...@apache.org> on 2020/10/05 09:59:00 UTC
[jira] [Created] (SPARK-33068) Spark 2.3 vs Spark 1.6 collect_list giving different schema
Ayush Goyal created SPARK-33068:
-----------------------------------
Summary: Spark 2.3 vs Spark 1.6 collect_list giving different schema
Key: SPARK-33068
URL: https://issues.apache.org/jira/browse/SPARK-33068
Project: Spark
Issue Type: IT Help
Components: Spark Submit
Affects Versions: 2.3.4
Reporter: Ayush Goyal
Hi,
I am migrating from Spark 1.6 to Spark 2.3. However, collect_list is giving me a different schema.
{code:java}
val df_date_agg = df
.groupBy($"a",$"b",$"c")
.agg(sum($"d").alias("data1"),sum($"e").alias("data2"))
.groupBy($"a")
.agg(collect_list(array($"b",$"c",$"data1")).alias("final_data1"),
collect_list(array($"b",$"c",$"data2")).alias("final_data2"))
{code}
When I run the code above in Spark 1.6, I get the following schema:
{code:java}
|-- final_data1: array (nullable = true)
| |-- element: string (containsNull = true)
|-- final_data2: array (nullable = true)
| |-- element: string (containsNull = true)
{code}
but in Spark 2.3 the schema changes to:
{code:java}
|-- final_data1: array (nullable = true)
| |-- element: array (containsNull = true)
| | |-- element: string (containsNull = true)
|-- final_data2: array (nullable = true)
| |-- element: array (containsNull = true)
| | |-- element: string (containsNull = true)
{code}
In Spark 1.6, array($"b",$"c",$"data1") is converted to a string like this:
{code:java}
'[2020-09-26, Ayush, 103.67]'
{code}
In Spark 2.3 it is converted to a WrappedArray:
{code:java}
WrappedArray(2020-09-26, Ayush, 103.67)
{code}
I want to keep the schema as it was; otherwise all the dependent code has to change.
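One possible workaround (a sketch only, not verified against 2.3.4): instead of relying on array() being implicitly cast to a string, build the bracketed string explicitly with concat_ws before collecting, so final_data1 and final_data2 stay array<string>. concat, concat_ws, and lit are standard functions from org.apache.spark.sql.functions; the DataFrame name df and columns are taken from the snippet above.
{code:java}
import org.apache.spark.sql.functions.{collect_list, concat, concat_ws, lit, sum}

// Sketch: reproduce the Spark 1.6 string form "[b, c, data1]" explicitly,
// so collect_list gathers strings rather than nested arrays.
val df_date_agg = df
  .groupBy($"a", $"b", $"c")
  .agg(sum($"d").alias("data1"), sum($"e").alias("data2"))
  .groupBy($"a")
  .agg(
    // concat_ws joins the columns with ", "; concat adds the surrounding
    // brackets, yielding a plain string per row, e.g. "[2020-09-26, Ayush, 103.67]".
    collect_list(concat(lit("["), concat_ws(", ", $"b", $"c", $"data1"), lit("]")))
      .alias("final_data1"),
    collect_list(concat(lit("["), concat_ws(", ", $"b", $"c", $"data2"), lit("]")))
      .alias("final_data2"))
{code}
With this, printSchema should again report final_data1 as array (element: string) rather than a nested array.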
Thanks
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org