Posted to issues@spark.apache.org by "Ayush Goyal (Jira)" <ji...@apache.org> on 2020/10/05 09:59:00 UTC

[jira] [Created] (SPARK-33068) Spark 2.3 vs Spark 1.6 collect_list giving different schema

Ayush Goyal created SPARK-33068:
-----------------------------------

             Summary: Spark 2.3 vs Spark 1.6 collect_list giving different schema
                 Key: SPARK-33068
                 URL: https://issues.apache.org/jira/browse/SPARK-33068
             Project: Spark
          Issue Type: IT Help
          Components: Spark Submit
    Affects Versions: 2.3.4
            Reporter: Ayush Goyal


Hi,

I am migrating from Spark 1.6 to Spark 2.3. However, collect_list now returns a different schema.

 
{code:java}
val df_date_agg = df
    .groupBy($"a",$"b",$"c")
    .agg(sum($"d").alias("data1"),sum($"e").alias("data2"))
    .groupBy($"a")
    .agg(collect_list(array($"b",$"c",$"data1")).alias("final_data1"),
         collect_list(array($"b",$"c",$"data2")).alias("final_data2"))
{code}
When I run the code above in Spark 1.6, I get the following schema:

{code:java}
 |-- final_data1: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- final_data2: array (nullable = true)
 |    |-- element: string (containsNull = true)
{code}

In Spark 2.3, however, the schema changes to:

{code:java}
 |-- final_data1: array (nullable = true)
 |    |-- element: array (containsNull = true)
 |    |    |-- element: string (containsNull = true)
 |-- final_data2: array (nullable = true)
 |    |-- element: array (containsNull = true)
 |    |    |-- element: string (containsNull = true)
{code}

In Spark 1.6, array($"b",$"c",$"data1") is converted to a string like this:
{code:java}
'[2020-09-26, Ayush, 103.67]'
{code}
In Spark 2.3 it is converted to a WrappedArray:
{code:java}
WrappedArray(2020-09-26, Ayush, 103.67)
{code}
I want to keep the schema as it was in Spark 1.6; otherwise all the dependent code will have to change.
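
One possible way to get the old array-of-strings schema back might be to build the bracketed string explicitly instead of relying on array(...). The sketch below is only an illustration under assumptions: the object name CollectListAsString and the helpers bracketed and aggregate are hypothetical, the input is assumed to have columns a, b, c, d, e as in the snippet above, and the "[x, y, z]" layout is copied from the Spark 1.6 output shown earlier. Note that concat_ws skips null inputs, so rows with a null b or c would be rendered differently from the 1.6 output.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_list, concat, concat_ws, lit, sum}

// Hypothetical helper object, not part of Spark.
object CollectListAsString {

  // Renders "[b, c, value]" as a single string column, mimicking the
  // string form that Spark 1.6 produced for array($"b", $"c", $"data1").
  // Assumption: null values do not occur (concat_ws silently drops them).
  def bracketed(value: Column): Column =
    concat(
      lit("["),
      concat_ws(", ", col("b"), col("c"), value.cast("string")),
      lit("]")
    )

  // Same two-step aggregation as in the report, but collecting strings
  // instead of nested arrays, so each result column is array<string>.
  def aggregate(df: DataFrame): DataFrame =
    df.groupBy(col("a"), col("b"), col("c"))
      .agg(sum(col("d")).alias("data1"), sum(col("e")).alias("data2"))
      .groupBy(col("a"))
      .agg(
        collect_list(bracketed(col("data1"))).alias("final_data1"),
        collect_list(bracketed(col("data2"))).alias("final_data2")
      )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("collect-list-as-string")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample row, only for demonstration.
    val df = Seq(("k1", "2020-09-26", "Ayush", 103.67, 1.0))
      .toDF("a", "b", "c", "d", "e")

    val result = aggregate(df)
    result.printSchema()
    result.show(truncate = false)

    spark.stop()
  }
}
```

With this approach the dependent code would keep seeing string elements, at the cost of formatting the values by hand rather than letting Spark choose the representation.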

 

Thanks

 


