Posted to issues@spark.apache.org by "Kumaresh AK (Jira)" <ji...@apache.org> on 2021/04/07 22:14:00 UTC

[jira] [Created] (SPARK-34982) Pyspark asDict() returns wrong fields for a nested dataframe

Kumaresh AK created SPARK-34982:
-----------------------------------

             Summary: Pyspark asDict() returns wrong fields for a nested dataframe
                 Key: SPARK-34982
                 URL: https://issues.apache.org/jira/browse/SPARK-34982
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 3.0.2, 3.0.1
         Environment: Tested with EMR 6.2.0. python: 3.8.5

Also tested with local PySpark on Windows. Spark: 3.0.1. Python: 3.8.5
            Reporter: Kumaresh AK


Hello! I upgraded a job from Spark 2.4.4 to 3.0.1 and encountered this issue. The job uses asDict(True) in PySpark. I reproduced the issue with a minimal schema and code. Consider this example schema:
{code:java}
root
 |-- id: integer (nullable = false)
 |-- struct_1: struct (nullable = true)
 |    |-- array_1_1: array (nullable = true)
 |    |    |-- element: string (containsNull = false)
 |-- struct_2: struct (nullable = true)
 |    |-- array_2_1: array (nullable = true)
 |    |    |-- element: string (containsNull = false)
{code}
I created 100 rows with the above schema, filled them with some numbers, and compared each row.asDict(True) against the input. For some rows,
{code:java}
struct_1.array_1_1
{code}
is missing. Instead, I get
{code:java}
struct_1.array_2_1
{code}
I also observe that this happens when array_1_1 is null. Example assertion failure:
{code:java}
AssertionError: {'id': 7, 'struct_1': {'array_2_1': None}, 'struct_2': {'array_2_1': None}} != {'id': 7, 'struct_1': {'array_1_1': None}, 'struct_2': {'array_2_1': None}}
{code}
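A minimal check along the lines described above (my own reconstruction, assuming the schema from the previous sketch and an all-None fill for the arrays, which is the case noted above):
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# 100 rows; both inner arrays left as None, the case that trips the bug.
data = [(i, (None,), (None,)) for i in range(100)]
df = spark.createDataFrame(data, schema)

for i, row in enumerate(df.collect()):
    expected = {"id": i,
                "struct_1": {"array_1_1": None},
                "struct_2": {"array_2_1": None}}
    # On 3.0.1/3.0.2, some rows come back with struct_1 keyed as
    # 'array_2_1' instead of 'array_1_1'.
    assert row.asDict(True) == expected, (expected, row.asDict(True))
{code}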