Posted to issues@spark.apache.org by "Kumaresh AK (Jira)" <ji...@apache.org> on 2021/04/07 22:23:00 UTC

[jira] [Updated] (SPARK-34982) Pyspark asDict() returns wrong child field for nested dataframe

     [ https://issues.apache.org/jira/browse/SPARK-34982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kumaresh AK updated SPARK-34982:
--------------------------------
    Summary: Pyspark asDict() returns wrong child field for nested dataframe  (was: Pyspark asDict() returns wrong fields for a nested dataframe)

> Pyspark asDict() returns wrong child field for nested dataframe
> ---------------------------------------------------------------
>
>                 Key: SPARK-34982
>                 URL: https://issues.apache.org/jira/browse/SPARK-34982
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.0.1, 3.0.2
>         Environment: Tested with EMR 6.2.0. python: 3.8.5
> Also Tested with local pyspark on windows. v: 3.0.1. python: 3.8.5
>            Reporter: Kumaresh AK
>            Priority: Major
>         Attachments: SPARK-34982.py
>
>
> Hello! I upgraded a job to Spark 3.0.1 (from 2.4.4) and encountered this issue. The job uses asDict(True) in pyspark. I reproduced the issue with a concise schema and code. Consider this example schema:
> {code:java}
> root
>  |-- id: integer (nullable = false)
>  |-- struct_1: struct (nullable = true)
>  | |-- array_1_1: array (nullable = true)
>  | | |-- element: string (containsNull = false)
>  |-- struct_2: struct (nullable = true)
>  | |-- array_2_1: array (nullable = true)
>  | | |-- element: string (containsNull = false){code}
> I created 100 rows with the above schema, filled them with some numbers, and checked row.asDict(True) against the input. For some rows,
> {code:java}
> struct_1.array_1_1{code}
> is missing. Instead I get
> {code:java}
> struct_1.array_2_1{code}
> I also observe that this happens when array_1_1 is null. Example assertion failure:
> {code:java}
> AssertionError: {'id': 7, 'struct_1': {'array_2_1': None}, 'struct_2': {'array_2_1': None}} != {'id': 7, 'struct_1': {'array_1_1': None}, 'struct_2': {'array_2_1': None}}
> {code}
> I have attached a minimal script (SPARK-34982.py) that reproduces this issue.
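
To pinpoint which nested field differs in an assertion failure like the one quoted above, a small key-path diff can help. This is a minimal sketch in plain Python, not the attached script: the helper name diff_key_paths is hypothetical, and the two dictionaries are copied from the AssertionError in the report rather than produced by Spark.

```python
def diff_key_paths(actual, expected, prefix=""):
    """Return dotted key paths that appear in only one of two nested dicts."""
    diffs = []
    for key in sorted(set(actual) | set(expected)):
        path = f"{prefix}{key}"
        if key not in actual:
            diffs.append(f"missing: {path}")
        elif key not in expected:
            diffs.append(f"unexpected: {path}")
        elif isinstance(actual[key], dict) and isinstance(expected[key], dict):
            # Recurse into nested structs, extending the dotted path.
            diffs.extend(diff_key_paths(actual[key], expected[key], prefix=f"{path}."))
    return diffs


# Dictionaries taken verbatim from the reported assertion failure (row id 7).
actual = {'id': 7, 'struct_1': {'array_2_1': None}, 'struct_2': {'array_2_1': None}}
expected = {'id': 7, 'struct_1': {'array_1_1': None}, 'struct_2': {'array_2_1': None}}

result = diff_key_paths(actual, expected)
print(result)
```

In a reproduction, the first argument would be row.asDict(True) and the second the input record. The helper compares key names only, which is sufficient here because the reported symptom is a wrong child field name, not a wrong value.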



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org