Posted to issues@spark.apache.org by "Kumaresh AK (Jira)" <ji...@apache.org> on 2021/04/07 22:23:00 UTC
[jira] [Updated] (SPARK-34982) Pyspark asDict() returns wrong child field for nested dataframe
[ https://issues.apache.org/jira/browse/SPARK-34982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kumaresh AK updated SPARK-34982:
--------------------------------
Summary: Pyspark asDict() returns wrong child field for nested dataframe (was: Pyspark asDict() returns wrong fields for a nested dataframe)
> Pyspark asDict() returns wrong child field for nested dataframe
> ---------------------------------------------------------------
>
> Key: SPARK-34982
> URL: https://issues.apache.org/jira/browse/SPARK-34982
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.0.1, 3.0.2
> Environment: Tested with EMR 6.2.0. python: 3.8.5
> Also tested with local PySpark on Windows. v: 3.0.1. python: 3.8.5
> Reporter: Kumaresh AK
> Priority: Major
> Attachments: SPARK-34982.py
>
>
> Hello! I upgraded a job from Spark 2.4.4 to 3.0.1 and encountered this issue. The job uses asDict(True) in PySpark. I reproduced the issue with a concise schema and code. Consider this example schema:
> {code:java}
> root
>  |-- id: integer (nullable = false)
>  |-- struct_1: struct (nullable = true)
>  |    |-- array_1_1: array (nullable = true)
>  |    |    |-- element: string (containsNull = false)
>  |-- struct_2: struct (nullable = true)
>  |    |-- array_2_1: array (nullable = true)
>  |    |    |-- element: string (containsNull = false){code}
> I created 100 rows with the above schema, filled them with some numbers, and checked row.asDict(True) against the input. For some rows,
> {code:java}
> struct_1.array_1_1{code}
> is missing. Instead I get
> {code:java}
> struct_1.array_2_1{code}
> I also observe that this happens when array_1_1 is null. Example assertion failure:
> {code:java}
> AssertionError: {'id': 7, 'struct_1': {'array_2_1': None}, 'struct_2': {'array_2_1': None}} != {'id': 7, 'struct_1': {'array_1_1': None}, 'struct_2': {'array_2_1': None}}
> {code}
> I have attached a minimal script that reproduces this issue.
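
The attached script is not reproduced in this message. As a rough illustration of the comparison step described above, here is a minimal pure-Python sketch that reports where the field names of two nested dicts diverge. The helper name key_mismatches is hypothetical (it is not part of PySpark or of the attachment), and the two dicts are copied from the assertion failure above; in a real check, the "actual" dict would come from row.asDict(True).

```python
def key_mismatches(expected, actual, path=""):
    """Return (path, expected_keys, actual_keys) for each nested dict
    whose field names differ between expected and actual."""
    mismatches = []
    if isinstance(expected, dict) and isinstance(actual, dict):
        if set(expected) != set(actual):
            mismatches.append((path or "<root>", sorted(expected), sorted(actual)))
        # Recurse only into fields present on both sides, in a stable order.
        for key in sorted(expected.keys() & actual.keys()):
            child_path = f"{path}.{key}" if path else key
            mismatches.extend(key_mismatches(expected[key], actual[key], child_path))
    return mismatches


# Dicts taken from the assertion failure in the report.
# In the real job, `actual` would be the result of row.asDict(True).
actual = {'id': 7, 'struct_1': {'array_2_1': None}, 'struct_2': {'array_2_1': None}}
expected = {'id': 7, 'struct_1': {'array_1_1': None}, 'struct_2': {'array_2_1': None}}

report = key_mismatches(expected, actual)
for path, want, got in report:
    print(f"{path}: expected fields {want}, got {got}")
```

Reporting the path of the mismatch, rather than only asserting equality of the whole dict, makes it easier to see that the wrong child field appears under struct_1 while struct_2 is unaffected.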