Posted to issues@spark.apache.org by "Michael Souder (Jira)" <ji...@apache.org> on 2020/03/25 22:57:00 UTC
[jira] [Created] (SPARK-31256) Dropna doesn't work for struct columns
Michael Souder created SPARK-31256:
--------------------------------------
Summary: Dropna doesn't work for struct columns
Key: SPARK-31256
URL: https://issues.apache.org/jira/browse/SPARK-31256
Project: Spark
Issue Type: Bug
Components: PySpark
Affects Versions: 2.4.5
Environment: Spark 2.4.5
Python 3.7.4
Reporter: Michael Souder
Calling dropna with a subset that references a field inside a struct column (for example 'struct_col.name') drops every row of the DataFrame.
{code:python}
import pyspark.sql.functions as F
df = spark.createDataFrame([(5, 80, 'Alice'), (10, None, 'Bob'), (15, 80, None)], schema=['age', 'height', 'name'])
df.show()
+---+------+-----+
|age|height| name|
+---+------+-----+
|  5|    80|Alice|
| 10|  null|  Bob|
| 15|    80| null|
+---+------+-----+
# this works just fine
df.dropna(subset=['name']).show()
+---+------+-----+
|age|height| name|
+---+------+-----+
|  5|    80|Alice|
| 10|  null|  Bob|
+---+------+-----+
# now add a struct column
df_with_struct = df.withColumn('struct_col', F.struct('age', 'height', 'name'))
df_with_struct.show(truncate=False)
+---+------+-----+--------------+
|age|height|name |struct_col    |
+---+------+-----+--------------+
|5  |80    |Alice|[5, 80, Alice]|
|10 |null  |Bob  |[10,, Bob]    |
|15 |80    |null |[15, 80,]     |
+---+------+-----+--------------+
# now dropna drops the whole dataframe when you use struct_col
df_with_struct.dropna(subset=['struct_col.name']).show(truncate=False)
+---+------+----+----------+
|age|height|name|struct_col|
+---+------+----+----------+
+---+------+----+----------+
{code}
I've tested the above code in Spark 2.4.4 with Python 3.7.4 and in Spark 2.3.1 with Python 3.6.8. In both, only the row with the null name is dropped:
{code:python}
df_with_struct.dropna(subset=['struct_col.name']).show(truncate=False)
+---+------+-----+--------------+
|age|height|name |struct_col    |
+---+------+-----+--------------+
|5  |80    |Alice|[5, 80, Alice]|
|10 |null  |Bob  |[10,, Bob]    |
+---+------+-----+--------------+
{code}