Posted to issues@spark.apache.org by "Michael Souder (JIRA)" <ji...@apache.org> on 2018/07/17 19:05:00 UTC
[jira] [Created] (SPARK-24835) col function ignores drop
Michael Souder created SPARK-24835:
--------------------------------------
Summary: col function ignores drop
Key: SPARK-24835
URL: https://issues.apache.org/jira/browse/SPARK-24835
Project: Spark
Issue Type: Bug
Components: PySpark
Affects Versions: 2.3.0
Environment: Spark 2.3.0
Python 3.5.3
Reporter: Michael Souder
I'm not sure whether this is a bug or user error, but a column referenced through the col function still resolves after that column has been removed by a previous call to drop.
{code}
import pyspark.sql.functions as F
df = spark.createDataFrame([(1,3,5), (2, None, 7), (0, 3, 2)], ['a', 'b', 'c'])
df.show()
+---+----+---+
|  a|   b|  c|
+---+----+---+
|  1|   3|  5|
|  2|null|  7|
|  0|   3|  2|
+---+----+---+
df = df.drop('c')
# the col function is able to see the 'c' column even though it has been dropped
df.where(F.col('c') < 6).show()
+---+---+
|  a|  b|
+---+---+
|  1|  3|
|  0|  3|
+---+---+
# trying the same with brackets on the data frame fails with the expected error
df.where(df['c'] < 6).show()
Py4JJavaError: An error occurred while calling o36909.apply.
: org.apache.spark.sql.AnalysisException: Cannot resolve column name "c" among (a, b);
{code}
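Until the behaviour is confirmed either way, one possible guard is to check the names used in a filter against df.columns before building the expression. The sketch below is only an illustration: require_columns is a hypothetical helper, not part of PySpark, and it works on plain lists of names so it can be tried without a Spark session. With a real data frame, df.columns would be passed as the first argument.

```python
def require_columns(available, needed):
    """Raise ValueError if any name in `needed` is absent from `available`.

    Hypothetical helper, not a PySpark API. `available` is expected to be
    a list of column names, for example the value of df.columns.
    """
    missing = [name for name in needed if name not in available]
    if missing:
        raise ValueError(
            "Cannot resolve column name(s) {} among ({})".format(
                ", ".join(missing), ", ".join(available)
            )
        )


# Plain-list demonstration of the state after df.drop('c').
columns_after_drop = ['a', 'b']

require_columns(columns_after_drop, ['a'])   # passes silently

try:
    require_columns(columns_after_drop, ['c'])
except ValueError as err:
    print(err)
```

Calling the helper immediately before df.where(F.col('c') < 6) would make the col-based filter fail early, in the same way the bracket form df['c'] does.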