
Posted to issues@spark.apache.org by "Michael Souder (JIRA)" <ji...@apache.org> on 2018/07/17 19:05:00 UTC

[jira] [Created] (SPARK-24835) col function ignores drop

Michael Souder created SPARK-24835:
--------------------------------------

             Summary: col function ignores drop
                 Key: SPARK-24835
                 URL: https://issues.apache.org/jira/browse/SPARK-24835
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 2.3.0
         Environment: Spark 2.3.0, Python 3.5.3
            Reporter: Michael Souder


Not sure if this is a bug or user error, but I've noticed that referencing a column with the col function still works after that column has been removed by a previous call to drop.
{code}
import pyspark.sql.functions as F

df = spark.createDataFrame([(1,3,5), (2, None, 7), (0, 3, 2)], ['a', 'b', 'c'])
df.show()

+---+----+---+
|  a|   b|  c|
+---+----+---+
|  1|   3|  5|
|  2|null|  7|
|  0|   3|  2|
+---+----+---+

df = df.drop('c')

# the col function is able to see the 'c' column even though it has been dropped
df.where(F.col('c') < 6).show()

+---+---+
|  a|  b|
+---+---+
|  1|  3|
|  0|  3|
+---+---+

# trying the same with brackets on the data frame fails with the expected error
df.where(df['c'] < 6).show()

Py4JJavaError: An error occurred while calling o36909.apply.
: org.apache.spark.sql.AnalysisException: Cannot resolve column name "c" among (a, b);{code}
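
Until the behaviour is clarified, one possible guard is to check the requested names against the list of column names before building the filter, so that F.col('c') fails the same way df['c'] does. This is only a minimal sketch: require_columns is a hypothetical helper, not part of PySpark, and the plain list below stands in for df.columns so the snippet runs without a Spark session.

```python
def require_columns(available, *names):
    """Raise ValueError if any requested name is not in `available`.

    `available` is expected to be a list of column names, such as the
    value of df.columns on a DataFrame. Hypothetical helper, not a
    PySpark API. The comparison is an exact, case-sensitive match.
    """
    missing = [name for name in names if name not in available]
    if missing:
        raise ValueError(
            "Cannot resolve column name(s) {} among ({})".format(
                ", ".join(missing), ", ".join(available)
            )
        )
    return list(names)


# Stand-in for df.columns after df = df.drop('c') in the report above.
columns_after_drop = ['a', 'b']

# Both names are present, so this returns without raising.
require_columns(columns_after_drop, 'a', 'b')

# 'c' was dropped, so the guard raises instead of silently filtering.
try:
    require_columns(columns_after_drop, 'c')
except ValueError as err:
    print(err)
```

With a real DataFrame the call would be require_columns(df.columns, 'c') placed just before df.where(F.col('c') < 6). Note that the check is case-sensitive, which may be stricter than how Spark itself matches column names.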



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)