Posted to issues@spark.apache.org by "Tomasz Bartczak (JIRA)" <ji...@apache.org> on 2016/04/20 17:43:25 UTC

[jira] [Created] (SPARK-14759) After join one cannot drop dynamically added column

Tomasz Bartczak created SPARK-14759:
---------------------------------------

             Summary: After join one cannot drop dynamically added column
                 Key: SPARK-14759
                 URL: https://issues.apache.org/jira/browse/SPARK-14759
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 1.6.1
            Reporter: Tomasz Bartczak
            Priority: Minor


Running the following code:
{code}
from pyspark.sql.functions import lit
df1 = sqlContext.createDataFrame([(1,10,)], ['any','hour'])
df2 = sqlContext.createDataFrame([(1,)], ['any']).withColumn('hour',lit(10))

j = df1.join(df2,[df1.hour == df2.hour],how='left')
print("columns after join:{0}".format(j.columns))
jj = j.drop(df2.hour)
print("columns after removing 'hour':{0}".format(jj.columns))
{code}

I would expect that after the join and after dropping df2.hour, only one 'hour' column remains in the DataFrame.
Unfortunately, the column is not dropped:
{code}
columns after join:            ['any', 'hour', 'any', 'hour']
columns after removing 'hour': ['any', 'hour', 'any', 'hour']
{code}

I found that this happens only when the column is added dynamically (here with withColumn and lit) before the join.


