Posted to issues@spark.apache.org by "Tomasz Bartczak (JIRA)" <ji...@apache.org> on 2016/04/20 17:43:25 UTC
[jira] [Created] (SPARK-14759) After join one cannot drop dynamically added column
Tomasz Bartczak created SPARK-14759:
---------------------------------------
Summary: After join one cannot drop dynamically added column
Key: SPARK-14759
URL: https://issues.apache.org/jira/browse/SPARK-14759
Project: Spark
Issue Type: Bug
Components: PySpark
Affects Versions: 1.6.1
Reporter: Tomasz Bartczak
Priority: Minor
Running the following code:
{code}
from pyspark.sql.functions import lit

# df1 has a regular 'hour' column; df2 gets its 'hour' column added dynamically
df1 = sqlContext.createDataFrame([(1, 10)], ['any', 'hour'])
df2 = sqlContext.createDataFrame([(1,)], ['any']).withColumn('hour', lit(10))

j = df1.join(df2, [df1.hour == df2.hour], how='left')
print("columns after join: {0}".format(j.columns))

# expected to remove df2's 'hour' column
jj = j.drop(df2.hour)
print("columns after removing 'hour': {0}".format(jj.columns))
{code}
should show that, after joining and then dropping df2.hour, the DataFrame contains only one 'hour' column.
Unfortunately, the column is not dropped:
{code}
columns after join: ['any', 'hour', 'any', 'hour']
columns after removing 'hour': ['any', 'hour', 'any', 'hour']
{code}
This happens only when the column is added dynamically (here with withColumn) before the join.
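
A possible workaround, sketched here under assumptions and not verified against 1.6.1: rename the right-hand column before the join so that its name is unique, then drop it by name instead of by Column reference. The helper name drop_right_column and the "_right" suffix are hypothetical and only serve the illustration; withColumnRenamed, join and drop(name) are the regular DataFrame methods.

```python
def drop_right_column(left, right, key, how="left"):
    """Join two DataFrames on `key` and drop the right-hand copy of `key`.

    Assumption: dropping by a unique column *name* avoids the problem seen
    when dropping by Column reference (j.drop(df2.hour)).
    """
    tmp = key + "_right"  # hypothetical temporary name, assumed not to clash
    # Give the right-hand join column a unique name before joining.
    right_renamed = right.withColumnRenamed(key, tmp)
    joined = left.join(right_renamed, on=left[key] == right_renamed[tmp], how=how)
    # The temporary name is unique in the joined result, so drop it by name.
    return joined.drop(tmp)
```

With the DataFrames from the report, this would be called as drop_right_column(df1, df2, 'hour'), and the resulting columns can be inspected with .columns.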