Posted to issues@spark.apache.org by "Neil Dewar (JIRA)" <ji...@apache.org> on 2016/07/10 02:24:10 UTC
[jira] [Created] (SPARK-16464) withColumn() allows illegal creation of duplicate column names on DataFrame
Neil Dewar created SPARK-16464:
----------------------------------
Summary: withColumn() allows illegal creation of duplicate column names on DataFrame
Key: SPARK-16464
URL: https://issues.apache.org/jira/browse/SPARK-16464
Project: Spark
Issue Type: Bug
Components: SparkR, SQL
Affects Versions: 1.6.1
Environment: Databricks.com
Reporter: Neil Dewar
Priority: Minor
If I take an existing DataFrame, I am permitted to use withColumn() to add a column whose name duplicates an existing one. I assume this should be illegal, and withColumn() should reject it. Some functions subsequently fail because of the duplicate column names. Example:
# Build a SparkR DataFrame from the built-in mtcars data set
sdfCar <- createDataFrame(sqlContext, mtcars)
sdfCar1 <- withColumn(sdfCar, "isEfficient", sdfCar$mpg <= 20)
# Second call reuses the name "isEfficient"; no error is raised
sdfCar1 <- withColumn(sdfCar1, "isEfficient", ifelse(sdfCar1$mpg == sdfCar1$mpg, 1, 0))
sdfCar2 <- subset(sdfCar1, select = sdfCar1$isEfficient)
# subset() fails with the message: "Reference 'isEfficient' is ambiguous"
Note: I have only confirmed this in SparkR; it might affect other language APIs as well.
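Until withColumn() rejects duplicates itself, a caller could check the column names before running operations that resolve columns by name. The following is only a minimal sketch in base R, not part of SparkR: the helper names findDuplicateColumns and assertUniqueColumns are hypothetical, and the character vector stands in for the result of columns(sdfCar1), which I am assuming lists both copies of the duplicated name.

```r
# Hypothetical helper (not a SparkR function): return each column name
# that occurs more than once in a character vector of names.
findDuplicateColumns <- function(colNames) {
  unique(colNames[duplicated(colNames)])
}

# Hypothetical guard: stop with a clear message instead of failing later
# with "Reference '...' is ambiguous".
assertUniqueColumns <- function(colNames) {
  dups <- findDuplicateColumns(colNames)
  if (length(dups) > 0) {
    stop("Duplicate column names: ", paste(dups, collapse = ", "))
  }
  invisible(TRUE)
}

# Plain character vector standing in for columns(sdfCar1) (assumption).
cols <- c("mpg", "cyl", "isEfficient", "isEfficient")
print(findDuplicateColumns(cols))
```

With a live SparkR session, the same check would presumably be called as assertUniqueColumns(columns(sdfCar1)) right after each withColumn() call.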