Posted to issues@spark.apache.org by "koert kuipers (JIRA)" <ji...@apache.org> on 2016/04/15 23:23:25 UTC

[jira] [Closed] (SPARK-8817) DataFrame should not allow duplicate column names

     [ https://issues.apache.org/jira/browse/SPARK-8817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

koert kuipers closed SPARK-8817.
--------------------------------
    Resolution: Not A Problem

I believe the community disagrees with me and thinks it is OK to have duplicate names, so I am going to close this issue.

> DataFrame should not allow duplicate column names
> -------------------------------------------------
>
>                 Key: SPARK-8817
>                 URL: https://issues.apache.org/jira/browse/SPARK-8817
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.4.0
>            Reporter: koert kuipers
>            Priority: Minor
>
> Pull request 2209 (https://github.com/apache/spark/pull/2209) for SPARK-2890 disabled field name validation (which checks for duplicate column names) in StructType, in favor of throwing an error during SQL query analysis.
> The problem with this is that it is not intuitive for a DataFrame to have duplicate column names, and not all usage of DataFrame involves SQL queries.
> By removing the check from StructType, and hence from DataFrame, it becomes the responsibility of the DSLs that are built on top of DataFrame to do these checks, which is more burdensome and can lead to subtle errors. I ran into this while writing an alternative DSL for DataFrame (a minimal reproduction is sketched below, after this description).
> In R, duplicate columns get automatically renamed:
> > data.frame(x = c(1,2), x = c(3,4))
>   x x.1
> 1 1   3
> 2 2   4
> I believe pandas does allow duplicate column names, but I am not sure (I have never used it).
> Maybe StructType.validateFields could do something similar to what R does and simply rename the dupes? A sketch of such a rename follows below.
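
A minimal sketch of the behavior described above. This assumes a current SparkSession-based setup (the original report was against the 1.4-era SQLContext API); the object name and app name are illustrative only. Constructing the DataFrame with two columns named "x" succeeds, and the duplicate only surfaces when a later query references the ambiguous name.

import org.apache.spark.sql.AnalysisException
import org.apache.spark.sql.SparkSession

object DuplicateColumnsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("dup-cols").getOrCreate()
    import spark.implicits._

    // Constructing a DataFrame with two columns both named "x" succeeds;
    // no validation happens at schema-creation time.
    val df = Seq((1, 3), (2, 4)).toDF("x", "x")
    df.printSchema() // both fields show up as "x"

    // The duplicate only surfaces later, when a query references the
    // ambiguous name and analysis fails.
    try {
      df.select("x").show()
    } catch {
      case e: AnalysisException => println(s"Analysis failed: ${e.getMessage}")
    }

    spark.stop()
  }
}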
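And a sketch of the rename-the-dupes idea from the last line of the description, suitable for pasting into spark-shell. dedupFieldNames is a hypothetical helper written here for illustration; it is not an existing method on StructType, and Spark does not perform this rename today.

import scala.collection.mutable
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

// Hypothetical helper (not part of Spark): rename duplicate field names by
// appending a numeric suffix, similar to R's data.frame(check.names = TRUE).
def dedupFieldNames(schema: StructType): StructType = {
  val seen = mutable.Map.empty[String, Int]
  val fields = schema.fields.map { field =>
    val count = seen.getOrElse(field.name, 0)
    seen(field.name) = count + 1
    if (count == 0) field else field.copy(name = s"${field.name}.$count")
  }
  StructType(fields)
}

// Two fields named "x" become "x" and "x.1", matching the R output above.
val schema = StructType(Seq(StructField("x", IntegerType), StructField("x", IntegerType)))
println(dedupFieldNames(schema).fieldNames.mkString(", ")) // x, x.1

Whether silently renaming is preferable to failing fast is exactly the question the issue raises; the resolution above reflects the community's preference for leaving the check to query analysis.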



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org