
Posted to issues@spark.apache.org by "Shivaram Venkataraman (JIRA)" <ji...@apache.org> on 2016/04/28 18:40:13 UTC

[jira] [Resolved] (SPARK-10346) SparkR mutate and transform should replace column with same name to match R data.frame behavior

     [ https://issues.apache.org/jira/browse/SPARK-10346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shivaram Venkataraman resolved SPARK-10346.
-------------------------------------------
    Resolution: Fixed

> SparkR mutate and transform should replace column with same name to match R data.frame behavior
> -----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-10346
>                 URL: https://issues.apache.org/jira/browse/SPARK-10346
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 1.5.0
>            Reporter: Felix Cheung
>
> Spark doesn't seem to replace an existing column with the same name in mutate (i.e. mutate(df, age = df$age + 2) returns a DataFrame with 2 columns named 'age'), so transform doesn't do that for now either.
> However, the R documentation clearly states that transform should replace a column with a matching name:
> https://stat.ethz.ch/R-manual/R-devel/library/base/html/transform.html
> "The tags are matched against names(_data), and for those that match, the value replace the corresponding variable in _data, and the others are appended to _data."
> Also, the resulting DataFrame may be hard to work with, for example when using select with column names or registering it as a table for SQL, since two of its columns then have the same name.
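For reference, the base R behavior that the issue asks SparkR to match can be seen with an ordinary data.frame. The following is a minimal sketch using only base R, not SparkR; the data frame `df` and the new column `older` are made up for illustration. A tag that matches an existing column name (`age`) replaces that column in place, and a tag that does not match (`older`) is appended.

```r
# Base R transform(): tags matching existing column names replace those
# columns in place; unmatched tags are appended as new columns.
df <- data.frame(name = c("a", "b"), age = c(30, 40))

# `age` matches an existing column, so it is replaced (no duplicate 'age').
# `older` does not match, so it is appended at the end.
# All expressions are evaluated against the original df, so `older`
# compares the original ages (30, 40), not the updated ones (32, 42).
out <- transform(df, age = age + 2, older = age > 35)

print(names(out))  # "name" "age" "older"
print(out$age)     # 32 42
print(out$older)   # FALSE TRUE
```

Under the behavior described in the issue, the SparkR equivalent mutate(df, age = df$age + 2) would instead have produced two columns named 'age'.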



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
