Posted to issues@spark.apache.org by "Felix Cheung (JIRA)" <ji...@apache.org> on 2015/08/29 03:00:48 UTC

[jira] [Created] (SPARK-10346) SparkR mutate and transform should replace column with same name to match R data.frame behavior

Felix Cheung created SPARK-10346:
------------------------------------

             Summary: SparkR mutate and transform should replace column with same name to match R data.frame behavior
                 Key: SPARK-10346
                 URL: https://issues.apache.org/jira/browse/SPARK-10346
             Project: Spark
          Issue Type: Bug
          Components: R
    Affects Versions: 1.5.0
            Reporter: Felix Cheung


SparkR's mutate does not seem to replace an existing column when the new column has the same name. For example, mutate(df, age = df$age + 2) returns a DataFrame with two columns that are both named 'age'. For that reason, transform does not replace same-named columns for now either.

However, the R documentation for transform clearly states that a column with a matching name should be replaced:

https://stat.ethz.ch/R-manual/R-devel/library/base/html/transform.html

"The tags are matched against names(_data), and for those that match, the value replace the corresponding variable in _data, and the others are appended to _data."
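
For reference, the quoted behavior can be reproduced with a plain base R data.frame, with no Spark involved. This is only a minimal sketch: the data frame and the column names 'name', 'age' and 'senior' are made up for illustration.

```r
# Base R only; the data and column names here are illustrative assumptions.
df <- data.frame(name = c("Alice", "Bob"), age = c(30, 40))

# 'age' matches an existing name, so that column is replaced in place.
# 'senior' matches nothing, so it is appended as a new column.
out <- transform(df, age = age + 2, senior = age > 35)

print(names(out))  # "name" "age" "senior"
print(out$age)     # 32 42
```

Note that the expressions are evaluated against the original data, so 'senior' is computed from the old 'age' values (30 and 40), not the updated ones.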

The resulting DataFrame may also be hard to work with, for example when using select with column names, since two columns then share the same name.
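
As a loose analogy in base R (not SparkR, whose behavior with duplicated names may differ), the sketch below builds a data.frame with two columns named 'age' and shows that selecting by name can only ever reach one of them. The data is made up for illustration.

```r
# Base R analogy only; this does not call SparkR.
# check.names = FALSE keeps the duplicated name instead of renaming it to 'age.1'.
dup <- data.frame(name = c("Alice", "Bob"),
                  age = c(30, 40),
                  age = c(32, 42),
                  check.names = FALSE)

print(names(dup))                 # "name" "age" "age"
print(anyDuplicated(names(dup)))  # 3, the position of the first duplicated name

# Selecting by name returns the first match; the second 'age' column
# can only be reached by position.
print(dup$age)    # 30 40
print(dup[[3]])   # 32 42
```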




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
