Posted to issues@spark.apache.org by "Vicente Masip (JIRA)" <ji...@apache.org> on 2017/01/13 13:19:26 UTC

[jira] [Comment Edited] (SPARK-19177) SparkR Data Frame operation between columns elements

    [ https://issues.apache.org/jira/browse/SPARK-19177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15821758#comment-15821758 ] 

Vicente Masip edited comment on SPARK-19177 at 1/13/17 1:19 PM:
----------------------------------------------------------------

If I want to specify a schema with gapply, or when I NEED to specify one with dapply, I run into a problem. The documentation example is beautiful:

schema <- structType(structField("eruptions", "double"), structField("waiting", "double"),
                     structField("waiting_secs", "double"))
df1 <- dapply(df, function( x ) { x <- cbind(x, x$waiting * 60) }, schema)

The data.frame returned inside the function has only 3 columns. I have 50 columns, and I want to return all of them plus a new computed column.

Imagine that in function( x ) { x <- cbind(x, x$waiting * 60) }, x has many columns, and the new column has to be described by a schema passed to dapply from outside the function. How would you define the schema? You cannot append a structField to a structType.

In the end I'm going to solve it by adding a dummy new column specified with lit, getting its new schema, and deleting the new column. Not elegant, but it lets me keep working.
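
A minimal sketch of that workaround, assuming a running SparkR session and using the faithful data set from the documentation example as a stand-in for the 50-column table. The names tmp and out_schema are only illustrative, and the placeholder value 0.0 is chosen so that the dummy column is a double, like the computed one:

```r
library(SparkR)
sparkR.session()

# Stand-in for the wide table; the same steps apply with 50 columns.
df <- createDataFrame(faithful)

# Add a dummy column with the name and type of the column the function
# will return, only so that its schema can be read back.
tmp <- withColumn(df, "waiting_secs", lit(0.0))
out_schema <- schema(tmp)

# Apply the function to the ORIGINAL df; the dummy column is never used,
# so tmp can simply be discarded instead of dropping the column.
df1 <- dapply(df, function(x) {
  cbind(x, waiting_secs = x$waiting * 60)
}, out_schema)

head(df1)
```

The new column has to come last in the returned data.frame, because withColumn appends the dummy column after the existing ones and the schema must match the output column by column.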



> SparkR Data Frame operation between columns elements
> ----------------------------------------------------
>
>                 Key: SPARK-19177
>                 URL: https://issues.apache.org/jira/browse/SPARK-19177
>             Project: Spark
>          Issue Type: Question
>          Components: SparkR
>    Affects Versions: 2.0.2
>            Reporter: Vicente Masip
>            Priority: Minor
>              Labels: schema, sparkR, struct
>
> I have commented on this in another thread, but I think it is important to clarify:
> What happens when you are working with 50 columns and gapply? Do I have to rewrite the 50-column schema together with the new column from the gapply operation? I think there is no alternative, because structFields cannot be appended to a structType. Any suggestions?


