Posted to jira@arrow.apache.org by "MvR (Jira)" <ji...@apache.org> on 2020/12/15 11:51:00 UTC

[jira] [Created] (ARROW-10916) gapply fails executing with rbind error

MvR created ARROW-10916:
---------------------------

             Summary: gapply fails executing with rbind error
                 Key: ARROW-10916
                 URL: https://issues.apache.org/jira/browse/ARROW-10916
             Project: Apache Arrow
          Issue Type: Bug
          Components: R
    Affects Versions: 2.0.0
         Environment: Databricks runtime 7.3 LTS ML
            Reporter: MvR
         Attachments: Rerror.log

Executing the following code on Databricks Runtime 7.3 LTS ML fails with an rbind error, whereas the same code runs successfully when Arrow is not enabled in the Spark session. The full error message is attached.

```
library(dplyr)
library(SparkR)

SparkR::sparkR.session(sparkConfig = list(spark.sql.execution.arrow.sparkr.enabled = "true"))

mtcars %>%
  SparkR::as.DataFrame() %>%
  SparkR::gapply(
    x = .,
    cols = c("cyl", "vs"),
    func = function(key, data) {
      dt <- data[, c("mpg", "qsec")]
      res <- apply(dt, 2, mean)
      df <- data.frame(firstGroupKey = key[1],
                       secondGroupKey = key[2],
                       mean_mpg = res[1],
                       mean_cyl = res[2])
      return(df)
    },
    schema = structType(structField("cyl", "double"),
                        structField("vs", "double"),
                        structField("mpg_mean", "double"),
                        structField("qsec_mean", "double"))
  ) %>%
  display()
```


