Posted to user@spark.apache.org by Devesh Raj Singh <ra...@gmail.com> on 2016/02/04 12:51:06 UTC

Problem creating a function in SparkR for dummy variable handling

Hi,

I have written some code to create dummy variables in SparkR:

df <- createDataFrame(sqlContext, iris)
class(dtypes(df))
cat.column <- vector(mode = "character", length = nrow(df))
cat.column <- collect(select(df, df$Species))

lev <- length(levels(as.factor(unlist(cat.column))))
for (j in 1:lev) {
  dummy.df.new <- withColumn(df, paste0(colnames(cat.column), j),
                             ifelse(df$Species == levels(as.factor(unlist(cat.column)))[j], 1, 0))
  df <- dummy.df.new
}

head(df) gives me the desired output:

  Sepal_Length Sepal_Width Petal_Length Petal_Width Species Species1 Species2 Species3
1          5.1         3.5          1.4         0.2  setosa        1        0        0
2          4.9         3.0          1.4         0.2  setosa        1        0        0
3          4.7         3.2          1.3         0.2  setosa        1        0        0
4          4.6         3.1          1.5         0.2  setosa        1        0        0
5          5.0         3.6          1.4         0.2  setosa        1        0        0
6          5.4         3.9          1.7         0.4  setosa        1        0        0



But when I try to do the same thing by wrapping it in a function:

# dataframe: a SparkR DataFrame
# x: the categorical column within the DataFrame, passed as dataframe$x

dummyhandle <- function(dataframe, x) {
  cat.column <- vector(mode = "character", length = nrow(dataframe))
  cat.column <- collect(select(dataframe, x))
  lev <- length(levels(as.factor(unlist(cat.column))))

  for (j in 1:lev) {
    dummy.df <- withColumn(dataframe, paste0(colnames(cat.column), j),
                           ifelse(x == levels(as.factor(unlist(cat.column)))[j], 1, 0))
    dataframe <- dummy.df
  }
  return(dataframe)
}

it throws the following error:

Error in withColumn(dataframe, paste0(colnames(cat.column), j), ifelse(x ==  :
  error in evaluating the argument 'col' in selecting a method for function 'withColumn': Error in if (le > 0) paste0("[1:", paste(le), "]") else "(0)" :
  argument is not interpretable as logical
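
For comparison, here is a minimal sketch of one variant that might be worth trying. It is not a confirmed diagnosis of the error above. It assumes the SparkR 1.x API used in this post, and it takes the column name as a string instead of a Column object, so the argument is never called x. The names dummy_handle and col_name are hypothetical and not part of SparkR. Calls are namespace-qualified so the definition does not depend on what else is attached.

```r
# Sketch only: assumes SparkR 1.x is installed and `dataframe` is a SparkR DataFrame.
# `dummy_handle` and `col_name` are hypothetical names introduced for illustration.
dummy_handle <- function(dataframe, col_name) {
  # Collect only the distinct values of the categorical column, once.
  values <- SparkR::collect(SparkR::distinct(SparkR::select(dataframe, col_name)))
  # Sorting mirrors the ordering of levels(as.factor(...)) in the original code.
  lev <- sort(as.character(values[[1]]))

  for (j in seq_along(lev)) {
    # Look the Column up by name on the current DataFrame in every iteration.
    is_level <- dataframe[[col_name]] == lev[j]
    dataframe <- SparkR::withColumn(dataframe, paste0(col_name, j),
                                    SparkR::ifelse(is_level, 1, 0))
  }
  dataframe
}
```

With the df defined earlier, this would be called as dummy_handle(df, "Species"). Selecting distinct values before collecting also avoids pulling the whole column back to the driver just to compute the levels.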


-- 
Warm regards,
Devesh.