Posted to user@spark.apache.org by Devesh Raj Singh <ra...@gmail.com> on 2016/02/04 12:51:06 UTC
Problem creating a function in SparkR for dummy-variable handling
Hi,
I have written code to create dummy variables in SparkR:
df <- createDataFrame(sqlContext, iris)
class(dtypes(df))
cat.column <- vector(mode = "character", length = nrow(df))
# collect the categorical column locally to find its distinct levels
cat.column <- collect(select(df, df$Species))
lev <- length(levels(as.factor(unlist(cat.column))))
# add one 0/1 indicator column (Species1, Species2, ...) per level
for (j in 1:lev) {
  dummy.df.new <- withColumn(df, paste0(colnames(cat.column), j),
                             ifelse(df$Species == levels(as.factor(unlist(cat.column)))[j], 1, 0))
  df <- dummy.df.new
}
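The column suffixes come from the position of each level in levels(as.factor(...)). A minimal local check in plain R (no Spark involved; it uses the same built-in iris data that was passed to createDataFrame, converted to character to mimic the collected column) shows which level each indicator column corresponds to: since the levels sort alphabetically, Species1 is setosa, Species2 is versicolor and Species3 is virginica.

```r
# Plain R, no Spark: reproduce the level lookup used by the loop above
lvls <- levels(as.factor(as.character(iris$Species)))
print(lvls)
# Species<j> is 1 exactly where Species equals lvls[j]
for (j in seq_along(lvls)) {
  cat(paste0("Species", j), "->", lvls[j], "\n")
}
```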
*head(df) gives me the desired output:*
  Sepal_Length Sepal_Width Petal_Length Petal_Width Species Species1 Species2 Species3
1          5.1         3.5          1.4         0.2  setosa        1        0        0
2          4.9         3.0          1.4         0.2  setosa        1        0        0
3          4.7         3.2          1.3         0.2  setosa        1        0        0
4          4.6         3.1          1.5         0.2  setosa        1        0        0
5          5.0         3.6          1.4         0.2  setosa        1        0        0
6          5.4         3.9          1.7         0.4  setosa        1        0        0
*But when I try to do the same thing by wrapping it in a function:*
# x = dataframe$x, the categorical column within the dataframe
# dataframe = SparkR DataFrame
dummyhandle <- function(dataframe, x) {
  cat.column <- vector(mode = "character", length = nrow(dataframe))
  cat.column <- collect(select(dataframe, x))
  lev <- length(levels(as.factor(unlist(cat.column))))
  for (j in 1:lev) {
    dummy.df <- withColumn(dataframe, paste0(colnames(cat.column), j),
                           ifelse(x == levels(as.factor(unlist(cat.column)))[j], 1, 0))
    dataframe <- dummy.df
  }
  return(dataframe)
}
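For comparison, here is a hypothetical local analogue of dummyhandle() on a plain R data.frame (no Spark), written under the assumption that the categorical column is passed by name as a string rather than as a Column object. The names dummyhandle_local and colname are illustrative only and are not part of SparkR. On a SparkR DataFrame, the corresponding by-name lookup should be dataframe[[colname]], which returns a Column.

```r
# Hypothetical local analogue of dummyhandle(): plain R data.frame, no Spark.
# Assumption: the categorical column is passed by name, e.g. "Species".
dummyhandle_local <- function(dataframe, colname) {
  values <- as.character(dataframe[[colname]])
  lvls <- levels(as.factor(values))
  # one 0/1 indicator column per level, named <colname>1, <colname>2, ...
  for (j in seq_along(lvls)) {
    dataframe[[paste0(colname, j)]] <- ifelse(values == lvls[j], 1, 0)
  }
  dataframe
}

out <- dummyhandle_local(iris, "Species")
head(out)
```

Resolving the name inside the function means the indicator columns are always built from the data frame the function actually receives, instead of from a column object created outside it.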
*it throws the following error:*
Error in withColumn(dataframe, paste0(colnames(cat.column), j), ifelse(x == :
  error in evaluating the argument 'col' in selecting a method for function 'withColumn': Error in if (le > 0) paste0("[1:", paste(le), "]") else "(0)" :
  argument is not interpretable as logical
--
Warm regards,
Devesh.