Posted to issues@spark.apache.org by "Arun (JIRA)" <ji...@apache.org> on 2015/06/25 15:01:04 UTC
[jira] [Created] (SPARK-8629) R code in SparkR
Arun created SPARK-8629:
---------------------------
Summary: R code in SparkR
Key: SPARK-8629
URL: https://issues.apache.org/jira/browse/SPARK-8629
Project: Spark
Issue Type: Question
Components: R
Reporter: Arun
Priority: Minor
Data set:
DC_City    Dc_Code  ItemNo     Itemdescription              date        Month   Year  SalesQuantity
Hyderabad  11       100005010  more. Value Chana Dal 1 Kg.  9/16/2012   9-Sep   2012  1
Hyderabad  11       100005010  more. Value Chana Dal 1 Kg.  12/21/2012  12-Dec  2012  1
Hyderabad  11       100005010  more. Value Chana Dal 1 Kg.  1/12/2013   1-Jan   2013  1
Hyderabad  11       100005010  more. Value Chana Dal 1 Kg.  1/27/2013   1-Jan   2013  3
Hyderabad  11       100005011  more. Value Chana Dal 1 Kg.  2/1/2013    2-Feb   2013  2
Hyderabad  11       100005011  more. Value Chana Dal 1 Kg.  2/12/2013   2-Feb   2013  3
Hyderabad  11       100005011  more. Value Chana Dal 1 Kg.  2/13/2013   2-Feb   2013  2
Hyderabad  11       100005011  more. Value Chana Dal 1 Kg.  2/14/2013   2-Feb   2013  1
Hyderabad  11       100005011  more. Value Chana Dal 1 Kg.  2/15/2013   2-Feb   2013  8
Hyderabad  11       100005012  more. Value Chana Dal 1 Kg.  2/16/2013   2-Feb   2013  18
Hyderabad  11       100005012  more. Value Chana Dal 1 Kg.  2/17/2013   2-Feb   2013  19
Hyderabad  11       100005012  more. Value Chana Dal 1 Kg.  2/18/2013   2-Feb   2013  18
Hyderabad  11       100005012  more. Value Chana Dal 1 Kg.  2/19/2013   2-Feb   2013  18
Hyderabad  11       100005012  more. Value Chana Dal 1 Kg.  2/20/2013   2-Feb   2013  16
Hyderabad  11       100005013  more. Value Chana Dal 1 Kg.  2/21/2013   2-Feb   2013  25
Hyderabad  11       100005013  more. Value Chana Dal 1 Kg.  2/22/2013   2-Feb   2013  19
Hyderabad  11       100005013  more. Value Chana Dal 1 Kg.  2/23/2013   2-Feb   2013  17
Hyderabad  11       100005013  more. Value Chana Dal 1 Kg.  2/24/2013   2-Feb   2013  39
Hyderabad  11       100005013  more. Value Chana Dal 1 Kg.  2/25/2013   2-Feb   2013  23
Code I used in R:
library(dplyr) # provides filter() and select()

data <- read.csv("D:/R/Data_sale_quantity.csv", stringsAsFactors = FALSE)
factors <- unique(data$ItemNo)
df.allitems <- data.frame()
for (i in seq_along(factors))
{
  data1 <- filter(data, ItemNo == factors[[i]]) # rows for one item
  data2 <- select(data1, DC_City, Itemdescription, ItemNo, date, Year, SalesQuantity) # select particular columns
  data2$date <- as.Date(data2$date, format = "%m/%d/%Y") # parse the date (four-digit year)
  data3 <- data2[order(data2$date), ] # order by date, ascending
  df.allitems <- rbind(data3, df.allitems) # append by row bind
}
write.csv(df.allitems, "E:/all_items.csv")
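For reference, the per-item loop could probably also be written as a single sort in base R. This is only a sketch: the sample rows are made up for illustration (same column names as the data set above), and the item groups come out in ascending ItemNo order, which may differ from the order the loop produces.

```r
# Hypothetical sample rows; the column names follow the data set above.
sales <- data.frame(
  DC_City = "Hyderabad",
  Itemdescription = "more. Value Chana Dal 1 Kg.",
  ItemNo = c(100005011, 100005010, 100005010),
  date = c("2/1/2013", "12/21/2012", "9/16/2012"),
  Year = c(2013, 2012, 2012),
  SalesQuantity = c(2, 1, 1),
  stringsAsFactors = FALSE
)

# "%Y" parses a four-digit year; "%y" would read only two digits.
sales$date <- as.Date(sales$date, format = "%m/%d/%Y")

# Sort once by item and then by date instead of filtering item by item.
all_items <- sales[order(sales$ItemNo, sales$date), ]
rownames(all_items) <- NULL
print(all_items)
```

Sorting once avoids growing a data frame with rbind inside the loop.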
-------------------------------------------------------------------------------
I have written some SparkR code so far:
data1 <- read.csv("D:/Data_sale_quantity_mini.csv") # read into a local R data.frame
df_1 <- createDataFrame(sqlContext, data1) # convert the R data.frame to a Spark DataFrame
distinctDF <- distinct(df_1) # remove duplicate rows
# select the columns of interest:
df_2 <- select(distinctDF, "DC_City", "Itemdescription", "ItemNo", "date", "Year", "SalesQuantity")
I don't know how to:
1) create an empty SparkR DataFrame
2) use a for loop in SparkR
3) change the date format
4) find the length() of a Spark DataFrame
5) use rbind in SparkR
Can you help me write the above code in SparkR?