Posted to issues@spark.apache.org by "Arun (JIRA)" <ji...@apache.org> on 2015/06/25 15:01:04 UTC
[jira] [Created] (SPARK-8629) R code in SparkR

Arun created SPARK-8629:
---------------------------

             Summary: R code in SparkR
                 Key: SPARK-8629
                 URL: https://issues.apache.org/jira/browse/SPARK-8629
             Project: Spark
          Issue Type: Question
          Components: R
            Reporter: Arun
            Priority: Minor


Data set:  
  
DC_City    Dc_Code  ItemNo     Itemdescription              date        Month   Year  SalesQuantity
Hyderabad  11       100005010  more. Value Chana Dal 1 Kg.  9/16/2012   9-Sep   2012  1
Hyderabad  11       100005010  more. Value Chana Dal 1 Kg.  12/21/2012  12-Dec  2012  1
Hyderabad  11       100005010  more. Value Chana Dal 1 Kg.  1/12/2013   1-Jan   2013  1
Hyderabad  11       100005010  more. Value Chana Dal 1 Kg.  1/27/2013   1-Jan   2013  3
Hyderabad  11       100005011  more. Value Chana Dal 1 Kg.  2/1/2013    2-Feb   2013  2
Hyderabad  11       100005011  more. Value Chana Dal 1 Kg.  2/12/2013   2-Feb   2013  3
Hyderabad  11       100005011  more. Value Chana Dal 1 Kg.  2/13/2013   2-Feb   2013  2
Hyderabad  11       100005011  more. Value Chana Dal 1 Kg.  2/14/2013   2-Feb   2013  1
Hyderabad  11       100005011  more. Value Chana Dal 1 Kg.  2/15/2013   2-Feb   2013  8
Hyderabad  11       100005012  more. Value Chana Dal 1 Kg.  2/16/2013   2-Feb   2013  18
Hyderabad  11       100005012  more. Value Chana Dal 1 Kg.  2/17/2013   2-Feb   2013  19
Hyderabad  11       100005012  more. Value Chana Dal 1 Kg.  2/18/2013   2-Feb   2013  18
Hyderabad  11       100005012  more. Value Chana Dal 1 Kg.  2/19/2013   2-Feb   2013  18
Hyderabad  11       100005012  more. Value Chana Dal 1 Kg.  2/20/2013   2-Feb   2013  16
Hyderabad  11       100005013  more. Value Chana Dal 1 Kg.  2/21/2013   2-Feb   2013  25
Hyderabad  11       100005013  more. Value Chana Dal 1 Kg.  2/22/2013   2-Feb   2013  19
Hyderabad  11       100005013  more. Value Chana Dal 1 Kg.  2/23/2013   2-Feb   2013  17
Hyderabad  11       100005013  more. Value Chana Dal 1 Kg.  2/24/2013   2-Feb   2013  39
Hyderabad  11       100005013  more. Value Chana Dal 1 Kg.  2/25/2013   2-Feb   2013  23


Code I used in R:

  library(dplyr) # provides filter() and select()

  data <- read.csv("D:/R/Data_sale_quantity.csv", stringsAsFactors = FALSE)
  factors <- unique(data$ItemNo)
  df.allitems <- data.frame()
  for (i in seq_along(factors))
  {
    data1 <- filter(data, ItemNo == factors[[i]]) # rows for one item
    data2 <- select(data1, DC_City, Itemdescription, ItemNo, date, Year, SalesQuantity) # select particular columns
    data2$date <- as.Date(data2$date, format = "%m/%d/%Y") # parse the date (four-digit year)
    data3 <- data2[order(data2$date), ] # order by date, ascending
    df.allitems <- rbind(data3, df.allitems) # row-bind this item's rows onto the result
  }

  write.csv(df.allitems, "E:/all_items.csv")
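
For reference, the loop above groups the rows by ItemNo and sorts each group by date, so it can probably be expressed as a single sort on both columns, which needs neither an empty data frame, a for loop, nor rbind. The following is a minimal base-R sketch under stated assumptions: the inline data frame is a hypothetical four-row subset standing in for the CSV file, and the items come out in ascending ItemNo order, whereas the loop emits them in reverse order of first appearance.

```r
# Hypothetical inline sample standing in for Data_sale_quantity.csv.
data <- data.frame(
  ItemNo = c(100005011, 100005010, 100005011, 100005010),
  date = c("2/12/2013", "12/21/2012", "2/1/2013", "9/16/2012"),
  SalesQuantity = c(3, 1, 2, 1),
  stringsAsFactors = FALSE
)

# Parse the m/d/yyyy strings; %Y is the four-digit year.
data$date <- as.Date(data$date, format = "%m/%d/%Y")

# One sort on (ItemNo, date) groups each item's rows and orders them by date.
sorted <- data[order(data$ItemNo, data$date), ]
print(sorted)
```

Framing the task as one sort may make it easier to carry over to a distributed data frame than a row-by-row append.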

------------------------------------------------------------------------------- 
  
I have written some SparkR code so far:
  data1 <- read.csv("D:/Data_sale_quantity_mini.csv") # read in R
  df_1 <- createDataFrame(sqlContext, data1) # converts R data.frame to Spark DataFrame
  distinctDF <- distinct(df_1) # remove duplicate rows

# For select I used:
  df_2 <- select(distinctDF, "DC_City", "Itemdescription", "ItemNo", "date", "Year", "SalesQuantity") # select columns

I don't know how to:
  1) create an empty SparkR DataFrame
  2) use a for loop in SparkR
  3) change the date format
  4) find the length (the equivalent of length()) of a Spark DataFrame
  5) use rbind in SparkR

Can you help me write the above code in SparkR?



