Posted to issues@spark.apache.org by "Arun (JIRA)" <ji...@apache.org> on 2015/06/25 15:06:05 UTC

[jira] [Updated] (SPARK-8629) R code in SparkR

     [ https://issues.apache.org/jira/browse/SPARK-8629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun updated SPARK-8629:
------------------------
    Description: 
Data set:  
  
DC_City	Dc_Code	ItemNo	Itemdescription	date	Month	Year	SalesQuantity
Hyderabad	11	100005010	more. Value Chana Dal 1 Kg.	9/16/2012	9-Sep	2012	 1 
Hyderabad	11	100005010	more. Value Chana Dal 1 Kg.	12/21/2012	12-Dec	2012	 1 
Hyderabad	11	100005010	more. Value Chana Dal 1 Kg.	1/12/2013	1-Jan	2013	 1 
Hyderabad	11	100005010	more. Value Chana Dal 1 Kg.	1/27/2013	1-Jan	2013	 3 
Hyderabad	11	100005011	more. Value Chana Dal 1 Kg.	2/1/2013	2-Feb	2013	 2 
Hyderabad	11	100005011	more. Value Chana Dal 1 Kg.	2/12/2013	2-Feb	2013	 3 
Hyderabad	11	100005011	more. Value Chana Dal 1 Kg.	2/13/2013	2-Feb	2013	 2 
Hyderabad	11	100005011	more. Value Chana Dal 1 Kg.	2/14/2013	2-Feb	2013	 1 
Hyderabad	11	100005011	more. Value Chana Dal 1 Kg.	2/15/2013	2-Feb	2013	 8 
Hyderabad	11	100005012	more. Value Chana Dal 1 Kg.	2/16/2013	2-Feb	2013	 18 
Hyderabad	11	100005012	more. Value Chana Dal 1 Kg.	2/17/2013	2-Feb	2013	 19 
Hyderabad	11	100005012	more. Value Chana Dal 1 Kg.	2/18/2013	2-Feb	2013	 18 
Hyderabad	11	100005012	more. Value Chana Dal 1 Kg.	2/19/2013	2-Feb	2013	 18 
Hyderabad	11	100005012	more. Value Chana Dal 1 Kg.	2/20/2013	2-Feb	2013	 16 
Hyderabad	11	100005013	more. Value Chana Dal 1 Kg.	2/21/2013	2-Feb	2013	 25 
Hyderabad	11	100005013	more. Value Chana Dal 1 Kg.	2/22/2013	2-Feb	2013	 19 
Hyderabad	11	100005013	more. Value Chana Dal 1 Kg.	2/23/2013	2-Feb	2013	 17 
Hyderabad	11	100005013	more. Value Chana Dal 1 Kg.	2/24/2013	2-Feb	2013	 39 
Hyderabad	11	100005013	more. Value Chana Dal 1 Kg.	2/25/2013	2-Feb	2013	 23 


Code I used in R:

  library(dplyr)   # filter() and select() as used below are dplyr verbs
  data <- read.csv("D:/R/Data_sale_quantity.csv", stringsAsFactors = FALSE)
  factors <- unique(data$ItemNo)
  df.allitems <- data.frame()
  for (i in seq_along(factors))
  {
    data1 <- filter(data, ItemNo == factors[i])                                          # keep one item at a time
    data2 <- select(data1, DC_City, Itemdescription, ItemNo, date, Year, SalesQuantity)  # select particular columns
    data2$date <- as.Date(data2$date, format = "%m/%d/%Y")                               # parse the date column
    data3 <- data2[order(data2$date), ]                                                  # order by ascending date
    df.allitems <- rbind(data3, df.allitems)                                             # append by row bind
  }

  write.csv(df.allitems, "E:/all_items.csv")
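
For what it's worth, the same output (up to the ordering of the item groups) can also be produced without the loop; assuming the filter() and select() calls above come from the dplyr package, a shorter equivalent is:

  # Loop-free version of the above: select, parse the date, then sort by item and date.
  library(dplyr)
  df.allitems <- data %>%
    select(DC_City, Itemdescription, ItemNo, date, Year, SalesQuantity) %>%
    mutate(date = as.Date(date, format = "%m/%d/%Y")) %>%
    arrange(ItemNo, date)
  write.csv(df.allitems, "E:/all_items.csv")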

------------------------------------------------------------------------------- 
  
Here is the SparkR code I have so far:

  data1 <- read.csv("D:/Data_sale_quantity_mini.csv")   # read the CSV with plain R
  df_1 <- createDataFrame(sqlContext, data1)            # convert the R data.frame to a Spark DataFrame
  df_distinct <- distinct(df_1)                         # drop duplicate rows

# For the column selection I used:
  df_2 <- select(df_distinct, "DC_City", "Itemdescription", "ItemNo", "date", "Year", "SalesQuantity")  # select the needed columns

I don't know how to:
  1) create an empty SparkR DataFrame
  2) use a for loop in SparkR
  3) change the date format
  4) find the length (number of rows) of a Spark DataFrame
  5) use rbind in SparkR

Can you help me translate the above R code to SparkR? An untested sketch of what I am aiming for is below.
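
This is roughly the shape I have in mind -- an untested sketch against the Spark 1.4 SparkR API, assuming sqlContext has already been created with sparkRSQL.init():

  # Untested sketch, assuming Spark 1.4 and an already-initialised sqlContext.
  library(SparkR)

  local_df <- read.csv("D:/Data_sale_quantity_mini.csv", stringsAsFactors = FALSE)
  df <- createDataFrame(sqlContext, local_df)        # local R data.frame -> Spark DataFrame

  # 4) the "length" of a Spark DataFrame is its row count
  n_rows <- count(df)

  # 2) + 5) the per-item for loop plus rbind should not be needed: sorting the whole
  # DataFrame by ItemNo and date returns the same rows as filtering each item, ordering
  # it and appending the pieces.  Where appending really is required, unionAll(df_a, df_b)
  # plays the role of rbind, so 1) an empty starting DataFrame is rarely needed either --
  # start from the first piece instead.
  df_sel <- select(df, "DC_City", "Itemdescription", "ItemNo", "date", "Year", "SalesQuantity")
  df_ord <- arrange(df_sel, df_sel$ItemNo, df_sel$date)   # "date" is still sorted as a string here

  # 3) cast(df$date, "date") only understands yyyy-MM-dd strings, so for the M/d/yyyy
  # values in this file it seems simplest to parse and re-order after collecting the
  # result back to a local data.frame.
  local_result <- collect(df_ord)
  local_result$date <- as.Date(local_result$date, format = "%m/%d/%Y")
  local_result <- local_result[order(local_result$ItemNo, local_result$date), ]
  write.csv(local_result, "E:/all_items.csv", row.names = FALSE)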





> R code in SparkR
> ----------------
>
>                 Key: SPARK-8629
>                 URL: https://issues.apache.org/jira/browse/SPARK-8629
>             Project: Spark
>          Issue Type: Question
>          Components: R
>            Reporter: Arun
>            Priority: Minor
>


