Posted to user@spark.apache.org by Stefan Panayotov <sp...@msn.com> on 2015/07/21 20:49:59 UTC

Add column to DF

Hi,
 
I am trying to add a column to a DataFrame that I created from a JSON file like this:
 
val input = hiveCtx.jsonFile("wasb://nhi@cmwhdinsightdatastore.blob.core.windows.net/json/*")
  .toDF()
  .persist(StorageLevel.MEMORY_AND_DISK)
 
I have a function that generates the values for the new column:
 
import java.util.{Calendar, Date}

def determineDayPartID(evntStDate: String, evntStHour: String): Int = {
  val stFormat = new java.text.SimpleDateFormat("yyMMdd")
  // Characters 2-7 of the date field, e.g. "20150721" -> "150721" (yyMMdd)
  val stDateStr: String = evntStDate.substring(2, 8)
  val stDate: Date = stFormat.parse(stDateStr)
  // Characters 1-2 of the hour field; +0.1 pushes exact multiples of 3
  // into the next bucket
  val stHour = evntStHour.substring(1, 3).toDouble + 0.1
  // 3-hour buckets 1..8 within the day
  var bucket = Math.ceil(stHour / 3.0).toInt
  val cal: Calendar = Calendar.getInstance
  cal.setTime(stDate)
  var dayOfWeek = cal.get(Calendar.DAY_OF_WEEK)  // Sunday = 1 .. Saturday = 7
  if (dayOfWeek == 1) dayOfWeek = 8              // treat Sunday as day 8
  if (dayOfWeek > 6) bucket = bucket + 8         // weekend buckets 9..16
  bucket
}
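(To make the bucketing concrete, with hypothetical field values: a date like "20150721" yields substring "150721", which yyMMdd parses as Tuesday, 2015-07-21; an hour substring of "09" gives ceil((9.0 + 0.1) / 3.0) = 4, i.e. weekday bucket 4 in the 1-8 range. The same hour on a Saturday would get 4 + 8 = 12, in the weekend range 9-16.)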


When I try:
 
input.withColumn("DayPartID", callUDF(determineDayPartID, IntegerType, col("StartDate"), col("EventStartHour")))
 
I am getting the error:
 
missing arguments for method determineDayPartID in object rating;
follow this method with `_' if you want to treat it as a partially applied function

Can you please help?


Stefan Panayotov, PhD 
Home: 610-355-0919 
Cell: 610-517-5586 
email: spanayotov@msn.com 
spanayotov@outlook.com 
spanayotov@comcast.net
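(A note on the error itself: determineDayPartID is a method, and the compiler is asking for eta-expansion into a function value. On Spark 1.4, where the callUDF(f, returnType, cols*) overloads still existed before being deprecated in favor of udf, the literal fix would be a sketch like the following, assuming import org.apache.spark.sql.functions._ is already in scope:)

import org.apache.spark.sql.types.IntegerType

input.withColumn("DayPartID",
  callUDF(determineDayPartID _, IntegerType, col("StartDate"), col("EventStartHour")))

(The udf wrapper shown in the replies below is the cleaner route.)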

RE: Add column to DF

Posted by Stefan Panayotov <sp...@msn.com>.
This is working!
Thank you so much :)

Stefan Panayotov, PhD 
Home: 610-355-0919 
Cell: 610-517-5586 
email: spanayotov@msn.com 
spanayotov@outlook.com 
spanayotov@comcast.net

 

Re: Add column to DF

Posted by Michael Armbrust <mi...@databricks.com>.
Try instead:

import org.apache.spark.sql.functions._
import java.util.{Calendar, Date}

val determineDayPartID = udf((evntStDate: String, evntStHour: String) => {
  val stFormat = new java.text.SimpleDateFormat("yyMMdd")
  val stDateStr: String = evntStDate.substring(2, 8)
  val stDate: Date = stFormat.parse(stDateStr)
  val stHour = evntStHour.substring(1, 3).toDouble + 0.1
  var bucket = Math.ceil(stHour / 3.0).toInt     // 3-hour buckets 1..8
  val cal: Calendar = Calendar.getInstance
  cal.setTime(stDate)
  var dayOfWeek = cal.get(Calendar.DAY_OF_WEEK)
  if (dayOfWeek == 1) dayOfWeek = 8
  if (dayOfWeek > 6) bucket = bucket + 8         // weekend buckets 9..16
  bucket
})

input.withColumn("DayPartID", determineDayPartID(col("StartDate"), col("EventStartHour")))