Posted to user@spark.apache.org by Stefan Panayotov <sp...@msn.com> on 2015/07/21 20:49:59 UTC
Add column to DF
Hi,
I am trying to add a column to a data frame that I created from a JSON file like this:
val input = hiveCtx.jsonFile("wasb://nhi@cmwhdinsightdatastore.blob.core.windows.net/json/*")
  .toDF()
  .persist(StorageLevel.MEMORY_AND_DISK)
I have a function that is generating the values for the new column:
def determineDayPartID(evntStDate: String, evntStHour: String): Int = {
  val stFormat = new java.text.SimpleDateFormat("yyMMdd")
  var stDateStr: String = evntStDate.substring(2, 8)
  val stDate: Date = stFormat.parse(stDateStr)
  val stHour = evntStHour.substring(1, 3).toDouble + 0.1
  var bucket = Math.ceil(stHour / 3.0).toInt
  val cal: Calendar = Calendar.getInstance
  cal.setTime(stDate)
  var dayOfWeek = cal.get(Calendar.DAY_OF_WEEK)
  if (dayOfWeek == 1) dayOfWeek = 8
  if (dayOfWeek > 6) bucket = bucket + 8
  return bucket
}
When I try:
input.withColumn("DayPartID", callUDF(determineDayPartID, IntegerType, col("StartDate"), col("EventStartHour")))
I am getting the error:
missing arguments for method determineDayPartID in object rating; follow this method with `_' if you want to treat it as a partially applied function
Can you please help?
Stefan Panayotov, PhD
Home: 610-355-0919
Cell: 610-517-5586
email: spanayotov@msn.com
spanayotov@outlook.com
spanayotov@comcast.net
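A note on the compiler message in the question: determineDayPartID is declared with def, so it is a method, not a function value. When the compiler cannot infer that a function type is expected at the call site (plausibly the case here if callUDF is overloaded, though that is an assumption about the Spark version in use), the bare method name is rejected and the trailing `_` has to be written explicitly. The plain-Scala sketch below illustrates only that conversion; lengthSum and EtaExpansionSketch are hypothetical names, not part of the original code.

```scala
object EtaExpansionSketch {
  // Hypothetical two-argument method with the same shape as the one in the
  // question: (String, String) => Int.
  def lengthSum(a: String, b: String): Int = a.length + b.length

  // `lengthSum _` converts the method into a function value
  // (a "partially applied function" in the compiler's wording).
  val asFunction: (String, String) => Int = lengthSum _

  def main(args: Array[String]): Unit = {
    println(asFunction("ab", "cde"))
  }
}
```

Applied to the question, this would mean passing determineDayPartID _ in place of the bare name; whether that alone is enough for callUDF depends on the Spark version.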
RE: Add column to DF
Posted by Stefan Panayotov <sp...@msn.com>.
This is working!
Thank you so much :)
From: michael@databricks.com
Date: Tue, 21 Jul 2015 12:08:04 -0700
Subject: Re: Add column to DF
To: spanayotov@msn.com
CC: user@spark.apache.org
Re: Add column to DF
Posted by Michael Armbrust <mi...@databricks.com>.
Try instead:
import java.util.{Calendar, Date}
import org.apache.spark.sql.functions._

// Wrapping the logic in udf(...) yields a function that can be applied to Columns.
val determineDayPartID = udf((evntStDate: String, evntStHour: String) => {
  val stFormat = new java.text.SimpleDateFormat("yyMMdd")
  val stDateStr: String = evntStDate.substring(2, 8)
  val stDate: Date = stFormat.parse(stDateStr)
  val stHour = evntStHour.substring(1, 3).toDouble + 0.1
  var bucket = Math.ceil(stHour / 3.0).toInt
  val cal: Calendar = Calendar.getInstance
  cal.setTime(stDate)
  var dayOfWeek = cal.get(Calendar.DAY_OF_WEEK)
  // Sunday (1) is remapped to 8 so the next check covers Saturday and Sunday.
  if (dayOfWeek == 1) dayOfWeek = 8
  // Weekend buckets are offset by 8.
  if (dayOfWeek > 6) bucket = bucket + 8
  bucket
})

input.withColumn("DayPartID", determineDayPartID(col("StartDate"), col("EventStartHour")))
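The bucketing logic inside the UDF can be checked without a Spark context. The sketch below restates it as a plain Scala method with the same behavior; DayPartSketch and dayPartId are hypothetical names. The sample inputs are assumptions too: the thread does not show the actual StartDate or EventStartHour values, so the strings used here ("20150721", "T14") were picked only because they fit the substring offsets in the code.

```scala
import java.text.SimpleDateFormat
import java.util.Calendar

object DayPartSketch {
  // Same logic as the UDF body: eight 3-hour buckets per day (1 to 8),
  // shifted by 8 on Saturday and Sunday (9 to 16).
  def dayPartId(evntStDate: String, evntStHour: String): Int = {
    val stFormat = new SimpleDateFormat("yyMMdd")
    val stDate = stFormat.parse(evntStDate.substring(2, 8))
    val stHour = evntStHour.substring(1, 3).toDouble + 0.1
    val hourBucket = math.ceil(stHour / 3.0).toInt

    val cal = Calendar.getInstance
    cal.setTime(stDate)
    val dayOfWeek = cal.get(Calendar.DAY_OF_WEEK)
    val isWeekend = dayOfWeek == Calendar.SATURDAY || dayOfWeek == Calendar.SUNDAY

    if (isWeekend) hourBucket + 8 else hourBucket
  }

  def main(args: Array[String]): Unit = {
    // Assumed input shapes: "yyyyMMdd" dates and hours with one leading character.
    println(dayPartId("20150721", "T14")) // a Tuesday, hour 14
    println(dayPartId("20150725", "T00")) // a Saturday, hour 00
    println(dayPartId("20150726", "T23")) // a Sunday, hour 23
  }
}
```

Keeping the logic in an ordinary method like this and wrapping it afterwards, for example udf(DayPartSketch.dayPartId _), is one way to keep it testable outside Spark.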