Posted to user@spark.apache.org by luca_guerra <lg...@bitbang.com> on 2016/05/18 13:42:18 UTC

Spark Task not serializable with lag Window function

I've noticed that after applying a Window function to a DataFrame, calling
map() with a function makes Spark throw a "Task not serializable" exception.
This is my code:

val hc = new org.apache.spark.sql.hive.HiveContext(sc)
import hc.implicits._
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._
def f():String = "test"
case class P(name:String,surname:String)
val lag_result = lag($"name",1).over(Window.partitionBy($"surname"))
val lista = List(P("N1","S1"),P("N2","S2"),P("N2","S2"))
val df = hc.createDataFrame(sc.parallelize(lista))
df.withColumn("lag_result", lag_result).map(x => f)
// This works:
// df.withColumn("lag_result", lag_result).map{case x => def f():String = "test";f}.collect

And this is the Stack Trace:

org.apache.spark.SparkException: Task not serializable
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304)
    at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:2055)
    at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:324)
    at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:323)
    ... and more
Caused by: java.io.NotSerializableException: org.apache.spark.sql.Column
Serialization stack:
    - object not serializable (class: org.apache.spark.sql.Column, value: 'lag(name,1,null) windowspecdefinition(surname,UnspecifiedFrame))
    - field (class: $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC, name: lag_result, type: class org.apache.spark.sql.Column)
... and more
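
The serialization stack above names lag_result as a field of the shell's $iwC wrapper. That suggests map(x => f) pulls in the whole wrapper, because f is a method on it, and the non-serializable Column comes along. Below is a minimal sketch of that mechanism using plain Java serialization and no Spark. FakeColumn, ShellLine and ClosureDemo are hypothetical stand-ins invented for illustration; they are not Spark classes, and the real shell wrapper may behave differently in detail.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for org.apache.spark.sql.Column: deliberately NOT Serializable.
class FakeColumn(val expr: String)

// Hypothetical stand-in for the spark-shell wrapper ($iwC): Serializable itself,
// but holding a non-serializable field next to the method f.
class ShellLine extends Serializable {
  val lagResult = new FakeColumn("lag(name,1)")
  def f(): String = "test"

  // Calls this.f(), so the closure captures the enclosing ShellLine instance
  // and, through it, the lagResult field.
  def capturing: String => String = x => f()

  // Uses nothing from the enclosing instance, so there is nothing to capture.
  def selfContained: String => String = x => "test"
}

object ClosureDemo {
  // Returns true if plain Java serialization of the object succeeds.
  def serializes(obj: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    val line = new ShellLine
    println(s"capturing closure serializes:      ${serializes(line.capturing)}")
    println(s"self-contained closure serializes: ${serializes(line.selfContained)}")
  }
}
```

If this reading is right, it would also explain why the commented-out variant works: defining f inside the closure leaves nothing from the wrapper to capture. Marking the Column as @transient in the shell might be another option, though that is an assumption not tested here.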



