Posted to user@spark.apache.org by Benyi Wang <be...@gmail.com> on 2016/11/07 22:24:29 UTC

Does DeserializeToObject mean that a Row is deserialized to Java objects?

Below is my test code using Spark 2.0.1. DeserializeToObject doesn’t show up
in the plan for filter(), but it does for map(). Does that mean map() does not
use Tungsten operations?

case class Event(id: Long)
val e1 = Seq(Event(1L), Event(2L)).toDS
val e2 = Seq(Event(2L), Event(3L)).toDS

e1.filter(e => e.id < 10 && e.id > 5).explain
// == Physical Plan ==
// *Filter <function1>.apply
// +- LocalTableScan [id#145L]

e1.map(e => e.id < 10 && e.id > 5).explain
// == Physical Plan ==
// *SerializeFromObject [input[0, boolean, true] AS value#155]
// +- *MapElements <function1>, obj#154: boolean
//    +- *DeserializeToObject newInstance(class $line41.$read$$iw$$iw$Event), obj#153: $line41.$read$$iw$$iw$Event
//       +- LocalTableScan [id#145L]

Another question: if I register a complex function as a UDF, in what
situations will DeserializeToObject/SerializeFromObject happen? See the
sketch below for what I mean.
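
To make the question concrete, here is a minimal sketch; inRange is just a
trivial placeholder for the real, more complex function, and I am running this
in the same spark-shell session, so spark.implicits._ is already in scope:

import org.apache.spark.sql.functions.udf

// trivial stand-in for the "complex function" in the question
val inRange = udf((id: Long) => id < 10 && id > 5)

// apply the UDF through the untyped Column API instead of a typed lambda
e1.filter(inRange($"id")).explain

I would expect this version to stay on the Tungsten rows (no
DeserializeToObject in the plan), unlike the typed map() above, but I am not
sure in which cases that holds.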

Thanks.