Posted to dev@spark.apache.org by Hamel Kothari <ha...@gmail.com> on 2016/04/28 22:57:22 UTC

ConvertToSafe being done before functions.explode

Hi all,

I've been looking at some of my query plans and noticed that pretty much
every explode that I run (which is always over a column with ArrayData) is
prefixed with a ConvertToSafe call in the physical plan. Looking at
Generate.scala, it appears that Generate doesn't override
canProcessUnsafeRows from SparkPlan, which defaults to false. For clarity,
I'm using functions.explode (which uses the built-in Explode from
generators.scala), not DataFrame.explode (which requires a user function to
be passed in).
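For reference, here is a minimal sketch that reproduces the observation (the DataFrame and column names are made up for illustration; assumes a Spark 1.x sqlContext is in scope):

```scala
import org.apache.spark.sql.functions.explode

// A toy DataFrame with an ArrayType column.
val df = sqlContext.createDataFrame(Seq(
  (1, Seq("a", "b")),
  (2, Seq("c"))
)).toDF("id", "items")

// functions.explode uses the built-in Explode generator, which shows up
// as a Generate node in the physical plan.
df.select(df("id"), explode(df("items")).as("item")).explain()
// The printed plan shows a ConvertToSafe step feeding Generate, since
// Generate inherits canProcessUnsafeRows = false from SparkPlan.
```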

Is this behavior correct? I suspect that unless we're using a
UserDefinedGenerator this isn't right, and even in the case of
UserDefinedGenerator, the UserDefinedGenerator expression code appears to
perform a manual convertToSafe itself. If my understanding is correct, we
should be able to set canProcessUnsafeRows to true in all cases. Can
someone who understands this part of the SQL code spot-check me on this?
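To make the suggestion concrete, the change I have in mind would look roughly like the following in Generate.scala (a hedged sketch, not a tested patch; it assumes the built-in generators really can consume UnsafeRows directly, which is exactly the question above):

```scala
case class Generate(
    generator: Generator,
    join: Boolean,
    outer: Boolean,
    output: Seq[Attribute],
    child: SparkPlan)
  extends UnaryNode {

  // SparkPlan defaults this to false, which is what forces the planner to
  // insert ConvertToSafe before this node. Overriding it would let unsafe
  // rows flow straight into the generator.
  override def canProcessUnsafeRows: Boolean = true

  // doExecute etc. unchanged
}
```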

Thanks,
Hamel