Posted to dev@spark.apache.org by Michael Armbrust <mi...@databricks.com> on 2016/04/01 22:26:27 UTC

Re: Spark SQL UDF Returning Rows

>
> I haven't looked at Encoders or Datasets since we're bound to 1.6 for now
> but I'll look at encoders to see if that covers it. Datasets seems like it
> would solve this problem for sure.
>

There is an experimental preview of Datasets in Spark 1.6.
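For reference, a minimal sketch of the experimental 1.6 Dataset API, assuming a `sqlContext` is in scope; the `Person` case class and `people.json` path are illustrative, not from the thread:

```scala
import org.apache.spark.sql.Dataset
import sqlContext.implicits._

case class Person(name: String, age: Int)

// Create a Dataset directly from a Seq of case objects...
val ds: Dataset[Person] = Seq(Person("alice", 30), Person("bob", 25)).toDS()

// ...or convert an existing DataFrame with .as[T]
val ds2: Dataset[Person] = sqlContext.read.json("people.json").as[Person]

// Typed transformations operate on the case class, not on Row
val adults: Dataset[String] = ds.filter(_.age >= 18).map(_.name)
```

The encoder that backs `toDS()` and `.as[T]` is what handles the case-class-to-internal-format conversion automatically.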


> I avoided returning a case object because even if we use reflection to
> build byte code and do it efficiently. I still need to convert my Row to a
> case object manually within my UDF, just to have it converted to a Row
> again. Even if it's fast, it's still fairly unnecessary.
>

Even if you give us a Row, there's still a conversion into the binary format
of InternalRow.
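To illustrate the point, a sketch of a Scala UDF on Spark 1.6 that returns a case class; the `Point` class and `midpoint` name are made up for the example. The struct schema is derived by reflection, and the result is converted to InternalRow by Catalyst regardless of whether the UDF hands back a case class or a Row:

```scala
case class Point(x: Double, y: Double)

// Register a UDF whose return type is a case class; Spark infers
// the corresponding struct schema and converts the result internally.
sqlContext.udf.register("midpoint",
  (x1: Double, y1: Double, x2: Double, y2: Double) =>
    Point((x1 + x2) / 2, (y1 + y2) / 2))

sqlContext.sql("SELECT midpoint(0.0, 0.0, 2.0, 4.0)").show()
```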


> The thing I guess that threw me off was that UDF1/2/3 was in a "java"
> prefixed package although there was nothing that made it Java-specific; in
> fact it was the only way to do what I wanted in Scala. For things like
> JavaRDD, etc it makes sense, but for generic things like UDF is there a
> reason they get put into a package with "java" in the name?
>

This was before we decided to unify the APIs for Scala and Java, so it's
mostly historical.
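As the thread notes, nothing stops you from using those interfaces from Scala. A sketch, assuming Spark 1.6 with a `sqlContext` in scope (the `strLen` name is illustrative); registering a `UDF1` this way requires passing the return DataType explicitly:

```scala
import org.apache.spark.sql.api.java.UDF1
import org.apache.spark.sql.types.IntegerType

// UDF1 lives in the "java" package for historical reasons,
// but it can be implemented as an ordinary Scala anonymous class.
val strLen = new UDF1[String, Int] {
  override def call(s: String): Int = s.length
}

sqlContext.udf.register("strLen", strLen, IntegerType)
sqlContext.sql("SELECT strLen('spark')").show()
```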