You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@spark.apache.org by Erik Erlandson <ee...@redhat.com> on 2019/07/06 00:22:22 UTC

Re: UDAFs have an inefficiency problem

I submitted a PR for this:
https://github.com/apache/spark/pull/25024

On Wed, Mar 27, 2019 at 4:19 PM Erik Erlandson <ee...@redhat.com> wrote:

> I describe some of the details here:
> https://issues.apache.org/jira/browse/SPARK-27296
>
> The short version of the story is that aggregating data structures (UDTs)
> used by UDAFs are serialized to a Row object, and de-serialized, for every
> row in a data frame.
> Cheers,
> Erik
>
>