Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2016/03/14 00:35:33 UTC

[jira] [Commented] (SPARK-13335) Optimize Data Frames collect_list and collect_set with declarative aggregates

    [ https://issues.apache.org/jira/browse/SPARK-13335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15192570#comment-15192570 ] 

Apache Spark commented on SPARK-13335:
--------------------------------------

User 'mccheah' has created a pull request for this issue:
https://github.com/apache/spark/pull/11688

> Optimize Data Frames collect_list and collect_set with declarative aggregates
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-13335
>                 URL: https://issues.apache.org/jira/browse/SPARK-13335
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Matt Cheah
>            Priority: Minor
>
> Based on the discussion in SPARK-9301, we can optimize collect_set and collect_list by implementing them as declarative aggregate expressions instead of using Hive UDAFs. The problem with Hive UDAFs is that they require repeatedly converting the data items from Catalyst types back to external types. Declarative aggregate expressions get around this.
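
The cost described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Spark code: `to_external`, `initial`, `update`, `merge`, `evaluate`, and the `conversions` counter are hypothetical names invented for this example, and `bytes` versus `str` merely stands in for internal versus external types. It shows collect_list and collect_set written as four small steps that work on internal values, next to a UDAF-style path that converts every input row.

```python
from functools import reduce

# Hypothetical toy model; none of these names are Spark APIs.
conversions = {"count": 0}


def to_external(value: bytes) -> str:
    """Stand-in for an internal-to-external type conversion."""
    conversions["count"] += 1
    return value.decode("utf-8")


# collect_list expressed as four small steps over internal values.
def initial():
    return []


def update(buffer, value):
    # Append one input value to a partition-local buffer.
    return buffer + [value]


def merge(left, right):
    # Combine two partial buffers from different partitions.
    return left + right


def evaluate(buffer):
    # The final result is the buffer itself, still in internal form.
    return buffer


def collect_list(partitions):
    partials = [reduce(update, part, initial()) for part in partitions]
    return evaluate(reduce(merge, partials, initial()))


def collect_set(partitions):
    # Same steps, with duplicates dropped at the end; order is not preserved.
    return set(collect_list(partitions))


partitions = [[b"a", b"b"], [b"a", b"c"]]
as_list = collect_list(partitions)
as_set = collect_set(partitions)

# A UDAF-style path would instead call to_external once per input row.
udaf_style = [to_external(v) for part in partitions for v in part]
```

In this model the four-step version never calls `to_external`, while the UDAF-style path calls it once for every input row, so the conversion count grows with the input size.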



