Posted to issues@spark.apache.org by "Yin Huai (JIRA)" <ji...@apache.org> on 2015/07/22 22:21:04 UTC

[jira] [Commented] (SPARK-6802) User Defined Aggregate Function Refactoring

    [ https://issues.apache.org/jira/browse/SPARK-6802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14637555#comment-14637555 ] 

Yin Huai commented on SPARK-6802:
---------------------------------

We have added Scala/Java UDAF support through SPARK-3947. Is this JIRA for Python UDAF?
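
For reference, SPARK-3947 added the UserDefinedAggregateFunction abstract class in org.apache.spark.sql.expressions (Spark 1.5). A minimal sketch of that API follows; the geometric-mean aggregate and its field names are illustrative, not code from this ticket:

{code:scala}
import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types._

// Illustrative UDAF: geometric mean of a double column.
class GeometricMean extends UserDefinedAggregateFunction {
  // Schema of the input arguments to the aggregate.
  def inputSchema: StructType = StructType(StructField("value", DoubleType) :: Nil)

  // Schema of the intermediate aggregation buffer.
  def bufferSchema: StructType = StructType(
    StructField("count", LongType) :: StructField("product", DoubleType) :: Nil)

  // Type of the final result.
  def dataType: DataType = DoubleType

  // The same input always produces the same output.
  def deterministic: Boolean = true

  // Initialize the buffer for a new group.
  def initialize(buffer: MutableAggregationBuffer): Unit = {
    buffer(0) = 0L
    buffer(1) = 1.0
  }

  // Fold one input row into the buffer.
  def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
    buffer(0) = buffer.getLong(0) + 1
    buffer(1) = buffer.getDouble(1) * input.getDouble(0)
  }

  // Combine two partial buffers (e.g. from different partitions).
  def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit = {
    buffer1(0) = buffer1.getLong(0) + buffer2.getLong(0)
    buffer1(1) = buffer1.getDouble(1) * buffer2.getDouble(1)
  }

  // Produce the final result for a group.
  def evaluate(buffer: Row): Any =
    math.pow(buffer.getDouble(1), 1.0 / buffer.getLong(0))
}
{code}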

> User Defined Aggregate Function Refactoring
> -------------------------------------------
>
>                 Key: SPARK-6802
>                 URL: https://issues.apache.org/jira/browse/SPARK-6802
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark, SQL
>         Environment: We use Spark DataFrames and SQL, along with JSON, SQL, and pandas, quite a bit
>            Reporter: cynepia
>
> While trying to use custom aggregates in Spark (something that is common in pandas), we realized that custom aggregate functions aren't well supported across various features/functions in Spark beyond what is supported by Hive. There is further discussion on the topic vis-à-vis SPARK-3947, which points to similar improvement tickets opened earlier for refactoring the UDAF area.
> While we refactor the interface for aggregates, it would make sense to take into consideration the recently added DataFrame, GroupedData, and possibly also sql.dataframe.Column, which looks different from pandas.Series and does not currently support any aggregations.
> Would like to get feedback from the folks who are actively looking at this...
> We would be happy to participate and contribute if there are any discussions on the same.
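
As a usage sketch of how the SPARK-3947 API already composes with DataFrame and GroupedData (assuming the GeometricMean class sketched above, a SQLContext named sqlContext, and illustrative column names "category" and "value"):

{code:scala}
// Hypothetical input: a DataFrame with columns "category" (string) and "value" (double).
val df = sqlContext.read.json("records.json")

val gm = new GeometricMean

// A UDAF instance is usable as a Column expression on GroupedData.
df.groupBy("category")
  .agg(gm(df("value")).as("geo_mean"))
  .show()

// It can also be registered for use from SQL.
sqlContext.udf.register("geo_mean", new GeometricMean)
df.registerTempTable("records")
sqlContext.sql("SELECT category, geo_mean(value) FROM records GROUP BY category").show()
{code}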



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org