Posted to issues@spark.apache.org by "Holden Karau (JIRA)" <ji...@apache.org> on 2017/11/15 15:30:02 UTC

[jira] [Reopened] (SPARK-6802) User Defined Aggregate Function Refactoring

     [ https://issues.apache.org/jira/browse/SPARK-6802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Holden Karau reopened SPARK-6802:
---------------------------------

Now that the Arrow-accelerated UDFs are in (https://issues.apache.org/jira/browse/SPARK-21404), maybe we should reconsider this.
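
For illustration, a minimal sketch of the kind of pandas-style custom aggregate this could enable. It assumes the grouped-aggregate pandas_udf flavor (PandasUDFType.GROUPED_AGG), which landed after this comment in Spark 2.4; trimmed_mean is a hypothetical example aggregate, not part of any existing API:

{code:python}
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf, PandasUDFType

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [("a", 1.0), ("a", 2.0), ("a", 4.0), ("b", 3.0)], ["key", "value"])

# Hypothetical custom aggregate: a trimmed mean computed with pandas.
# Each group's column arrives as a single pandas.Series via Arrow,
# so the function body is ordinary pandas code.
@pandas_udf("double", PandasUDFType.GROUPED_AGG)
def trimmed_mean(v):
    # Drop the min and max before averaging, when the group is big enough.
    trimmed = v.sort_values().iloc[1:-1] if len(v) > 2 else v
    return trimmed.mean()

df.groupBy("key").agg(trimmed_mean(df["value"])).show()
{code}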

> User Defined Aggregate Function Refactoring
> -------------------------------------------
>
>                 Key: SPARK-6802
>                 URL: https://issues.apache.org/jira/browse/SPARK-6802
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>         Environment: We use Spark DataFrames and SQL, along with JSON and pandas, quite a bit
>            Reporter: cynepia
>
> While trying to use custom aggregates in Spark (something which is common in pandas), we realized that custom aggregate functions aren't well supported across various features/functions in Spark beyond what is supported by Hive. There are further discussions on the topic vis-à-vis SPARK-3947, which points to similar improvement tickets opened earlier for refactoring the UDAF area. (A minimal pandas sketch of the idiom we have in mind follows this quoted description.)
> While we refactor the interface for aggregates, it would make sense to take into consideration the recently added DataFrame, GroupedData, and possibly also sql.dataframe.Column, which looks different from pandas.Series and doesn't currently support any aggregations.
> We would like to get feedback from the folks who are actively looking at this.
> We would be happy to participate and contribute if there are any discussions on the same.
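
For reference, the pandas idiom the quoted description alludes to, where any callable can serve as a custom aggregate (a minimal, self-contained sketch; the column names are made up for illustration):

{code:python}
import pandas as pd

pdf = pd.DataFrame({"key": ["a", "a", "b"], "value": [1.0, 2.0, 3.0]})

# In pandas, groupby().agg accepts arbitrary callables out of the box;
# this one computes the per-group range (max minus min).
print(pdf.groupby("key")["value"].agg(lambda v: v.max() - v.min()))
{code}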



