Posted to issues@spark.apache.org by "Parth Gandhi (JIRA)" <ji...@apache.org> on 2018/07/18 21:54:00 UTC

[jira] [Commented] (SPARK-18186) Migrate HiveUDAFFunction to TypedImperativeAggregate for partial aggregation support

    [ https://issues.apache.org/jira/browse/SPARK-18186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16548452#comment-16548452 ] 

Parth Gandhi commented on SPARK-18186:
--------------------------------------

Hi, there has been an issue lately with the sketches-hive library ([https://github.com/DataSketches/sketches-hive]), which builds a Hive UDAF that runs on top of Spark SQL. In their getNewAggregationBuffer() method ([https://github.com/DataSketches/sketches-hive/blob/master/src/main/java/com/yahoo/sketches/hive/hll/DataToSketchUDAF.java#L106]), they initialize different state objects for the PARTIAL1 and PARTIAL2 modes. Their code worked well with Spark 2.1, when Spark ran Hive UDAFs in "Complete" mode. However, since Spark 2.2 started supporting partial aggregation, their code fails when the partial merge is invoked here ([https://github.com/DataSketches/sketches-hive/blob/master/src/main/java/com/yahoo/sketches/hive/hll/SketchEvaluator.java#L56]), because the wrong state object is passed to the merge function. I was trying to understand the PR and wondering why Spark stopped supporting Complete mode for Hive UDAFs, or whether there is still a way to run in Complete mode that I am not aware of. Thank you.
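
For illustration, here is a minimal, hypothetical evaluator in the same spirit (this is not the actual sketches-hive source; the class and field names are made up, and the state is just a long) showing the mode-dependent buffer pattern and where it breaks:

{code:java}
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

// Hypothetical evaluator whose aggregation buffer class depends on the mode
// captured in init().
public class ModeDependentEvaluator extends GenericUDAFEvaluator {

  // Buffer used while consuming raw input rows (PARTIAL1 / COMPLETE).
  static class UpdateState extends AbstractAggregationBuffer { long value; }

  // Buffer used while merging partial results (PARTIAL2 / FINAL).
  static class UnionState extends AbstractAggregationBuffer { long value; }

  private Mode mode;

  @Override
  public ObjectInspector init(Mode m, ObjectInspector[] parameters) throws HiveException {
    super.init(m, parameters);
    this.mode = m;
    // Both the partial and the final result are exposed as a plain long here.
    return PrimitiveObjectInspectorFactory.javaLongObjectInspector;
  }

  @Override
  public AggregationBuffer getNewAggregationBuffer() throws HiveException {
    // The buffer class is chosen from the mode seen at init() time.
    if (mode == Mode.PARTIAL1 || mode == Mode.COMPLETE) {
      return new UpdateState();
    }
    return new UnionState();
  }

  @Override
  public void reset(AggregationBuffer agg) throws HiveException {
    if (agg instanceof UpdateState) {
      ((UpdateState) agg).value = 0L;
    } else {
      ((UnionState) agg).value = 0L;
    }
  }

  @Override
  public void iterate(AggregationBuffer agg, Object[] parameters) throws HiveException {
    ((UpdateState) agg).value += 1L;
  }

  @Override
  public Object terminatePartial(AggregationBuffer agg) throws HiveException {
    return ((UpdateState) agg).value;
  }

  @Override
  public void merge(AggregationBuffer agg, Object partial) throws HiveException {
    // Written on the assumption that merge() only ever sees a UnionState.
    // If the engine hands a PARTIAL1-style buffer (an UpdateState) to merge(),
    // this cast throws, which matches the failure described above.
    ((UnionState) agg).value += (Long) partial;
  }

  @Override
  public Object terminate(AggregationBuffer agg) throws HiveException {
    return (agg instanceof UpdateState)
        ? ((UpdateState) agg).value
        : ((UnionState) agg).value;
  }
}
{code}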

> Migrate HiveUDAFFunction to TypedImperativeAggregate for partial aggregation support
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-18186
>                 URL: https://issues.apache.org/jira/browse/SPARK-18186
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.6.2, 2.0.1
>            Reporter: Cheng Lian
>            Assignee: Cheng Lian
>            Priority: Major
>             Fix For: 2.2.0
>
>
> Currently, Hive UDAFs in Spark SQL don't support partial aggregation. Any query involving any Hive UDAFs has to fall back to {{SortAggregateExec}} without partial aggregation.
> This issue can be fixed by migrating {{HiveUDAFFunction}} to {{TypedImperativeAggregate}}, which already provides partial aggregation support for aggregate functions that may use arbitrary Java objects as aggregation states.
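
For context on the mechanism the description refers to: {{TypedImperativeAggregate}} lets an aggregate keep an arbitrary Java object as its buffer, with Spark serializing that object between the map-side (partial) phase and the merge phase. The sketch below is a simplified, non-authoritative stand-in for that contract in plain Java, not the actual Spark API (which is a Scala class with additional members); only the method names are meant to mirror {{TypedImperativeAggregate}}.

{code:java}
import java.nio.ByteBuffer;

// Simplified stand-in for the TypedImperativeAggregate contract: the buffer is
// an arbitrary Java object, and the engine serializes it between the partial
// (map-side) phase and the merge (reduce-side) phase.
interface TypedAggregate<BUF, IN, OUT> {
  BUF createAggregationBuffer();      // fresh per-partition state
  BUF update(BUF buffer, IN input);   // consume one raw input row
  BUF merge(BUF buffer, BUF other);   // combine two partial states
  OUT eval(BUF buffer);               // produce the final result
  byte[] serialize(BUF buffer);       // shipped between phases
  BUF deserialize(byte[] bytes);
}

// Trivial example: a count aggregate whose state is a plain Long.
class CountAggregate implements TypedAggregate<Long, Object, Long> {
  @Override public Long createAggregationBuffer() { return 0L; }
  @Override public Long update(Long buffer, Object input) { return buffer + 1; }
  @Override public Long merge(Long buffer, Long other) { return buffer + other; }
  @Override public Long eval(Long buffer) { return buffer; }
  @Override public byte[] serialize(Long buffer) {
    return ByteBuffer.allocate(Long.BYTES).putLong(buffer).array();
  }
  @Override public Long deserialize(byte[] bytes) {
    return ByteBuffer.wrap(bytes).getLong();
  }
}
{code}

Because a single buffer class is used in every phase of this contract, there is no mode-dependent state for the planner to get wrong.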



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org