You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@spark.apache.org by "Xiao Li (JIRA)" <ji...@apache.org> on 2018/08/04 07:12:00 UTC

[jira] [Commented] (SPARK-23911) High-order function: aggregate(array, initialState S, inputFunction, outputFunction) → R

    [ https://issues.apache.org/jira/browse/SPARK-23911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16569108#comment-16569108 ] 

Xiao Li commented on SPARK-23911:
---------------------------------

Sounds good to me. Just changed reduce to aggregate

> High-order function: aggregate(array<T>, initialState S, inputFunction<S, T, S>, outputFunction<S, R>) → R
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-23911
>                 URL: https://issues.apache.org/jira/browse/SPARK-23911
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Xiao Li
>            Assignee: Herman van Hovell
>            Priority: Major
>
> Ref: https://prestodb.io/docs/current/functions/array.html
> Returns a single value reduced from array. inputFunction will be invoked for each element in array in order. In addition to taking the element, inputFunction takes the current state, initially initialState, and returns the new state. outputFunction will be invoked to turn the final state into the result value. It may be the identity function (i -> i).
> {noformat}
> SELECT aggregate(ARRAY [], 0, (s, x) -> s + x, s -> s); -- 0
> SELECT aggregate(ARRAY [5, 20, 50], 0, (s, x) -> s + x, s -> s); -- 75
> SELECT aggregate(ARRAY [5, 20, NULL, 50], 0, (s, x) -> s + x, s -> s); -- NULL
> SELECT aggregate(ARRAY [5, 20, NULL, 50], 0, (s, x) -> s + COALESCE(x, 0), s -> s); -- 75
> SELECT aggregate(ARRAY [5, 20, NULL, 50], 0, (s, x) -> IF(x IS NULL, s, s + x), s -> s); -- 75
> SELECT aggregate(ARRAY [2147483647, 1], CAST (0 AS BIGINT), (s, x) -> s + x, s -> s); -- 2147483648
> SELECT aggregate(ARRAY [5, 6, 10, 20], -- calculates arithmetic average: 10.25
>               CAST(ROW(0.0, 0) AS ROW(sum DOUBLE, count INTEGER)),
>               (s, x) -> CAST(ROW(x + s.sum, s.count + 1) AS ROW(sum DOUBLE, count INTEGER)),
>               s -> IF(s.count = 0, NULL, s.sum / s.count));
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org