Posted to issues@spark.apache.org by "Apache Spark (Jira)" <ji...@apache.org> on 2023/01/11 09:04:00 UTC

[jira] [Assigned] (SPARK-41981) Collapse percentile functions if possible

     [ https://issues.apache.org/jira/browse/SPARK-41981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-41981:
------------------------------------

    Assignee: Apache Spark

> Collapse percentile functions if possible
> -----------------------------------------
>
>                 Key: SPARK-41981
>                 URL: https://issues.apache.org/jira/browse/SPARK-41981
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Wan Kun
>            Assignee: Apache Spark
>            Priority: Major
>
> The *percentile* function puts all the target elements into a Hashset and then computes the result. When several percentile calls read the same column, we can try to combine them into one call and compute the result with only one Hashset.
> For example:
> {code:sql}
> SELECT max(value1) as max_value1, percentile(value2, 0.3) as p1,
>         percentile(value3, 0.4) + percentile(value3, 0.5) as p2,
>         percentile(value2, 0.6) as p3
> FROM t1
> {code}
> This query can be optimized to:
> {code:sql}
> SELECT max_value1, _combined_percentile_0[0] as p1, p2, _combined_percentile_0[1] as p3
> FROM (
>    SELECT  max(value1) as max_value1,
>            percentile(value3, 0.4) + percentile(value3, 0.5) as p2,
>            percentile(value2, array(0.3, 0.6)) as _combined_percentile_0
>    FROM t1) as t1
> {code}
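The saving described above can be sketched outside Spark. The following is a minimal Python illustration, not Spark's implementation: the helper names `build_counts` and `percentiles_from_counts` are hypothetical, and linear interpolation between the two closest ranks is assumed as the percentile rule. It shows that one aggregated structure built from the column can serve several percentile fractions, which is what the combined `percentile(value2, array(0.3, 0.6))` call relies on.

```python
import math
from bisect import bisect_right
from collections import Counter
from itertools import accumulate


def build_counts(values):
    """Aggregate the column once into a value -> frequency map (nulls skipped)."""
    return Counter(v for v in values if v is not None)


def percentiles_from_counts(counts, fractions):
    """Compute several exact percentiles from one shared frequency map.

    Assumption for illustration: linear interpolation between the two
    closest ranks, with rank position p * (n - 1).
    """
    if not counts:
        return [None for _ in fractions]

    keys = sorted(counts)
    # cumulative[i] = number of elements less than or equal to keys[i]
    cumulative = list(accumulate(counts[k] for k in keys))
    total = cumulative[-1]

    def value_at(rank):
        # Value found at a 0-based rank of the conceptually sorted column.
        return keys[bisect_right(cumulative, rank)]

    results = []
    for p in fractions:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"percentage must be in [0, 1], got {p}")
        position = p * (total - 1)
        lower, upper = math.floor(position), math.ceil(position)
        if lower == upper:
            results.append(float(value_at(lower)))
        else:
            results.append((upper - position) * value_at(lower)
                           + (position - lower) * value_at(upper))
    return results


# One pass over the column, two percentiles read from the same structure.
value2 = [1, 2, 3, 4, 5]
shared = build_counts(value2)
p1, p3 = percentiles_from_counts(shared, [0.3, 0.6])
```

Calling `percentiles_from_counts` separately for 0.3 and for 0.6 would return the same two values, but each call in an uncombined plan would have to build its own copy of the map first.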



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
