Posted to issues@spark.apache.org by "Wan Kun (Jira)" <ji...@apache.org> on 2023/01/11 08:51:00 UTC

[jira] [Created] (SPARK-41981) Collapse percentile functions if possible

Wan Kun created SPARK-41981:
-------------------------------

             Summary: Collapse percentile functions if possible
                 Key: SPARK-41981
                 URL: https://issues.apache.org/jira/browse/SPARK-41981
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.4.0
            Reporter: Wan Kun


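Each call to the *percentile* function puts all of its target elements into its own hash-based buffer and then computes the result from it. We can try to combine percentile calls on the same column into a single call that takes an array of percentages, so that all of their results are computed from only one buffer.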
For example:
{code:sql}
SELECT max(value1) as max_value1, percentile(value2, 0.3) as p1,
        percentile(value3, 0.4) + percentile(value3, 0.5) as p2,
        percentile(value2, 0.6) as p3
FROM t1
{code}
can be optimized to:
{code:sql}
SELECT max_value1, _combined_percentile_0[0] as p1, p2, _combined_percentile_0[1] as p3
FROM (
   SELECT  max(value1) as max_value1,
           percentile(value3, 0.4) + percentile(value3, 0.5) as p2,
           percentile(value2, array(0.3, 0.6)) as _combined_percentile_0
   FROM t1) as t1
{code}
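
To illustrate why collapsing helps, here is a minimal Python sketch of the idea, not Spark's actual implementation. The helper names build_counts and percentiles_from_counts are hypothetical, and the linear interpolation between the two closest ranks is an assumption about how the exact percentile is defined. The point is only that the frequency buffer is built once and then reused for every requested percentage.

```python
from collections import Counter
from math import ceil, floor


def build_counts(values):
    """Hypothetical helper: one pass over the input, value -> occurrence count.

    None entries are skipped, mirroring how SQL aggregates ignore NULLs.
    """
    return Counter(v for v in values if v is not None)


def percentiles_from_counts(counts, fractions):
    """Hypothetical helper: compute several percentiles from one shared buffer.

    Assumes linear interpolation between the two closest ranks.
    """
    if not counts:
        return [None for _ in fractions]

    # Sort the distinct values once and keep running totals.
    cumulative = []
    total = 0
    for value, count in sorted(counts.items()):
        total += count
        cumulative.append((value, total))

    def value_at(rank):
        # Value found at the given 0-based rank of the sorted input.
        for value, running_total in cumulative:
            if rank < running_total:
                return value
        raise IndexError(rank)

    results = []
    for fraction in fractions:
        position = fraction * (total - 1)
        lower, upper = floor(position), ceil(position)
        low_value = value_at(lower)
        if lower == upper:
            results.append(float(low_value))
        else:
            high_value = value_at(upper)
            results.append(
                (upper - position) * low_value + (position - lower) * high_value
            )
    return results


if __name__ == "__main__":
    value2 = [5, 1, 3, None, 3, 2, 4]
    shared = build_counts(value2)  # built once
    p1, p3 = percentiles_from_counts(shared, [0.3, 0.6])  # reused for both
    print(p1, p3)
```

In this sketch the uncollapsed query corresponds to calling build_counts once per percentile expression, while the collapsed form calls it once and passes all percentages together.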


