Posted to issues@spark.apache.org by "Wan Kun (Jira)" <ji...@apache.org> on 2023/01/11 08:51:00 UTC
[jira] [Created] (SPARK-41981) Collapse percentile functions if possible
Wan Kun created SPARK-41981:
-------------------------------
Summary: Collapse percentile functions if possible
Key: SPARK-41981
URL: https://issues.apache.org/jira/browse/SPARK-41981
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.4.0
Reporter: Wan Kun
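The *percentile* function buffers all of its input values in a hash-based structure and only then computes the result. When several percentile calls in the same aggregate read the same input column, we can try to combine them into a single call that takes an array of percentages, so the values are buffered only once.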
For example:
{code:sql}
SELECT max(value1) as max_value1, percentile(value2, 0.3) as p1,
percentile(value3, 0.4) + percentile(value3, 0.5) as p2,
percentile(value2, 0.6) as p3
FROM t1
{code}
can be optimized to
{code:sql}
SELECT max_value1, _combined_percentile_0[0] as p1, p2, _combined_percentile_0[1] as p3
FROM (
SELECT max(value1) as max_value1,
percentile(value3, 0.4) + percentile(value3, 0.5) as p2,
percentile(value2, array(0.3, 0.6)) as _combined_percentile_0
FROM t1) as t1
{code}
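The benefit of the rewrite can be illustrated outside Spark. The following is a minimal Python sketch, not Spark's implementation: the helper names `build_counts` and `percentiles_from_counts` are hypothetical, and linear interpolation between the two closest ranks is assumed as the percentile definition. It shows that once the values are buffered, several percentages can be answered from the same buffer and give the same results as separate calls.

```python
import bisect
import math
from collections import Counter


def build_counts(values):
    """Buffer the non-null inputs once as a value -> frequency map (hypothetical helper)."""
    return Counter(v for v in values if v is not None)


def percentiles_from_counts(counts, fractions):
    """Compute several exact percentiles from one shared buffer.

    Assumption: linear interpolation between the two closest ranks.
    """
    if not counts:
        return [None] * len(fractions)

    ordered = sorted(counts.items())  # (value, frequency), ascending by value
    cumulative = []                   # running total of frequencies
    total = 0
    for _, count in ordered:
        total += count
        cumulative.append(total)

    def value_at(rank):
        # rank is a 0-based position in the fully expanded sorted input
        return ordered[bisect.bisect_right(cumulative, rank)][0]

    results = []
    for fraction in fractions:
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("percentage must be between 0.0 and 1.0")
        position = fraction * (total - 1)
        lower, upper = math.floor(position), math.ceil(position)
        low_value, high_value = value_at(lower), value_at(upper)
        results.append(low_value + (high_value - low_value) * (position - lower))
    return results


# One buffer serves both percentages, mirroring percentile(value2, array(0.3, 0.6)).
value2 = [1, 2, 3, 4, 5]
shared = build_counts(value2)
p1, p3 = percentiles_from_counts(shared, [0.3, 0.6])
```

In this sketch the sort and the cumulative counts are built once per call, so asking for two percentages in one call avoids buffering and sorting `value2` twice, which is the saving the proposed optimization targets.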