Posted to issues@spark.apache.org by "Erwang Guyomarc'h (Jira)" <ji...@apache.org> on 2020/10/20 17:15:00 UTC
[jira] [Created] (SPARK-33196) Expose filtered aggregation API
Erwang Guyomarc'h created SPARK-33196:
-----------------------------------------
Summary: Expose filtered aggregation API
Key: SPARK-33196
URL: https://issues.apache.org/jira/browse/SPARK-33196
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.0.0
Reporter: Erwang Guyomarc'h
Spark currently supports filtered aggregations but does not expose an API for them in the `spark.sql.functions` package.
It is possible to use them when writing SQL directly:
{code:scala}
scala> val df = spark.range(100)
scala> df.registerTempTable("df")
scala> spark.sql("select count(1) as classic_cnt, count(1) FILTER (WHERE id < 50) from df").show()
+-----------+-------------------------------------------------+
|classic_cnt|count(1) FILTER (WHERE (id < CAST(50 AS BIGINT)))|
+-----------+-------------------------------------------------+
|        100|                                               50|
+-----------+-------------------------------------------------+{code}
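Until such an API is exposed, one workaround from the DataFrame side is to route the FILTER clause through `functions.expr`, which hands the string to the SQL parser. A minimal sketch, reusing the `df` from the transcript above (the column alias names are mine):
{code:scala}
import org.apache.spark.sql.functions.expr

// The FILTER clause is handled by the SQL parser, so it can be
// combined with the regular DataFrame aggregation API via expr().
val counts = df.agg(
  expr("count(1)").as("classic_cnt"),
  expr("count(1) FILTER (WHERE id < 50)").as("filtered_cnt")
)
counts.show()
{code}
This works but forces the predicate into a string, losing the type safety and composability of `Column` expressions, which is what a dedicated API would address.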
These aggregations are especially useful when filtering on overlapping datasets (where a pivot would not work):
{code:sql}
SELECT
  AVG(revenue) FILTER (WHERE age < 25),
  AVG(revenue) FILTER (WHERE age < 35),
  AVG(revenue) FILTER (WHERE age < 45)
FROM people;{code}
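To illustrate the intent, one shape the exposed API could take is a modifier on aggregate columns mirroring the SQL FILTER clause. The `filterWhere` method and the `people` DataFrame below are purely hypothetical; no such method exists in Spark 3.0, and the actual design would be settled in the PR:
{code:scala}
import org.apache.spark.sql.functions.{avg, col}

// Hypothetical API sketch only: `filterWhere` is not a real Spark
// method. It illustrates attaching a FILTER predicate to an
// aggregate as a typed Column expression instead of a SQL string.
val stats = people.agg(
  avg(col("revenue")).filterWhere(col("age") < 25).as("avg_under_25"),
  avg(col("revenue")).filterWhere(col("age") < 35).as("avg_under_35"),
  avg(col("revenue")).filterWhere(col("age") < 45).as("avg_under_45")
)
{code}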
I did not find an existing issue tracking this, so I am creating this one and will attach a PR illustrating a possible implementation.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org