You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "Gabor Kaszab (Jira)" <ji...@apache.org> on 2020/04/02 08:18:00 UTC

[jira] [Commented] (IMPALA-2792) Syntactic sugar for computing aggregates over nested collections.

    [ https://issues.apache.org/jira/browse/IMPALA-2792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17073494#comment-17073494 ] 

Gabor Kaszab commented on IMPALA-2792:
--------------------------------------

I found this in a doc about Impala complex types:

A possibly more natural and faster way to express aggregations on complex types is to introduce “collection aggregate functions”. Each such function take as argument a column-reference expression that may mention collection types along its path. For example, to get the average number of orders per customer we would usually write:
SELECT AVG(cnt) FROM (
 SELECT COUNT(ord.id) cnt FROM customer cust, cust.orders ord
 GROUP BY cust.id)
Instead, we could use “collection aggregate functions” to write more concisely:
SELECT AVG(COUNT(orders)) FROM customer

The query using the collection aggregate function does not need unnesting. The COUNT function accepts a collection type (orders) and simply extracts the number of items from the collection. Note that the new COUNT function is not an aggregate function in the traditional sense because it does not reduce the cardinality of the input.

You can imagine similar variants of such collection aggregates, for example, the following query returns the average price of orders per customer.
SELECT AVG(price in orders) FROM customer
The new “in” expression produces an array of prices from an array of struct-typed orders, and the new AVG collection aggregate produces the average price.



> Syntactic sugar for computing aggregates over nested collections.
> -----------------------------------------------------------------
>
>                 Key: IMPALA-2792
>                 URL: https://issues.apache.org/jira/browse/IMPALA-2792
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Frontend
>    Affects Versions: Impala 2.3.0
>            Reporter: Alexander Behm
>            Assignee: Tamas Mate
>            Priority: Major
>              Labels: complextype, nested_types, planner, ramp-up, usability
>
> For user convenience and SQL brevity, we should add syntax extensions to concisely express aggregates over nested collections. Internally, we should re-write the concise versions into the more verbose equivalent with a correlated inline view.
> Example A:
> {code}
> New syntax:
> select count(c.orders) from customer c
> Internally rewrite to:
> select cnt from customer c, (select count(*) from c.orders) v
> {code}
> Example B:
> {code}
> New syntax:
> select avg(c.orders.items.price) from customer c
> Internally rewrite to:
> select a from customer c, (select avg(price) from c.orders.items) v
> {code}
> I suggest performing the rewrite inside StmtRewriter.java after rewriting all subqueries from the WHERE clause.
> Similar syntactic improvements should be considered for analytic functions on nested collections.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org