Posted to commits@cassandra.apache.org by "Andres de la Peña (Jira)" <ji...@apache.org> on 2022/11/18 23:01:00 UTC
[jira] [Comment Edited] (CASSANDRA-18060) Add aggregation scalar functions on collections

    [ https://issues.apache.org/jira/browse/CASSANDRA-18060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17636066#comment-17636066 ] 

Andres de la Peña edited comment on CASSANDRA-18060 at 11/18/22 11:00 PM:
--------------------------------------------------------------------------

Here is the patch, and CI is running:
||PR||CI||
|[trunk|https://github.com/apache/cassandra/pull/2024]|[j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/2508/workflows/92f054d7-9386-498f-9ba4-330181cd4782] [j11|https://app.circleci.com/pipelines/github/adelapena/cassandra/2508/workflows/8a0838e8-ffbb-424d-a572-3770f9a41632]|

Unlike [the prototype|https://github.com/apache/cassandra/compare/trunk...adelapena:cassandra:17811-trunk-collections?expand=1] mentioned in CASSANDRA-17811, the proposed PR uses the existing aggregation functions in {{AggregateFcts}} as the underlying implementation of {{collection_min}}, {{collection_max}}, {{collection_sum}} and {{collection_avg}}. That way we avoid code duplication and make sure that the functions are consistent. However, that consistency means that we inherit the design decisions made for those functions. The most notable ones, IMO, are:
* {{sum}} and {{collection_sum}} return a value of the same type as the summed values, so any numeric type other than {{decimal}} and {{varint}} can overflow.
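
To illustrate the overflow point, here is a minimal plain-Java sketch. It is not the code in {{AggregateFcts}}; the class and method names are hypothetical, and whether Cassandra wraps silently or reports an error on overflow is not stated here. It only shows why a result of the same fixed-width type as its inputs can overflow, while an arbitrary-precision accumulator (analogous to {{varint}}) cannot.

```java
import java.math.BigInteger;
import java.util.List;

// Hypothetical illustration only; not part of the Cassandra code base.
public class SumOverflowSketch {
    // Accumulates in the same fixed-width type as the inputs (int).
    // Java int arithmetic wraps around silently on overflow.
    public static int sumAsInt(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Accumulates in an arbitrary-precision integer, so it cannot overflow.
    public static BigInteger sumAsBigInteger(List<Integer> values) {
        BigInteger sum = BigInteger.ZERO;
        for (int v : values) {
            sum = sum.add(BigInteger.valueOf(v));
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> values = List.of(Integer.MAX_VALUE, 1);
        System.out.println(sumAsInt(values));        // prints -2147483648
        System.out.println(sumAsBigInteger(values)); // prints 2147483648
    }
}
```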


was (Author: adelapena):
Here is the patch, and CI is running:
||PR||CI||
|[trunk|https://github.com/apache/cassandra/pull/2024]|[j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/2508/workflows/92f054d7-9386-498f-9ba4-330181cd4782] [j11|https://app.circleci.com/pipelines/github/adelapena/cassandra/2508/workflows/8a0838e8-ffbb-424d-a572-3770f9a41632]|

Unlike [the prototype|https://github.com/apache/cassandra/compare/trunk...adelapena:cassandra:17811-trunk-collections?expand=1] mentioned in CASSANDRA-17811, the proposed PR uses the existing aggregation functions in {{AggregateFcts}} as the underlying implementation of {{collection_min}}, {{collection_max}}, {{collection_sum}} and {{collection_avg}}. That way we avoid code duplication and make sure that the functions are consistent.

> Add aggregation scalar functions on collections
> -----------------------------------------------
>
>                 Key: CASSANDRA-18060
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18060
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: CQL/Semantics
>            Reporter: Andres de la Peña
>            Assignee: Andres de la Peña
>            Priority: Normal
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The new mechanism for dynamically building native functions introduced by CASSANDRA-17811 can be used to provide within-collection aggregation functions. We can use that mechanism to add new CQL functions to get:
>  * The number of items in a collection.
>  * The max/min items of a collection.
>  * The sum/avg of the items of a numeric collection.
>  * The keys or the values of a map.
> For example:
> {code:java}
> CREATE TABLE k.t (k int PRIMARY KEY, l list<int>, m map<int, int>);
> INSERT INTO t(k, l, m) VALUES (0, [1, 2, 3], {1:10, 2:20, 3:30});
> > SELECT map_keys(m), map_values(m) FROM t;
>  system.map_keys(m) | system.map_values(m)
> --------------------+----------------------
>           {1, 2, 3} |         [10, 20, 30]
> > SELECT collection_count(m), collection_count(l) FROM t;
>  system.collection_count(m) | system.collection_count(l)
> ----------------------------+----------------------------
>                           3 |                          3
> > SELECT collection_min(l), collection_max(l) FROM t;
>  system.collection_min(l) | system.collection_max(l)
> --------------------------+--------------------------
>                         1 |                        3
> > SELECT collection_sum(l), collection_avg(l) FROM t;
>  system.collection_sum(l) | system.collection_avg(l)
> --------------------------+--------------------------
>                         6 |                        2
> {code}
> Note that this type of aggregation is different from the kind of aggregation provided by {{min}}, {{max}}, {{sum}} and {{avg}}, which aggregate entire collections across rows. Here we only aggregate the items within a single collection, row by row.
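
The distinction between the two kinds of aggregation can be sketched in plain Java. This is only an illustration under assumptions: the class, the method names, the second row and the scalar column values are hypothetical and are not taken from the patch. The per-row method yields one result for each row's collection (the role {{collection_sum}} plays), while the across-rows method folds one value per row into a single result (the role {{sum}} plays).

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not part of the Cassandra code base.
public class PerRowAggregationSketch {
    // Within-collection aggregation: one sum per row, keyed by the row's primary key.
    public static Map<Integer, Integer> sumPerRow(Map<Integer, List<Integer>> rows) {
        Map<Integer, Integer> result = new TreeMap<>();
        rows.forEach((key, items) ->
            result.put(key, items.stream().mapToInt(Integer::intValue).sum()));
        return result;
    }

    // Across-rows aggregation: a single sum over one value taken from each row.
    public static int sumAcrossRows(List<Integer> columnValues) {
        return columnValues.stream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> rows = new TreeMap<>();
        rows.put(0, List.of(1, 2, 3)); // the row from the example above
        rows.put(1, List.of(10, 20));  // hypothetical extra row
        System.out.println(sumPerRow(rows));                // prints {0=6, 1=30}
        System.out.println(sumAcrossRows(List.of(5, 7)));   // prints 12
    }
}
```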


