Posted to commits@cassandra.apache.org by "Andres de la Peña (Jira)" <ji...@apache.org> on 2022/10/21 19:39:00 UTC

[jira] [Comment Edited] (CASSANDRA-17811) CQL aggregation functions on collections, tuples and UDTs

    [ https://issues.apache.org/jira/browse/CASSANDRA-17811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17622442#comment-17622442 ] 

Andres de la Peña edited comment on CASSANDRA-17811 at 10/21/22 7:38 PM:
-------------------------------------------------------------------------

I have also adapted the {{fromjson}} and {{tojson}} functions to factories.

Same as {{token}}, the signatures of these functions were dynamically generated for specific calls with some ad-hoc operations spread across several places ({{FunctionResolver}}, {{FunctionCall}} and {{Selectable}}). Also, due to the way that the dynamic generation was implemented, it wasn't possible to apply them to anything but the selection clause.

After using the generic factories mechanism for creating them, all the special-casing is gone. Now they are like any other dynamic function, and they can be applied to the same contexts as any other function.

I have also rebased [the branch with within-collection aggregation functions|https://github.com/adelapena/cassandra/tree/17811-trunk-collections] on top of the last changes.


was (Author: adelapena):
I have also adapted the {{fromjson}} and {{tojson}} functions to factories.

Same as {{token}}, the signature of these functions was dynamically generated for specific calls with some ad-hoc operations across several classes ({{FunctionResolver}}, {{FunctionCall}} and {{Selectable}}). Also, due to the way that the dynamic generation was implemented, it wasn't possible to apply them to anything but the selection clause.

After using the generic factories mechanism for creating them, all the special-casing is gone. Now they are like any other dynamic function, and they can be applied to the same contexts as any other function.

I have also rebased [the branch with within-collection aggregation functions|https://github.com/adelapena/cassandra/tree/17811-trunk-collections] on top of the last changes.

> CQL aggregation functions on collections, tuples and UDTs
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-17811
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17811
>             Project: Cassandra
>          Issue Type: Bug
>          Components: CQL/Semantics
>            Reporter: Andres de la Peña
>            Assignee: Andres de la Peña
>            Priority: Normal
>
> It has been found during CASSANDRA-8877 that CQL's aggregation functions {{max}}, {{min}} and {{count}} can be applied to collections, but the result is returned as a blob. For example:
> {code:java}
> CREATE TABLE t (k int PRIMARY KEY, l list<int>);
> INSERT INTO t(k, l) VALUES (0, [1, 2, 3]);
> INSERT INTO t(k, l) VALUES (1, [10, 20, 30]);
> SELECT max(l) FROM t;
>  system.max(l)
> ------------------------------------------------------------
>  0x00000003000000040000000a0000000400000014000000040000001e
> {code}
> This happens on 3.0, 3.11, 4.0, 4.1 and trunk.
> I'm not sure whether the functions shouldn't be supported for collections at all, or whether they should be supported and the current result is wrong.
> In the example above, the returned blob is the serialized value of {{[10, 20, 30]}}, which is the right one according to the list comparator. I think this happens because the matched version of the function is the one for {{(blob) -> blob}}. We would need a {{(list<int>) -> list<int>}} function instead, but this function doesn't exist.
> It would be quite easy to add versions of the {{max}}, {{min}} and {{count}} functions for every type of collection ({{list<int>}}, {{list<text>}}, {{map<int, int>}}, {{map<int, text>}}, etc.). The downside of this approach is that it would increase the number of aggregation functions kept in memory from 82 to 2722, if my maths are right. This is quite an increase, mainly due to the many possible combinations of the {{map}} type. [Here|https://github.com/adelapena/cassandra/commit/e3ba3c2dc36ce58d06942078c708ffb93eb3cd84] is a quick, incomplete prototype of the approach.
> Also, I'm not sure that applying those aggregation functions to collections is very useful in practice. Thus, an alternative approach would be just forbidding them, considering them not supported. I don't think it would be a problem for backward compatibility since no one has complained about the current behaviour, and we might well consider that the original intent was not to allow aggregation on collections. At least, there aren't any tests for it, and I can't find any documentation about it either.
> Another idea that comes to mind is that we could change the meaning of those functions to aggregate the values within the collection, instead of aggregating the rows. In that case, the behaviour would be:
> {code:java}
> CREATE TABLE t (k int PRIMARY KEY, l list<int>);
> INSERT INTO t(k, l) VALUES (0, [1, 2, 3]);
> INSERT INTO t(k, l) VALUES (1, [10, 20, 30]);
> SELECT max(l) FROM t;
>  k | system.max(l)
> ---+---------------
>  1 | 30
>  0 | 3
> {code}
> Of course we could have separate function names for that type of collection aggregation, like {{collectionMax}}, {{maxItem}}, or something like that.
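
The two readings of {{max(l)}} discussed in the description above can be illustrated with a small standalone Java sketch. It is not Cassandra code: the class and method names are hypothetical, the blob layout (a 4-byte element count followed by length-prefixed 4-byte ints) is an assumption read off the bytes in the example, and the list comparison is a simplified element-by-element stand-in for the real list comparator.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Cassandra.
class CollectionMaxSketch {

    // Decodes a hex blob into a list of ints, ASSUMING this layout:
    // 4-byte element count, then for each element a 4-byte length and the value.
    static List<Integer> decodeIntList(String hex) {
        String digits = hex.startsWith("0x") ? hex.substring(2) : hex;
        byte[] bytes = new byte[digits.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(digits.substring(2 * i, 2 * i + 2), 16);
        }
        ByteBuffer buffer = ByteBuffer.wrap(bytes); // ByteBuffer is big-endian by default
        int count = buffer.getInt();
        List<Integer> values = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            int length = buffer.getInt();
            if (length != 4) {
                throw new IllegalArgumentException("expected a 4-byte int element, got " + length);
            }
            values.add(buffer.getInt());
        }
        return values;
    }

    // Simplified list ordering: element by element, then by size on a tie.
    static int compareLists(List<Integer> a, List<Integer> b) {
        int n = Math.min(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(a.get(i), b.get(i));
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.size(), b.size());
    }

    // Current meaning: aggregate across rows, returning the greatest whole list.
    static List<Integer> maxAcrossRows(List<List<Integer>> rows) {
        return Collections.max(rows, CollectionMaxSketch::compareLists);
    }

    // Alternative meaning: aggregate the values within a single row's collection.
    static int maxWithinCollection(List<Integer> list) {
        return Collections.max(list);
    }

    public static void main(String[] args) {
        List<List<Integer>> rows = Arrays.asList(
                Arrays.asList(1, 2, 3),     // row k = 0
                Arrays.asList(10, 20, 30)); // row k = 1

        String blob = "0x00000003000000040000000a0000000400000014000000040000001e";
        System.out.println("decoded blob:      " + decodeIntList(blob));
        System.out.println("max across rows:   " + maxAcrossRows(rows));
        for (int k = 0; k < rows.size(); k++) {
            System.out.println("max within k = " + k + ":  " + maxWithinCollection(rows.get(k)));
        }
    }
}
```

Under those assumptions, the blob from the first example decodes to [10, 20, 30], which is also the list the row-level aggregation selects, while the within-collection reading yields 3 for k = 0 and 30 for k = 1, matching the second example.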


