Posted to dev@calcite.apache.org by "Julian Hyde (Jira)" <ji...@apache.org> on 2021/08/05 19:18:00 UTC

[jira] [Created] (CALCITE-4720) Obsolete the Collect relational operator, using Aggregate and ARRAY_AGG (and new aggregate functions MULTISET_AGG and MAP_AGG) instead

Julian Hyde created CALCITE-4720:
------------------------------------

             Summary: Obsolete the Collect relational operator, using Aggregate and ARRAY_AGG (and new aggregate functions MULTISET_AGG and MAP_AGG) instead
                 Key: CALCITE-4720
                 URL: https://issues.apache.org/jira/browse/CALCITE-4720
             Project: Calcite
          Issue Type: Bug
            Reporter: Julian Hyde


The {{Collect}} relational operator converts a multi-row relation into a relation with a single row and a column whose type is {{MULTISET}}.

But it is difficult to generalize; we would like to:
* Generate multiple rows, one for each group key, rather than a single row for the whole relation;
* Generate an {{ARRAY}} or {{MAP}} rather than a {{MULTISET}};
* Generate a collection of scalars rather than a collection of records if the input is a single column (e.g. {{INTEGER MULTISET}} rather than {{ROW(INTEGER i) MULTISET}}).
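
To make the records-versus-scalars distinction concrete, here is a minimal Java sketch that uses only {{java.util}} collections, not Calcite's API. The class {{Row}} and the methods {{collectRecords}} and {{collectScalars}} are hypothetical names invented for illustration, and a {{List}} stands in for a {{MULTISET}} (duplicates kept, order not significant).

```java
import java.util.ArrayList;
import java.util.List;

public class CollectSketch {
    // Hypothetical stand-in for a one-column input row, ROW(INTEGER i).
    public static final class Row {
        final int i;
        public Row(int i) { this.i = i; }
        @Override public String toString() { return "ROW(" + i + ")"; }
    }

    // Models the behavior described for Collect: all input rows go into one
    // collection, and that collection is the only column of the single
    // output row. The outer list is the output relation (always one row).
    public static List<List<Row>> collectRecords(List<Row> input) {
        List<Row> bag = new ArrayList<>(input);
        return List.of(bag);
    }

    // Models the wished-for variant for a single-column input: the
    // collection holds the scalar values rather than one-field records.
    public static List<List<Integer>> collectScalars(List<Row> input) {
        List<Integer> bag = new ArrayList<>();
        for (Row r : input) {
            bag.add(r.i);
        }
        return List.of(bag);
    }

    public static void main(String[] args) {
        List<Row> input = List.of(new Row(1), new Row(2), new Row(2));
        System.out.println(collectRecords(input)); // ROW(INTEGER i) MULTISET
        System.out.println(collectScalars(input)); // INTEGER MULTISET
    }
}
```

Both methods still return exactly one output row, which is the limitation named in the first bullet.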

It is also difficult to maintain: it is a minor RelNode that has only 2 implementations (that I know of), and I'm sure that there are bugs and missing support in SqlToRelConverter and the RelOptRule library.

We can achieve the same using the {{Aggregate}} operator and the {{ARRAY_AGG}} aggregate function. We would need new aggregate functions (let's call them {{MULTISET_AGG}} and {{MAP_AGG}}) for the {{MULTISET}} and {{MAP}} types.
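
As a rough model of what grouped aggregation with these functions would produce, here is a sketch in plain Java. It does not use Calcite's API, and the semantics of {{MULTISET_AGG}} and {{MAP_AGG}} are assumptions, since the functions are only proposed here: the multiset is modeled as a value-to-count map, and {{MAP_AGG}} is assumed to keep the last value seen for a duplicate key. The class {{Emp}} and all method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class AggSketch {
    // Hypothetical input row: (deptno, name, sal).
    public static final class Emp {
        final int deptno;
        final String name;
        final int sal;
        public Emp(int deptno, String name, int sal) {
            this.deptno = deptno;
            this.name = name;
            this.sal = sal;
        }
    }

    // Models ARRAY_AGG(name) ... GROUP BY deptno:
    // one output entry per group key, holding a list of scalars.
    public static Map<Integer, List<String>> arrayAgg(List<Emp> emps) {
        Map<Integer, List<String>> out = new TreeMap<>();
        for (Emp e : emps) {
            out.computeIfAbsent(e.deptno, k -> new ArrayList<>()).add(e.name);
        }
        return out;
    }

    // Models the proposed MULTISET_AGG(sal) ... GROUP BY deptno.
    // Assumption: a multiset is represented as value -> occurrence count.
    public static Map<Integer, Map<Integer, Integer>> multisetAgg(List<Emp> emps) {
        Map<Integer, Map<Integer, Integer>> out = new TreeMap<>();
        for (Emp e : emps) {
            out.computeIfAbsent(e.deptno, k -> new TreeMap<>())
                .merge(e.sal, 1, Integer::sum);
        }
        return out;
    }

    // Models the proposed MAP_AGG(name, sal) ... GROUP BY deptno.
    // Assumption: for a duplicate key within a group, the last value wins.
    public static Map<Integer, Map<String, Integer>> mapAgg(List<Emp> emps) {
        Map<Integer, Map<String, Integer>> out = new TreeMap<>();
        for (Emp e : emps) {
            out.computeIfAbsent(e.deptno, k -> new TreeMap<>()).put(e.name, e.sal);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Emp> emps = List.of(
            new Emp(10, "a", 100), new Emp(10, "b", 100), new Emp(20, "c", 300));
        System.out.println(arrayAgg(emps));
        System.out.println(multisetAgg(emps));
        System.out.println(mapAgg(emps));
    }
}
```

Grouping on an empty key (a single group for the whole input) gives the one-row result that {{Collect}} produces today, which is why {{Aggregate}} can subsume it.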

Then we can obsolete {{Collect}}, and make current code paths use {{Aggregate}} instead.


