Posted to issues@flink.apache.org by "Shuyi Chen (JIRA)" <ji...@apache.org> on 2017/08/25 22:16:00 UTC

[jira] [Comment Edited] (FLINK-7491) Support COLLECT Aggregate function in Flink SQL

    [ https://issues.apache.org/jira/browse/FLINK-7491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16142277#comment-16142277 ] 

Shuyi Chen edited comment on FLINK-7491 at 8/25/17 10:15 PM:
-------------------------------------------------------------

Thanks for reviewing the PR. [~jark]

Multiset and Array are different types, and they support different sets of operators (please see http://farrago.sourceforge.net/design/CollectionTypes.html). Also, the Calcite definition of the COLLECT SqlAggFunction explicitly requires the return type to be a MultisetSqlType (see below):
{code:java}
  /**
   * The COLLECT operator. Multiset aggregator function.
   */
  public static final SqlAggFunction COLLECT =
      new SqlAggFunction("COLLECT",
          null,
          SqlKind.COLLECT,
          ReturnTypes.TO_MULTISET,
          null,
          OperandTypes.ANY,
          SqlFunctionCategory.SYSTEM, false, false) {
      };
{code}

I am worried that if we use an Array to emulate a Multiset, we might run into performance problems for large multisets down the road, and potentially into Calcite integration issues related to MultisetSqlType. What do you think?
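
To make the concern concrete, here is a minimal Java sketch of the semantic and storage difference. It is not taken from Flink or Calcite: the class `MultisetSketch` and the helper `toMultiset` are hypothetical names, and the element-to-count map is just one possible multiset representation, assumed here for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not Flink's or Calcite's implementation.
class MultisetSketch {

    // Assumed representation: a multiset as a map from element to occurrence count.
    static <T> Map<T, Integer> toMultiset(List<T> values) {
        Map<T, Integer> counts = new HashMap<>();
        for (T v : values) {
            counts.merge(v, 1, Integer::sum); // increment the count for v
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("x", "y", "x");
        List<String> b = Arrays.asList("x", "x", "y");

        // Array-like (ordered) comparison: element order matters.
        System.out.println(a.equals(b)); // false

        // Multiset comparison: only the elements and their counts matter.
        System.out.println(toMultiset(a).equals(toMultiset(b))); // true

        // With many duplicates, the count-based form keeps one entry per
        // distinct element, while an array-like form keeps every occurrence.
        List<String> many = Collections.nCopies(1_000_000, "a");
        System.out.println(many.size());             // 1000000
        System.out.println(toMultiset(many).size()); // 1
    }
}
```

Under this assumed representation, two collections with the same elements in a different order compare equal as multisets but not as arrays, and heavily duplicated input collapses to one entry per distinct value.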
 


was (Author: suez1224):
Thanks for reviewing the PR. [~jark]

I think Multiset and Array are different, and they support different sets of operators (please see http://farrago.sourceforge.net/design/CollectionTypes.html). Also, the Calcite definition of the COLLECT SqlAggFunction explicitly requires the return type to be a Multiset (see below):
{code:java}
  /**
   * The COLLECT operator. Multiset aggregator function.
   */
  public static final SqlAggFunction COLLECT =
      new SqlAggFunction("COLLECT",
          null,
          SqlKind.COLLECT,
          ReturnTypes.TO_MULTISET,
          null,
          OperandTypes.ANY,
          SqlFunctionCategory.SYSTEM, false, false) {
      };
{code}

I am worried that if we use an Array to emulate a Multiset, we might run into performance problems for large multisets down the road, and potentially into Calcite integration issues related to multisets. What do you think?
 

> Support COLLECT Aggregate function in Flink SQL
> -----------------------------------------------
>
>                 Key: FLINK-7491
>                 URL: https://issues.apache.org/jira/browse/FLINK-7491
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table API & SQL
>            Reporter: Shuyi Chen
>            Assignee: Shuyi Chen
>



