Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/12/24 07:51:09 UTC

[GitHub] [doris] TangSiyang2001 opened a new pull request, #15339: [Feature](aggregate-function) support funtion group_uniq_array

TangSiyang2001 opened a new pull request, #15339:
URL: https://github.com/apache/doris/pull/15339

   # Proposed changes
   
   Issue Number: close #13982
   
   ## Problem summary
   
   Syntax: `groupUniqArray(x)` or `groupUniqArray(max_size)(x)`
   
   Function: creates an array of the distinct argument values. With `max_size`, the resulting array holds at most `max_size` elements.
   ```SQL
   SELECT *
   FROM arrays_test
   ┌─s───────┬─arr─────┐
   │ alex    │ [1,2]   │
   │ hello   │ [1,2]   │
   │ World   │ [3,4,5] │
   │ Goodbye │ []      │
   └─────────┴─────────┘
   
   SELECT groupUniqArray(arr) from arrays_test
   ┌─groupUniqArray(arr)─┐
   │ [[],[1,2],[3,4,5]]  │
   └─────────────────────┘
   ```
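
The semantics described above can be sketched in a few lines of C++. This is only an illustrative model, not the Doris implementation: the helper name `group_uniq_array`, the use of `std::set`, and the `std::optional` limit parameter are all assumptions made for the example.

```cpp
#include <cstddef>
#include <optional>
#include <set>
#include <vector>

// Hypothetical helper (not Doris code): returns the distinct values of `input`.
// If `max_size` is given, collection stops once that many distinct values are held.
template <typename T>
std::vector<T> group_uniq_array(const std::vector<T>& input,
                                std::optional<std::size_t> max_size = std::nullopt) {
    std::set<T> seen;
    for (const T& value : input) {
        if (max_size && seen.size() >= *max_size) {
            break; // limit reached, ignore the remaining rows
        }
        seen.insert(value); // inserting a duplicate is a no-op
    }
    return std::vector<T>(seen.begin(), seen.end());
}
```

Called on the four `arr` rows from the example table, this sketch yields the three distinct arrays; `std::set` happens to return them in sorted order, whereas a hash-based set would make no ordering promise.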
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: 
       - [ ] Yes
       - [x] No
       - [ ] I don't know
   2. Has unit tests been added:
       - [ ] Yes
       - [x] No
       - [ ] No Need
   3. Has document been added or modified:
       - [ ] Yes
       - [x] No
       - [ ] No Need
   4. Does it need to update dependencies:
       - [ ] Yes
       - [x] No
   5. Are there any changes that cannot be rolled back:
       - [ ] Yes (If Yes, please explain WHY)
       - [x] No
   
   ## Further comments
   Docs and tests will be added in follow-up updates.
   
   If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #15339: [Feature](aggregate-function) support funtion group_uniq_array

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #15339:
URL: https://github.com/apache/doris/pull/15339#issuecomment-1368429604

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] github-actions[bot] commented on pull request #15339: [Feature](aggregate-function) support funtion group_uniq_array

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #15339:
URL: https://github.com/apache/doris/pull/15339#issuecomment-1364491996

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] Yukang-Lian commented on a diff in pull request #15339: [Feature](aggregate-function) support funtion group_uniq_array

Posted by GitBox <gi...@apache.org>.
Yukang-Lian commented on code in PR #15339:
URL: https://github.com/apache/doris/pull/15339#discussion_r1064168182


##########
fe/fe-core/src/main/java/org/apache/doris/analysis/FunctionCallExpr.java:
##########
@@ -1148,12 +1148,12 @@ public void analyzeImpl(Analyzer analyzer) throws AnalysisException {
             for (int i = 0; i < children.size(); i++) {
                 if (children.get(i).type != Type.BOOLEAN) {
                     throw new AnalysisException("All params of "
-                        + fnName + " function must be boolean");
+                            + fnName + " function must be boolean");

Review Comment:
   Please modify only the lines you actually need to change; otherwise these unrelated formatting changes will force other developers to rebase.





[GitHub] [doris] zhangstar333 commented on a diff in pull request #15339: [enhancement](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "zhangstar333 (via GitHub)" <gi...@apache.org>.
zhangstar333 commented on code in PR #15339:
URL: https://github.com/apache/doris/pull/15339#discussion_r1109413017


##########
be/src/vec/aggregate_functions/aggregate_function_collect.h:
##########
@@ -49,18 +49,16 @@ struct AggregateFunctionCollectSetData {
         data_set.insert(assert_cast<const ColVecType&>(column).get_data()[row_num]);
     }
 
-    void merge(const SelfType& rhs) { data_set.merge(rhs.data_set); }
-
-    void merge(const SelfType& rhs, bool has_limit) {
-        if (!has_limit) {
-            merge(rhs);
-            return;
-        }
-        for (auto& rhs_elem : rhs.data_set) {
-            if (size() >= max_size) {
-                return;
+    void merge(const SelfType& rhs) {
+        if constexpr (HasLimit::value) {
+            data_set.merge(rhs.data_set);
+        } else {

Review Comment:
   The template condition looks inverted: when `HasLimit` is true, this branch merges all data instead of respecting the limit.
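
A minimal sketch of the intended branch order, assuming a simplified stand-in for the real struct. `CollectSetData`, its `std::set` member, and the `add` helper are hypothetical and exist only for this example; the actual `AggregateFunctionCollectSetData` uses Doris's own container types.

```cpp
#include <cstddef>
#include <set>
#include <type_traits>

// Simplified stand-in (assumption, not the Doris struct).
// HasLimit is expected to be std::true_type or std::false_type.
template <typename T, typename HasLimit>
struct CollectSetData {
    std::set<T> data_set;
    std::size_t max_size = 0; // only meaningful when HasLimit::value is true

    std::size_t size() const { return data_set.size(); }

    void add(const T& value) {
        if constexpr (HasLimit::value) {
            if (size() >= max_size) {
                return; // already full
            }
        }
        data_set.insert(value);
    }

    void merge(const CollectSetData& rhs) {
        if constexpr (HasLimit::value) {
            // Limited case: copy element by element and stop at max_size.
            for (const auto& elem : rhs.data_set) {
                if (size() >= max_size) {
                    return;
                }
                data_set.insert(elem);
            }
        } else {
            // Unlimited case: take everything from rhs.
            data_set.insert(rhs.data_set.begin(), rhs.data_set.end());
        }
    }
};
```

With `HasLimit = std::true_type` and `max_size = 3`, merging `{2, 3, 4}` into `{1, 2}` stops after the third distinct element; with `std::false_type` the same merge keeps all four.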





[GitHub] [doris] github-actions[bot] commented on pull request #15339: [enhancement-wip](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #15339:
URL: https://github.com/apache/doris/pull/15339#issuecomment-1430948931

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] zhangstar333 commented on a diff in pull request #15339: [enhancement](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "zhangstar333 (via GitHub)" <gi...@apache.org>.
zhangstar333 commented on code in PR #15339:
URL: https://github.com/apache/doris/pull/15339#discussion_r1109418237


##########
docs/en/docs/sql-manual/sql-functions/aggregate-functions/collect_list.md:
##########
@@ -30,9 +30,8 @@ under the License.
 
 `ARRAY<T> collect_list(expr)`
 
-Returns an array consisting of all values in expr within the group.
-The order of elements in the array is non-deterministic. NULL values are excluded.
-
+Returns an array consisting of all values in expr within the group, and ,with the optional `max_size` parameter limits the size of the resulting array to `max_size` elements.The order of elements in the array is non-deterministic. NULL values are excluded.
+It has an alias `group_array`.

Review Comment:
   Hmm, I'm not sure about this: if the column has 100 rows and `max_size` is set to 1, will the result always be the same?
   





[GitHub] [doris] TangSiyang2001 commented on pull request #15339: [enhancement](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "TangSiyang2001 (via GitHub)" <gi...@apache.org>.
TangSiyang2001 commented on PR #15339:
URL: https://github.com/apache/doris/pull/15339#issuecomment-1437675148

   run p1




[GitHub] [doris] github-actions[bot] commented on pull request #15339: [enhancement](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #15339:
URL: https://github.com/apache/doris/pull/15339#issuecomment-1433972025

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] github-actions[bot] commented on pull request #15339: [enhancement-wip](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #15339:
URL: https://github.com/apache/doris/pull/15339#issuecomment-1430723619

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] morningman merged pull request #15339: [enhancement](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "morningman (via GitHub)" <gi...@apache.org>.
morningman merged PR #15339:
URL: https://github.com/apache/doris/pull/15339




[GitHub] [doris] github-actions[bot] commented on pull request #15339: [Feature](aggregate-function) support funtion group_uniq_array

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #15339:
URL: https://github.com/apache/doris/pull/15339#issuecomment-1369543948

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] zhangstar333 commented on a diff in pull request #15339: [feature-wip](aggregate-function) support funtion group_uniq_array

Posted by "zhangstar333 (via GitHub)" <gi...@apache.org>.
zhangstar333 commented on code in PR #15339:
URL: https://github.com/apache/doris/pull/15339#discussion_r1090126539


##########
docs/zh-CN/docs/sql-manual/sql-functions/aggregate-functions/group_uniq_array.md:
##########
@@ -0,0 +1,80 @@
+---
+{
+    "title": "GROUP_UNIQ_ARRAY",
+    "language": "zh-CN"
+}
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+## GROUP_UNIQ_ARRAY
+### description

Review Comment:
   GROUP_UNIQ_ARRAY seems to be the same as `collect_set`.
   Why not implement `max_size` on the `collect_set` function instead?





[GitHub] [doris] HappenLee commented on a diff in pull request #15339: [enhancement](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "HappenLee (via GitHub)" <gi...@apache.org>.
HappenLee commented on code in PR #15339:
URL: https://github.com/apache/doris/pull/15339#discussion_r1111446544


##########
docs/en/docs/sql-manual/sql-functions/aggregate-functions/collect_list.md:
##########
@@ -30,9 +30,8 @@ under the License.
 
 `ARRAY<T> collect_list(expr)`
 
-Returns an array consisting of all values in expr within the group.
-The order of elements in the array is non-deterministic. NULL values are excluded.
-
+Returns an array consisting of all values in expr within the group, and ,with the optional `max_size` parameter limits the size of the resulting array to `max_size` elements.The order of elements in the array is non-deterministic. NULL values are excluded.
+It has an alias `group_array`.

Review Comment:
   If you have multiple replicas on different BEs, the result will not be the same. So the regression test must use a case that is stable in any environment: just check `length(array)` to make sure the result is stable.
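
The concern can be illustrated with a small model, under the assumption that each backend produces a partial set and the partials are merged in an unspecified order. The function `merge_with_limit` is hypothetical and is not taken from the Doris code base.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical model: each inner set is the partial state from one backend.
// Partials are merged in the given order, keeping at most max_size distinct values.
std::set<int> merge_with_limit(const std::vector<std::set<int>>& partials,
                               std::size_t max_size) {
    std::set<int> result;
    for (const auto& partial : partials) {
        for (int value : partial) {
            if (result.size() >= max_size) {
                return result; // limit reached, later values are dropped
            }
            result.insert(value);
        }
    }
    return result;
}
```

In this model, merging `{1, 2}` then `{3, 4}` with `max_size = 1` keeps a different element than merging them in the opposite order, but the length is 1 in both cases, which is why asserting on the length rather than the contents gives a stable test.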





[GitHub] [doris] TangSiyang2001 commented on a diff in pull request #15339: [enhancement](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "TangSiyang2001 (via GitHub)" <gi...@apache.org>.
TangSiyang2001 commented on code in PR #15339:
URL: https://github.com/apache/doris/pull/15339#discussion_r1108021467


##########
docs/en/docs/sql-manual/sql-functions/aggregate-functions/collect_list.md:
##########
@@ -30,9 +30,8 @@ under the License.
 
 `ARRAY<T> collect_list(expr)`
 
-Returns an array consisting of all values in expr within the group.
-The order of elements in the array is non-deterministic. NULL values are excluded.
-
+Returns an array consisting of all values in expr within the group, and ,with the optional `max_size` parameter limits the size of the resulting array to `max_size` elements.The order of elements in the array is non-deterministic. NULL values are excluded.
+It has an alias `group_array`.

Review Comment:
   "Non-deterministic" in the docs refers to the order of the result relative to the order of the input column. From the collect function's perspective, however, it is idempotent, so it will not cause test failures.





[GitHub] [doris] TangSiyang2001 commented on a diff in pull request #15339: [enhancement](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "TangSiyang2001 (via GitHub)" <gi...@apache.org>.
TangSiyang2001 commented on code in PR #15339:
URL: https://github.com/apache/doris/pull/15339#discussion_r1109823525


##########
docs/en/docs/sql-manual/sql-functions/aggregate-functions/collect_list.md:
##########
@@ -30,9 +30,8 @@ under the License.
 
 `ARRAY<T> collect_list(expr)`
 
-Returns an array consisting of all values in expr within the group.
-The order of elements in the array is non-deterministic. NULL values are excluded.
-
+Returns an array consisting of all values in expr within the group, and ,with the optional `max_size` parameter limits the size of the resulting array to `max_size` elements.The order of elements in the array is non-deterministic. NULL values are excluded.
+It has an alias `group_array`.

Review Comment:
   The problem has not come up in any of the regression test runs so far.





[GitHub] [doris] TangSiyang2001 commented on pull request #15339: [enhancement](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "TangSiyang2001 (via GitHub)" <gi...@apache.org>.
TangSiyang2001 commented on PR #15339:
URL: https://github.com/apache/doris/pull/15339#issuecomment-1437266720

   run p0




[GitHub] [doris] TangSiyang2001 commented on a diff in pull request #15339: [Feature](aggregate-function) support funtion group_uniq_array

Posted by GitBox <gi...@apache.org>.
TangSiyang2001 commented on code in PR #15339:
URL: https://github.com/apache/doris/pull/15339#discussion_r1064242499


##########
regression-test/suites/query_p0/sql_functions/aggregate_functions/test_aggregate_group_array.groovy:
##########
@@ -0,0 +1,261 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+suite("test_aggregate_group_array"){
+    sql "set enable_vectorized_engine = true"
+
+    def tableName = "group_uniq_array_test"
+    def tableCTAS1 = "group_uniq_array_test_ctas1"
+    def tableCTAS2 = "group_uniq_array_test_ctas2"
+
+    sql "DROP TABLE IF EXISTS ${tableName}"
+    sql "DROP TABLE IF EXISTS ${tableCTAS1}"
+    sql "DROP TABLE IF EXISTS ${tableCTAS2}"
+
+    sql """
+        CREATE TABLE IF NOT EXISTS ${tableName} (
+	        c_id INT,
+            c_bool BOOLEAN,
+            c_tinyint TINYINT,
+            c_smallint SMALLINT,
+            c_int INT,
+            c_bigint BIGINT,
+            c_largeint LARGEINT,
+            c_float FLOAT,
+            c_double DOUBLE,
+            c_decimal DECIMAL(9, 2),
+            c_char CHAR,
+            c_varchar VARCHAR(10),
+            c_string STRING,
+            c_date DATE,
+            c_datev2 DATEV2,
+            c_date_time DATETIME,
+            c_date_timev2 DATETIMEV2(6),
+            c_string_not_null VARCHAR(10) NOT NULL
+	    )
+	    DISTRIBUTED BY HASH(c_int) BUCKETS 1
+	    PROPERTIES (
+	      "replication_num" = "1"
+	    )
+    """
+
+    sql """
+        INSERT INTO ${tableName} values
+            (1, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
+            NULL, NULL, NULL, NULL, 'not null'),
+            (1, false, 10, 20, 30, 4444444444444, 55555555555, 0.1, 0.222, 3333.33, 'c', 'varchar1', 'string1',
+            '2022-12-01', '2022-12-01', '2022-12-01 22:23:23', '2022-12-01 22:23:24.999999', 'not null')
+    """
+
+    sql """
+        INSERT INTO ${tableName} values
+            (1, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
+            NULL, NULL, NULL, NULL, 'not null'),
+            (1, false, 11, 21, 33, 4444444444444, 55555555555, 0.1, 0.222, 3333.33, 'c', 'varchar1', 'string1',
+            '2022-12-01', '2022-12-01', '2022-12-01 22:23:23', '2022-12-01 22:23:24.999999', 'not null')
+    """
+
+    sql """
+        INSERT INTO ${tableName} values
+            (1, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
+            NULL, NULL, NULL, NULL, 'not null'),
+            (1, true, 11, 12, 13, 1444444444444, 1555555555, 1.1, 1.222, 13333.33, 'd', 'varchar2', 'string2',
+            '2022-12-02', '2022-12-02', '2022-12-02 22:23:23', '2022-12-02 22:23:24.999999', 'not null')
+    """
+
+    sql """
+        INSERT INTO ${tableName} values
+            (2, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
+            NULL, NULL, NULL, NULL, 'not null'),
+            (2, false, 21, 22, 23, 2444444444444, 255555555, 2.1, 2.222, 23333.33, 'f', 'varchar3', 'string3',
+            '2022-12-03', '2022-12-03', '2022-12-03 22:23:23', '2022-12-03 22:23:24.999999', 'not null')
+    """
+
+    sql """
+        INSERT INTO ${tableName} values
+            (2, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
+            NULL, NULL, NULL, NULL, 'not null'),
+            (2, true, 31, 32, 33, 3444444444444, 3555555555, 3.1, 3.222, 33333.33, 'l', 'varchar3', 'string3',
+            '2022-12-03', '2022-12-03', '2022-12-03 22:23:23', '2022-12-03 22:23:24.999999', 'not null')
+    """
+
+    sql """
+        INSERT INTO ${tableName} values
+            (2, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
+            NULL, NULL, NULL, NULL, 'not null'),
+            (2, false, 10, 20, 30, 944444444444, 9555555555, 9.1, 9.222, 93333.33, 'p', 'varchar9', 'string9',
+            '2022-12-09', '2022-12-09', '2022-12-09 22:23:23', '2022-12-09 22:23:24.999999', 'not null')
+    """
+
+    qt_select """
+        SELECT
+            group_uniq_array(c_bool),
+            group_uniq_array(c_tinyint),
+            group_uniq_array(c_smallint),
+            group_uniq_array(c_int),
+            group_uniq_array(c_bigint),
+            group_uniq_array(c_largeint),
+            group_uniq_array(c_float),
+            group_uniq_array(c_double),
+            group_uniq_array(c_decimal),
+            group_uniq_array(c_char),
+            group_uniq_array(c_varchar),
+            group_uniq_array(c_string),
+            group_uniq_array(c_date),
+            group_uniq_array(c_datev2),
+            group_uniq_array(c_date_time),
+            group_uniq_array(c_date_timev2),
+            group_uniq_array(c_string_not_null)
+        FROM
+            ${tableName}
+    """
+
+    qt_select """
+        SELECT
+            group_uniq_array(c_bool,1),
+            group_uniq_array(c_tinyint,1),
+            group_uniq_array(c_smallint,1),
+            group_uniq_array(c_int,1),
+            group_uniq_array(c_bigint,1),
+            group_uniq_array(c_largeint,1),
+            group_uniq_array(c_float,1),
+            group_uniq_array(c_double,1),
+            group_uniq_array(c_decimal,1),
+            group_uniq_array(c_char,1),
+            group_uniq_array(c_varchar,1),
+            group_uniq_array(c_string,1),
+            group_uniq_array(c_date,1),
+            group_uniq_array(c_datev2,1),
+            group_uniq_array(c_date_time,1),
+            group_uniq_array(c_date_timev2,1),
+            group_uniq_array(c_string_not_null,1)
+        FROM
+            ${tableName}
+    """
+
+    qt_select """
+        SELECT
+            group_uniq_array(c_bool),
+            group_uniq_array(c_tinyint),
+            group_uniq_array(c_smallint),
+            group_uniq_array(c_int),
+            group_uniq_array(c_bigint),
+            group_uniq_array(c_largeint),
+            group_uniq_array(c_float),
+            group_uniq_array(c_double),
+            group_uniq_array(c_decimal),
+            group_uniq_array(c_char),
+            group_uniq_array(c_varchar),
+            group_uniq_array(c_string),
+            group_uniq_array(c_date),
+            group_uniq_array(c_datev2),
+            group_uniq_array(c_date_time),
+            group_uniq_array(c_date_timev2),
+            group_uniq_array(c_string_not_null)
+        FROM
+            ${tableName}
+        GROUP BY
+            c_id
+        ORDER BY
+            c_id
+    """
+
+    qt_select """
+        SELECT
+            group_uniq_array(c_bool,1),
+            group_uniq_array(c_tinyint,1),
+            group_uniq_array(c_smallint,1),
+            group_uniq_array(c_int,1),
+            group_uniq_array(c_bigint,1),
+            group_uniq_array(c_largeint,1),
+            group_uniq_array(c_float,1),
+            group_uniq_array(c_double,1),
+            group_uniq_array(c_decimal,1),
+            group_uniq_array(c_char,1),
+            group_uniq_array(c_varchar,1),
+            group_uniq_array(c_string,1),
+            group_uniq_array(c_date,1),
+            group_uniq_array(c_datev2,1),
+            group_uniq_array(c_date_time,1),
+            group_uniq_array(c_date_timev2,1),
+            group_uniq_array(c_string_not_null,1)
+        FROM
+            ${tableName}
+        GROUP BY
+            c_id
+        ORDER BY
+            c_id
+    """
+
+    sql """

Review Comment:
   These only cover the group-by and non-group-by cases, following `test-aggregate-histogram` as a reference. If they make no sense here, I'll remove them later.





[GitHub] [doris] github-actions[bot] commented on pull request #15339: [enhancement-wip](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #15339:
URL: https://github.com/apache/doris/pull/15339#issuecomment-1430727755

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] github-actions[bot] commented on pull request #15339: [enhancement-wip](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #15339:
URL: https://github.com/apache/doris/pull/15339#issuecomment-1430730562

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] github-actions[bot] commented on pull request #15339: [enhancement-wip](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #15339:
URL: https://github.com/apache/doris/pull/15339#issuecomment-1430943138

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] github-actions[bot] commented on pull request #15339: [enhancement](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #15339:
URL: https://github.com/apache/doris/pull/15339#issuecomment-1433162529

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] Yukang-Lian commented on a diff in pull request #15339: [Feature](aggregate-function) support funtion group_uniq_array

Posted by GitBox <gi...@apache.org>.
Yukang-Lian commented on code in PR #15339:
URL: https://github.com/apache/doris/pull/15339#discussion_r1064167701


##########
docs/en/docs/sql-manual/sql-functions/aggregate-functions/group_uniq_array.md:
##########
@@ -0,0 +1,69 @@
+---
+{
+    "title": "GROUP_UNIQ_ARRAY",
+    "language": "en"
+}
+---
+
+<!-- 
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+## GROUP_UNIQ_ARRAY
+### description
+#### Syntax
+
+`ARRAY<T> collect_set(expr[,max_size])`
+
+Creates an array from different argument values,with the optional max_size parameter limits the size of the resulting array to `max_size` elements.
+

Review Comment:
   We need to add more detailed instructions here and to the examples below.
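
As a possible starting point for those more detailed instructions, the semantics described in the doc line above can be sketched in C++. This is only an illustration under stated assumptions, not the Doris implementation: `collect_set_sketch` is a hypothetical helper, NULL handling is borrowed from the `collect_list` description elsewhere in this thread, and which values survive when `max_size` is hit (here: the first ones seen) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <set>
#include <vector>

// Hypothetical helper, not part of Doris: gathers the distinct non-NULL
// values of a group and optionally stops once `max_size` values are kept.
std::set<int> collect_set_sketch(const std::vector<std::optional<int>>& rows,
                                 std::optional<std::size_t> max_size = std::nullopt) {
    std::set<int> result;
    for (const auto& row : rows) {
        if (!row.has_value()) {
            continue; // assumption: NULL values are excluded
        }
        if (max_size.has_value() && result.size() >= *max_size) {
            break; // limit reached, nothing new can be added
        }
        result.insert(*row); // duplicates are absorbed by the set
    }
    return result;
}
```

With the rows `1, 2, 2, NULL, 3, 1`, the sketch yields `{1, 2, 3}` without a limit and `{1, 2}` with a limit of 2. A documentation example could show both calls side by side.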





[GitHub] [doris] TangSiyang2001 commented on a diff in pull request #15339: [Feature](aggregate-function) support funtion group_uniq_array

Posted by GitBox <gi...@apache.org>.
TangSiyang2001 commented on code in PR #15339:
URL: https://github.com/apache/doris/pull/15339#discussion_r1064237807


##########
docs/en/docs/sql-manual/sql-functions/aggregate-functions/group_uniq_array.md:
##########
@@ -0,0 +1,69 @@
+---
+{
+    "title": "GROUP_UNIQ_ARRAY",
+    "language": "en"
+}
+---
+
+<!-- 
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+## GROUP_UNIQ_ARRAY
+### description
+#### Syntax
+
+`ARRAY<T> collect_set(expr[,max_size])`
+
+Creates an array from different argument values,with the optional max_size parameter limits the size of the resulting array to `max_size` elements.
+

Review Comment:
   OK, I will specify them later.





[GitHub] [doris] TangSiyang2001 commented on a diff in pull request #15339: [Feature](aggregate-function) support funtion group_uniq_array

Posted by GitBox <gi...@apache.org>.
TangSiyang2001 commented on code in PR #15339:
URL: https://github.com/apache/doris/pull/15339#discussion_r1064238472


##########
fe/fe-core/src/main/java/org/apache/doris/analysis/FunctionCallExpr.java:
##########
@@ -1148,12 +1148,12 @@ public void analyzeImpl(Analyzer analyzer) throws AnalysisException {
             for (int i = 0; i < children.size(); i++) {
                 if (children.get(i).type != Type.BOOLEAN) {
                     throw new AnalysisException("All params of "
-                        + fnName + " function must be boolean");
+                            + fnName + " function must be boolean");

Review Comment:
   Some of these lines were modified to fix the checkstyle failure; I will check them in detail later.





[GitHub] [doris] TangSiyang2001 commented on a diff in pull request #15339: [Feature](aggregate-function) support funtion group_uniq_array

Posted by GitBox <gi...@apache.org>.
TangSiyang2001 commented on code in PR #15339:
URL: https://github.com/apache/doris/pull/15339#discussion_r1064239431


##########
regression-test/suites/query_p0/sql_functions/aggregate_functions/test_aggregate_group_array.groovy:
##########
@@ -0,0 +1,261 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+suite("test_aggregate_group_array"){
+    sql "set enable_vectorized_engine = true"
+
+    def tableName = "group_uniq_array_test"
+    def tableCTAS1 = "group_uniq_array_test_ctas1"
+    def tableCTAS2 = "group_uniq_array_test_ctas2"
+
+    sql "DROP TABLE IF EXISTS ${tableName}"
+    sql "DROP TABLE IF EXISTS ${tableCTAS1}"
+    sql "DROP TABLE IF EXISTS ${tableCTAS2}"
+
+    sql """
+        CREATE TABLE IF NOT EXISTS ${tableName} (
+	        c_id INT,
+            c_bool BOOLEAN,
+            c_tinyint TINYINT,
+            c_smallint SMALLINT,
+            c_int INT,
+            c_bigint BIGINT,
+            c_largeint LARGEINT,
+            c_float FLOAT,
+            c_double DOUBLE,
+            c_decimal DECIMAL(9, 2),
+            c_char CHAR,
+            c_varchar VARCHAR(10),
+            c_string STRING,
+            c_date DATE,
+            c_datev2 DATEV2,
+            c_date_time DATETIME,
+            c_date_timev2 DATETIMEV2(6),
+            c_string_not_null VARCHAR(10) NOT NULL
+	    )
+	    DISTRIBUTED BY HASH(c_int) BUCKETS 1
+	    PROPERTIES (
+	      "replication_num" = "1"
+	    )
+    """
+
+    sql """
+        INSERT INTO ${tableName} values
+            (1, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
+            NULL, NULL, NULL, NULL, 'not null'),
+            (1, false, 10, 20, 30, 4444444444444, 55555555555, 0.1, 0.222, 3333.33, 'c', 'varchar1', 'string1',
+            '2022-12-01', '2022-12-01', '2022-12-01 22:23:23', '2022-12-01 22:23:24.999999', 'not null')
+    """
+
+    sql """
+        INSERT INTO ${tableName} values
+            (1, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
+            NULL, NULL, NULL, NULL, 'not null'),
+            (1, false, 11, 21, 33, 4444444444444, 55555555555, 0.1, 0.222, 3333.33, 'c', 'varchar1', 'string1',
+            '2022-12-01', '2022-12-01', '2022-12-01 22:23:23', '2022-12-01 22:23:24.999999', 'not null')
+    """
+
+    sql """
+        INSERT INTO ${tableName} values
+            (1, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
+            NULL, NULL, NULL, NULL, 'not null'),
+            (1, true, 11, 12, 13, 1444444444444, 1555555555, 1.1, 1.222, 13333.33, 'd', 'varchar2', 'string2',
+            '2022-12-02', '2022-12-02', '2022-12-02 22:23:23', '2022-12-02 22:23:24.999999', 'not null')
+    """
+
+    sql """
+        INSERT INTO ${tableName} values
+            (2, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
+            NULL, NULL, NULL, NULL, 'not null'),
+            (2, false, 21, 22, 23, 2444444444444, 255555555, 2.1, 2.222, 23333.33, 'f', 'varchar3', 'string3',
+            '2022-12-03', '2022-12-03', '2022-12-03 22:23:23', '2022-12-03 22:23:24.999999', 'not null')
+    """
+
+    sql """
+        INSERT INTO ${tableName} values
+            (2, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
+            NULL, NULL, NULL, NULL, 'not null'),
+            (2, true, 31, 32, 33, 3444444444444, 3555555555, 3.1, 3.222, 33333.33, 'l', 'varchar3', 'string3',
+            '2022-12-03', '2022-12-03', '2022-12-03 22:23:23', '2022-12-03 22:23:24.999999', 'not null')
+    """
+
+    sql """
+        INSERT INTO ${tableName} values
+            (2, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
+            NULL, NULL, NULL, NULL, 'not null'),
+            (2, false, 10, 20, 30, 944444444444, 9555555555, 9.1, 9.222, 93333.33, 'p', 'varchar9', 'string9',
+            '2022-12-09', '2022-12-09', '2022-12-09 22:23:23', '2022-12-09 22:23:24.999999', 'not null')
+    """
+

Review Comment:
   Good suggestion, I will add it later.





[GitHub] [doris] TangSiyang2001 commented on pull request #15339: [enhancement](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "TangSiyang2001 (via GitHub)" <gi...@apache.org>.
TangSiyang2001 commented on PR #15339:
URL: https://github.com/apache/doris/pull/15339#issuecomment-1439922461

   run p0




[GitHub] [doris] TangSiyang2001 commented on pull request #15339: [enhancement](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "TangSiyang2001 (via GitHub)" <gi...@apache.org>.
TangSiyang2001 commented on PR #15339:
URL: https://github.com/apache/doris/pull/15339#issuecomment-1439922626

   run p1




[GitHub] [doris] TangSiyang2001 commented on a diff in pull request #15339: [enhancement](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "TangSiyang2001 (via GitHub)" <gi...@apache.org>.
TangSiyang2001 commented on code in PR #15339:
URL: https://github.com/apache/doris/pull/15339#discussion_r1111497723


##########
docs/en/docs/sql-manual/sql-functions/aggregate-functions/collect_list.md:
##########
@@ -30,9 +30,8 @@ under the License.
 
 `ARRAY<T> collect_list(expr)`
 
-Returns an array consisting of all values in expr within the group.
-The order of elements in the array is non-deterministic. NULL values are excluded.
-
+Returns an array consisting of all values in expr within the group, and ,with the optional `max_size` parameter limits the size of the resulting array to `max_size` elements.The order of elements in the array is non-deterministic. NULL values are excluded.
+It has an alias `group_array`.

Review Comment:
   I see, I'll modify it.





[GitHub] [doris] HappenLee commented on a diff in pull request #15339: [enhancement](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "HappenLee (via GitHub)" <gi...@apache.org>.
HappenLee commented on code in PR #15339:
URL: https://github.com/apache/doris/pull/15339#discussion_r1111549208


##########
be/src/vec/aggregate_functions/aggregate_function_collect.h:
##########
@@ -31,122 +33,180 @@
 
 namespace doris::vectorized {
 
-template <typename T>
+template <typename T, typename HasLimit>
 struct AggregateFunctionCollectSetData {
     using ElementType = T;
     using ColVecType = ColumnVectorOrDecimal<ElementType>;
     using ElementNativeType = typename NativeType<T>::Type;
+    using SelfType = AggregateFunctionCollectSetData;
     using Set = HashSetWithStackMemory<ElementNativeType, DefaultHash<ElementNativeType>, 4>;
-    Set set;
+    Set data_set;
+    UInt64 max_size;
+
+    size_t size() const { return data_set.size(); }
 
     void add(const IColumn& column, size_t row_num) {
-        const auto& vec = assert_cast<const ColVecType&>(column).get_data();
-        set.insert(vec[row_num]);
+        data_set.insert(assert_cast<const ColVecType&>(column).get_data()[row_num]);
+    }
+
+    void merge(const SelfType& rhs) {
+        if constexpr (HasLimit::value) {
+            for (auto& rhs_elem : rhs.data_set) {
+                if (size() >= max_size) {
+                    return;
+                }
+                data_set.insert(rhs_elem.get_value());
+            }
+        } else {
+            data_set.merge(rhs.data_set);
+        }
     }
-    void merge(const AggregateFunctionCollectSetData& rhs) { set.merge(rhs.set); }
-    void write(BufferWritable& buf) const { set.write(buf); }
-    void read(BufferReadable& buf) { set.read(buf); }
-    void reset() { set.clear(); }
+
+    void write(BufferWritable& buf) const { data_set.write(buf); }
+
+    void read(BufferReadable& buf) { data_set.read(buf); }
+
     void insert_result_into(IColumn& to) const {
         auto& vec = assert_cast<ColVecType&>(to).get_data();
-        vec.reserve(set.size());
-        for (auto item : set) {
+        vec.reserve(size());
+        for (auto item : data_set) {

Review Comment:
   `const auto& item`
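
The point of this suggestion can be illustrated with a small, self-contained sketch. `Tracked` is a hypothetical element type that counts its copies; it is not a type from the patch. For the small numeric element types in the hunk above the cost difference is likely negligible, so the change is mostly about making the intent explicit.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical element type that counts how often it is copied.
struct Tracked {
    static inline int copies = 0;
    std::string payload;
    explicit Tracked(std::string p) : payload(std::move(p)) {}
    Tracked(const Tracked& other) : payload(other.payload) { ++copies; }
    Tracked(Tracked&&) noexcept = default;
};

std::size_t total_length_by_value(const std::vector<Tracked>& items) {
    std::size_t total = 0;
    for (auto item : items) { // each iteration copy-constructs `item`
        total += item.payload.size();
    }
    return total;
}

std::size_t total_length_by_ref(const std::vector<Tracked>& items) {
    std::size_t total = 0;
    for (const auto& item : items) { // binds a reference, no copy is made
        total += item.payload.size();
    }
    return total;
}
```

Both functions return the same total; only the by-value loop increments `Tracked::copies`, once per element.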





[GitHub] [doris] github-actions[bot] commented on pull request #15339: [enhancement](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #15339:
URL: https://github.com/apache/doris/pull/15339#issuecomment-1433180767

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] github-actions[bot] commented on pull request #15339: [Feature](aggregate-function) support funtion group_uniq_array

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #15339:
URL: https://github.com/apache/doris/pull/15339#issuecomment-1369498834

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] github-actions[bot] commented on a diff in pull request #15339: [enhancement-wip](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on code in PR #15339:
URL: https://github.com/apache/doris/pull/15339#discussion_r1106604940


##########
be/src/vec/exec/join/process_hash_table_probe_impl.h:
##########
@@ -607,6 +607,7 @@ Status ProcessHashTableProbe<JoinOpType>::do_process_with_other_join_conjuncts(
                             }
                         }
                     }
+<<<<<<< HEAD

Review Comment:
   warning: version control conflict marker in file [clang-diagnostic-error]
   ```cpp
   <<<<<<< HEAD
   ^
   ```
   



##########
be/test/vec/aggregate_functions/agg_group_array_test.cpp:
##########
@@ -0,0 +1,139 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <gtest/gtest.h>
+
+#include "gtest/gtest.h"
+#include "vec/aggregate_functions/aggregate_function_collect.h"
+#include "vec/aggregate_functions/aggregate_function_simple_factory.h"
+#include "vec/common/arena.h"
+#include "vec/core/field.h"
+#include "vec/data_types/data_type.h"
+#include "vec/data_types/data_type_date.h"
+#include "vec/data_types/data_type_date_time.h"
+#include "vec/data_types/data_type_decimal.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/data_types/data_type_string.h"
+
+namespace doris::vectorized {
+
+void register_aggregate_function_group_uniq_array(AggregateFunctionSimpleFactory& factory);
+
+class VAggGroupArrayTest : public testing::Test {
+private:
+    Arena _agg_arena_pool;
+
+public:
+    void SetUp() override {
+        AggregateFunctionSimpleFactory factory = AggregateFunctionSimpleFactory::instance();
+        register_aggregate_function_group_uniq_array(factory);
+    }
+
+    void TearDown() override {}
+
+    template <typename DataType>
+    void agg_group_uniq_array_add_elements(AggregateFunctionPtr agg_function,
+                                           AggregateDataPtr place, size_t input_nums) {
+        using FieldType = typename DataType::FieldType;
+        auto type = std::make_shared<DataType>();
+        auto input_col = type->create_column();
+        for (size_t i = 0; i < input_nums; ++i) {
+            if constexpr (std::is_same_v<DataType, DataTypeString>) {
+                auto item = std::string("item") + std::to_string(i);
+                input_col->insert_data(item.c_str(), item.size());
+            } else {
+                auto item = FieldType(static_cast<uint64_t>(i));
+                input_col->insert_data(reinterpret_cast<const char*>(&item), 0);
+            }
+        }
+        EXPECT_EQ(input_col->size(), input_nums);
+
+        const IColumn* column[1] = {input_col.get()};
+        for (int i = 0; i < input_col->size(); i++) {
+            agg_function->add(place, column, i, &_agg_arena_pool);
+        }
+    }
+
+    template <typename DataType>
+    void test_agg_group_uniq_array(size_t input_nums = 0) {
+        DataTypes data_types = {(DataTypePtr)std::make_shared<DataType>()};
+        LOG(INFO) << "test_agg_group_uniq_array for type"
+                  << "(" << data_types[0]->get_name() << ")";
+
+        Array array;
+        AggregateFunctionSimpleFactory factory = AggregateFunctionSimpleFactory::instance();
+        auto agg_function = factory.get("group_uniq_array", data_types, array);
+        EXPECT_NE(agg_function, nullptr);
+
+        std::unique_ptr<char[]> memory(new char[agg_function->size_of_data()]);
+        AggregateDataPtr place = memory.get();
+        agg_function->create(place);
+
+        agg_group_uniq_array_add_elements<DataType>(agg_function, place, input_nums);
+
+        ColumnString buf;

Review Comment:
   warning: calling a private constructor of class 'doris::vectorized::ColumnString' [clang-diagnostic-error]
   ```cpp
           ColumnString buf;
                        ^
   ```
   **be/src/vec/columns/column_string.h:76:** declared private here
   ```cpp
       ColumnString() = default;
       ^
   ```
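
The warning above is typical of classes that hide their constructor and hand out instances through a static factory instead. A minimal sketch of that pattern follows; `StringColumnSketch` is a hypothetical stand-in and not Doris's `ColumnString`, and whether the test should call a `create()` factory on the real class is an assumption to verify against `column_string.h`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a column class with a private constructor.
class StringColumnSketch {
public:
    // The only public way to obtain an instance.
    static std::unique_ptr<StringColumnSketch> create() {
        return std::unique_ptr<StringColumnSketch>(new StringColumnSketch());
    }

    void insert(const std::string& value) { _values.push_back(value); }
    std::size_t size() const { return _values.size(); }

private:
    // Private: `StringColumnSketch buf;` outside the class does not compile.
    StringColumnSketch() = default;

    std::vector<std::string> _values;
};
```

A caller would write `auto buf = StringColumnSketch::create();` and then use `buf->insert(...)`, rather than declaring the object directly on the stack.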
   



##########
be/src/exprs/timestamp_functions.cpp:
##########
@@ -0,0 +1,1036 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+// This file is copied from
+// https://github.com/apache/impala/blob/branch-2.9.0/be/src/exprs/timestamp-functions.cc
+// and modified by Doris
+
+#include "exprs/timestamp_functions.h"

Review Comment:
   warning: 'exprs/timestamp_functions.h' file not found [clang-diagnostic-error]
   ```cpp
   #include "exprs/timestamp_functions.h"
            ^
   ```
   





[GitHub] [doris] github-actions[bot] commented on pull request #15339: [enhancement-wip](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #15339:
URL: https://github.com/apache/doris/pull/15339#issuecomment-1431148225

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] github-actions[bot] commented on pull request #15339: [enhancement-wip](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #15339:
URL: https://github.com/apache/doris/pull/15339#issuecomment-1430879185

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] zhangstar333 commented on a diff in pull request #15339: [enhancement](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "zhangstar333 (via GitHub)" <gi...@apache.org>.
zhangstar333 commented on code in PR #15339:
URL: https://github.com/apache/doris/pull/15339#discussion_r1107978178


##########
docs/en/docs/sql-manual/sql-functions/aggregate-functions/collect_list.md:
##########
@@ -30,9 +30,8 @@ under the License.
 
 `ARRAY<T> collect_list(expr)`
 
-Returns an array consisting of all values in expr within the group.
-The order of elements in the array is non-deterministic. NULL values are excluded.
-
+Returns an array consisting of all values in expr within the group, and ,with the optional `max_size` parameter limits the size of the resulting array to `max_size` elements.The order of elements in the array is non-deterministic. NULL values are excluded.
+It has an alias `group_array`.

Review Comment:
   `The order of elements in the array is non-deterministic`, so the result of the Groovy test case is also unstable and may fail in other PRs.
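
One common way to keep such a test stable is to compare results without regard to element order. The sketch below shows the idea in C++; `same_elements` is a hypothetical helper and not part of the Doris test framework.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical test helper: true when both arrays hold the same elements,
// regardless of order. Duplicates are respected because both sides are
// sorted and then compared element by element.
template <typename T>
bool same_elements(std::vector<T> actual, std::vector<T> expected) {
    std::sort(actual.begin(), actual.end());
    std::sort(expected.begin(), expected.end());
    return actual == expected;
}
```

The same idea applies on the SQL side: if the aggregated array is sorted before it is written to the expected output, the test no longer depends on the order in which rows reach the aggregate.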





[GitHub] [doris] github-actions[bot] commented on pull request #15339: [Feature](aggregate-function) support funtion group_uniq_array

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #15339:
URL: https://github.com/apache/doris/pull/15339#issuecomment-1368422308

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] TangSiyang2001 commented on a diff in pull request #15339: [Feature](aggregate-function) support funtion group_uniq_array

Posted by GitBox <gi...@apache.org>.
TangSiyang2001 commented on code in PR #15339:
URL: https://github.com/apache/doris/pull/15339#discussion_r1064237669


##########
be/test/vec/aggregate_functions/agg_group_array_test.cpp:
##########
@@ -0,0 +1,142 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <gtest/gtest.h>
+
+#include "gtest/gtest.h"
+#include "vec/aggregate_functions/aggregate_function_collect.h"
+#include "vec/aggregate_functions/aggregate_function_simple_factory.h"
+#include "vec/common/arena.h"
+#include "vec/core/field.h"
+#include "vec/data_types/data_type.h"
+#include "vec/data_types/data_type_date.h"
+#include "vec/data_types/data_type_date_time.h"
+#include "vec/data_types/data_type_decimal.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/data_types/data_type_string.h"
+
+namespace doris::vectorized {
+
+void register_aggregate_function_group_uniq_array(AggregateFunctionSimpleFactory& factory);
+
+class VAggGroupArrayTest : public testing::Test {
+private:
+    Arena _agg_arena_pool;
+
+public:
+    void SetUp() override {
+        AggregateFunctionSimpleFactory factory = AggregateFunctionSimpleFactory::instance();
+        register_aggregate_function_group_uniq_array(factory);
+    }
+
+    void TearDown() override {}
+
+    template <typename DataType>
+    void agg_group_uniq_array_add_elements(AggregateFunctionPtr agg_function,
+                                           AggregateDataPtr place, size_t input_nums) {
+        using FieldType = typename DataType::FieldType;
+        auto type = std::make_shared<DataType>();
+        auto input_col = type->create_column();
+        for (size_t i = 0; i < input_nums; ++i) {
+            if constexpr (std::is_same_v<DataType, DataTypeString>) {
+                auto item = std::string("item") + std::to_string(i);
+                input_col->insert_data(item.c_str(), item.size());
+            } else {
+                auto item = FieldType(static_cast<uint64_t>(i));
+                input_col->insert_data(reinterpret_cast<const char*>(&item), 0);
+            }
+        }
+        EXPECT_EQ(input_col->size(), input_nums);
+
+        const IColumn* column[1] = {input_col.get()};
+        for (int i = 0; i < input_col->size(); i++) {
+            agg_function->add(place, column, i, &_agg_arena_pool);
+        }
+    }
+
+    template <typename DataType>
+    void test_agg_group_uniq_array(size_t input_nums = 0) {
+        DataTypes data_types = {(DataTypePtr)std::make_shared<DataType>()};
+        LOG(INFO) << "test_agg_group_uniq_array for type"
+                  << "(" << data_types[0]->get_name() << ")";
+
+        Array array;
+        AggregateFunctionSimpleFactory factory = AggregateFunctionSimpleFactory::instance();
+        auto agg_function = factory.get("group_uniq_array", data_types, array);
+        EXPECT_NE(agg_function, nullptr);
+
+        std::unique_ptr<char[]> memory(new char[agg_function->size_of_data()]);
+        AggregateDataPtr place = memory.get();
+        agg_function->create(place);
+
+        agg_group_uniq_array_add_elements<DataType>(agg_function, place, input_nums);
+
+        ColumnString buf;
+        VectorBufferWriter buf_writer(buf);
+        agg_function->serialize(place, buf_writer);
+        buf_writer.commit();
+        VectorBufferReader buf_reader(buf.get_data_at(0));
+        agg_function->deserialize(place, buf_reader, &_agg_arena_pool);
+
+        std::unique_ptr<char[]> memory2(new char[agg_function->size_of_data()]);
+        AggregateDataPtr place2 = memory2.get();
+        agg_function->create(place2);
+
+        agg_group_uniq_array_add_elements<DataType>(agg_function, place2, input_nums);
+
+        agg_function->merge(place, place2, &_agg_arena_pool);
+        auto column_result = ColumnArray::create(data_types[0]->create_column());
+        agg_function->insert_result_into(place, *column_result);
+        EXPECT_EQ(column_result->size(), 1);
+        EXPECT_EQ(column_result->get_offsets()[0], input_nums);
+
+        auto column_result2 = ColumnArray::create(data_types[0]->create_column());
+        agg_function->insert_result_into(place2, *column_result2);
+        EXPECT_EQ(column_result2->size(), 1);
+        EXPECT_EQ(column_result2->get_offsets()[0], input_nums);
+
+        LOG(INFO) << column_result->get_offsets()[0];
+        LOG(INFO) << column_result2->get_offsets()[0];

Review Comment:
   Added for debugging; I will remove them.





[GitHub] [doris] github-actions[bot] commented on pull request #15339: [enhancement-wip](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #15339:
URL: https://github.com/apache/doris/pull/15339#issuecomment-1430718478

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] github-actions[bot] commented on pull request #15339: [enhancement-wip](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #15339:
URL: https://github.com/apache/doris/pull/15339#issuecomment-1430723901

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] github-actions[bot] commented on pull request #15339: [enhancement-wip](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #15339:
URL: https://github.com/apache/doris/pull/15339#issuecomment-1430727032

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] github-actions[bot] commented on pull request #15339: [enhancement](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #15339:
URL: https://github.com/apache/doris/pull/15339#issuecomment-1434608312

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] github-actions[bot] commented on pull request #15339: [feature-wip](aggregate-function) support funtion group_uniq_array

Posted by github-actions.
github-actions[bot] commented on PR #15339:
URL: https://github.com/apache/doris/pull/15339#issuecomment-1400092876

   PR approved by anyone and no changes requested.




[GitHub] [doris] TangSiyang2001 commented on pull request #15339: [enhancement](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "TangSiyang2001 (via GitHub)" <gi...@apache.org>.
TangSiyang2001 commented on PR #15339:
URL: https://github.com/apache/doris/pull/15339#issuecomment-1447971661

   > Hi, Does the case in this pr need to add some `order by` to ensure that the results are always consistent? I'm running into something that looks like it's sorting related.
   > 
   > ```
   > 2023-02-28 16:58:32.363 ERROR [suite-thread-3] (ScriptContext.groovy:121) - Run test_aggregate_collect in /doris/regression-test/suites/query_p0/sql_functions/aggregate_functions/test_aggregate_collect.groovy failed
   > java.lang.IllegalStateException: Check tag 'select' failed:
   > Check tag 'select' failed, line 1, CHAR result mismatch.
   > Expect cell is: [1555555555, 3555555555, 9555555555, 255555555, 55555555555]
   > But real is: [1555555555, 9555555555, 3555555555, 255555555, 55555555555]
   > ```
   It seems so. I will add `order by` to keep the results stable.
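
The two arrays in the quoted log hold the same five values and differ only in position. Besides ordering the query input, a check can also normalise both sides before comparing. The following is a minimal sketch of that idea; `same_elements` is a hypothetical helper invented for illustration and is not part of the Doris regression framework.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns true when both arrays hold the same
// elements with the same multiplicities, ignoring their order.
// The vectors are taken by value so the callers' data is left untouched.
bool same_elements(std::vector<std::int64_t> lhs, std::vector<std::int64_t> rhs) {
    std::sort(lhs.begin(), lhs.end());
    std::sort(rhs.begin(), rhs.end());
    return lhs == rhs;
}
```

Applied to the "Expect" and "real" arrays from the log, a plain element-wise comparison fails while the sorted comparison succeeds.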
   




[GitHub] [doris] github-actions[bot] commented on pull request #15339: [enhancement](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #15339:
URL: https://github.com/apache/doris/pull/15339#issuecomment-1438024780

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] github-actions[bot] commented on pull request #15339: [enhancement-wip](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #15339:
URL: https://github.com/apache/doris/pull/15339#issuecomment-1430872053

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] TangSiyang2001 commented on a diff in pull request #15339: [enhancement](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "TangSiyang2001 (via GitHub)" <gi...@apache.org>.
TangSiyang2001 commented on code in PR #15339:
URL: https://github.com/apache/doris/pull/15339#discussion_r1108009442


##########
be/src/vec/aggregate_functions/aggregate_function_collect.h:
##########
@@ -177,21 +243,31 @@ struct AggregateFunctionCollectListData<StringRef> {
 
     void insert_result_into(IColumn& to) const {
         auto& to_str = assert_cast<ColVecType&>(to);
-        to_str.insert_range_from(*data, 0, data->size());
+        to_str.insert_range_from(*data, 0, size());
     }
 };
 
-template <typename Data>
-class AggregateFunctionCollect final
-        : public IAggregateFunctionDataHelper<Data, AggregateFunctionCollect<Data>> {
-public:
-    static constexpr bool alloc_memory_in_arena =
-            std::is_same_v<Data, AggregateFunctionCollectSetData<StringRef>>;
+template <typename Data, typename HasLimit>
+class AggregateFunctionCollect
+        : public IAggregateFunctionDataHelper<Data, AggregateFunctionCollect<Data, HasLimit>> {
+    using GenericType = AggregateFunctionCollectSetData<StringRef>;
+
+    static constexpr bool HAS_LIMIT = HasLimit::value;
+    static constexpr bool ENABLE_ARENA = std::is_same_v<Data, GenericType>;
 
-    AggregateFunctionCollect(const DataTypes& argument_types_)
-            : IAggregateFunctionDataHelper<Data, AggregateFunctionCollect<Data>>(argument_types_,
-                                                                                 {}),
-              _argument_type(argument_types_[0]) {}
+public:
+    AggregateFunctionCollect(const DataTypePtr& argument_type, const Array& parameters_,
+                             UInt64 max_size_ = std::numeric_limits<UInt64>::max())
+            : IAggregateFunctionDataHelper<Data, AggregateFunctionCollect<Data, HasLimit>>(
+                      {argument_type}, parameters_),
+              return_type(argument_type) {}
+
+    AggregateFunctionCollect(const DataTypePtr& argument_type, const Array& parameters_,

Review Comment:
   This is a redundant leftover from migrating the original code; I'll remove it.
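
The diff above parameterises the class on a `HasLimit` tag type and reads `HasLimit::value` at compile time. The following minimal sketch shows how such a tag can select the size check with `if constexpr` while a single constructor with a defaulted `max_size` serves both variants. `CollectSketch` and its members are hypothetical and greatly simplified; this is an assumption about the general pattern, not the code that was merged.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <type_traits>
#include <vector>

// Hypothetical miniature of a collector parameterised on a HasLimit tag
// (std::true_type or std::false_type); not the Doris class.
template <typename HasLimit>
class CollectSketch {
    static constexpr bool HAS_LIMIT = HasLimit::value;

public:
    // One constructor covers both variants: the unlimited one simply
    // never looks at max_size.
    explicit CollectSketch(std::size_t max_size = std::numeric_limits<std::size_t>::max())
            : _max_size(max_size) {}

    void add(int v) {
        // The size check is compiled in only for the limited variant.
        if constexpr (HAS_LIMIT) {
            if (_values.size() >= _max_size) {
                return;
            }
        }
        _values.push_back(v);
    }

    std::size_t size() const { return _values.size(); }

private:
    std::size_t _max_size;
    std::vector<int> _values;
};
```

With this shape, `CollectSketch<std::true_type> limited(2)` keeps at most two values, while `CollectSketch<std::false_type>` accepts every value without paying for the branch.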





[GitHub] [doris] github-actions[bot] commented on pull request #15339: [enhancement-wip](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #15339:
URL: https://github.com/apache/doris/pull/15339#issuecomment-1430738432

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] github-actions[bot] commented on pull request #15339: [enhancement-wip](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #15339:
URL: https://github.com/apache/doris/pull/15339#issuecomment-1430898350

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] TangSiyang2001 commented on a diff in pull request #15339: [Feature](aggregate-function) support funtion group_uniq_array

Posted by GitBox <gi...@apache.org>.
TangSiyang2001 commented on code in PR #15339:
URL: https://github.com/apache/doris/pull/15339#discussion_r1064239431


##########
regression-test/suites/query_p0/sql_functions/aggregate_functions/test_aggregate_group_array.groovy:
##########
@@ -0,0 +1,261 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+suite("test_aggregate_group_array"){
+    sql "set enable_vectorized_engine = true"
+
+    def tableName = "group_uniq_array_test"
+    def tableCTAS1 = "group_uniq_array_test_ctas1"
+    def tableCTAS2 = "group_uniq_array_test_ctas2"
+
+    sql "DROP TABLE IF EXISTS ${tableName}"
+    sql "DROP TABLE IF EXISTS ${tableCTAS1}"
+    sql "DROP TABLE IF EXISTS ${tableCTAS2}"
+
+    sql """
+        CREATE TABLE IF NOT EXISTS ${tableName} (
+	        c_id INT,
+            c_bool BOOLEAN,
+            c_tinyint TINYINT,
+            c_smallint SMALLINT,
+            c_int INT,
+            c_bigint BIGINT,
+            c_largeint LARGEINT,
+            c_float FLOAT,
+            c_double DOUBLE,
+            c_decimal DECIMAL(9, 2),
+            c_char CHAR,
+            c_varchar VARCHAR(10),
+            c_string STRING,
+            c_date DATE,
+            c_datev2 DATEV2,
+            c_date_time DATETIME,
+            c_date_timev2 DATETIMEV2(6),
+            c_string_not_null VARCHAR(10) NOT NULL
+	    )
+	    DISTRIBUTED BY HASH(c_int) BUCKETS 1
+	    PROPERTIES (
+	      "replication_num" = "1"
+	    )
+    """
+
+    sql """
+        INSERT INTO ${tableName} values
+            (1, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
+            NULL, NULL, NULL, NULL, 'not null'),
+            (1, false, 10, 20, 30, 4444444444444, 55555555555, 0.1, 0.222, 3333.33, 'c', 'varchar1', 'string1',
+            '2022-12-01', '2022-12-01', '2022-12-01 22:23:23', '2022-12-01 22:23:24.999999', 'not null')
+    """
+
+    sql """
+        INSERT INTO ${tableName} values
+            (1, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
+            NULL, NULL, NULL, NULL, 'not null'),
+            (1, false, 11, 21, 33, 4444444444444, 55555555555, 0.1, 0.222, 3333.33, 'c', 'varchar1', 'string1',
+            '2022-12-01', '2022-12-01', '2022-12-01 22:23:23', '2022-12-01 22:23:24.999999', 'not null')
+    """
+
+    sql """
+        INSERT INTO ${tableName} values
+            (1, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
+            NULL, NULL, NULL, NULL, 'not null'),
+            (1, true, 11, 12, 13, 1444444444444, 1555555555, 1.1, 1.222, 13333.33, 'd', 'varchar2', 'string2',
+            '2022-12-02', '2022-12-02', '2022-12-02 22:23:23', '2022-12-02 22:23:24.999999', 'not null')
+    """
+
+    sql """
+        INSERT INTO ${tableName} values
+            (2, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
+            NULL, NULL, NULL, NULL, 'not null'),
+            (2, false, 21, 22, 23, 2444444444444, 255555555, 2.1, 2.222, 23333.33, 'f', 'varchar3', 'string3',
+            '2022-12-03', '2022-12-03', '2022-12-03 22:23:23', '2022-12-03 22:23:24.999999', 'not null')
+    """
+
+    sql """
+        INSERT INTO ${tableName} values
+            (2, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
+            NULL, NULL, NULL, NULL, 'not null'),
+            (2, true, 31, 32, 33, 3444444444444, 3555555555, 3.1, 3.222, 33333.33, 'l', 'varchar3', 'string3',
+            '2022-12-03', '2022-12-03', '2022-12-03 22:23:23', '2022-12-03 22:23:24.999999', 'not null')
+    """
+
+    sql """
+        INSERT INTO ${tableName} values
+            (2, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
+            NULL, NULL, NULL, NULL, 'not null'),
+            (2, false, 10, 20, 30, 944444444444, 9555555555, 9.1, 9.222, 93333.33, 'p', 'varchar9', 'string9',
+            '2022-12-09', '2022-12-09', '2022-12-09 22:23:23', '2022-12-09 22:23:24.999999', 'not null')
+    """
+

Review Comment:
   Good suggestion, I will rearrange the order.





[GitHub] [doris] github-actions[bot] commented on pull request #15339: [Feature](aggregate-function) support funtion group_uniq_array

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #15339:
URL: https://github.com/apache/doris/pull/15339#issuecomment-1364482485

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] TangSiyang2001 commented on pull request #15339: [enhancement](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "TangSiyang2001 (via GitHub)" <gi...@apache.org>.
TangSiyang2001 commented on PR #15339:
URL: https://github.com/apache/doris/pull/15339#issuecomment-1437676071

   run feut




[GitHub] [doris] TangSiyang2001 commented on pull request #15339: [enhancement](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "TangSiyang2001 (via GitHub)" <gi...@apache.org>.
TangSiyang2001 commented on PR #15339:
URL: https://github.com/apache/doris/pull/15339#issuecomment-1438023384

   The regression .out file has not been updated yet. I am waiting for the ipv6 bug in `master` to be fixed so that the regression test can run in my local environment.




[GitHub] [doris] github-actions[bot] commented on a diff in pull request #15339: [enhancement](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on code in PR #15339:
URL: https://github.com/apache/doris/pull/15339#discussion_r1111501259


##########
be/src/vec/aggregate_functions/aggregate_function_collect.cpp:
##########
@@ -18,78 +18,88 @@
 #include "vec/aggregate_functions/aggregate_function_collect.h"
 
 #include "vec/aggregate_functions/aggregate_function_simple_factory.h"
+#include "vec/aggregate_functions/helpers.h"
 
 namespace doris::vectorized {
 
-template <typename T>
-AggregateFunctionPtr create_agg_function_collect(bool distinct, const DataTypes& argument_types) {
+#define FOR_DECIMAL_TYPES(M) \
+    M(Decimal32)             \
+    M(Decimal64)             \
+    M(Decimal128)            \
+    M(Decimal128I)
+
+template <typename T, typename HasLimit, typename... TArgs>
+AggregateFunctionPtr do_create_agg_function_collect(bool distinct, const DataTypePtr& argument_type,
+                                                    TArgs... args) {
     if (distinct) {
         return AggregateFunctionPtr(
-                new AggregateFunctionCollect<AggregateFunctionCollectSetData<T>>(argument_types));
+                new AggregateFunctionCollect<AggregateFunctionCollectSetData<T, HasLimit>,
+                                             HasLimit>(argument_type,
+                                                       std::forward<TArgs>(args)...));
     } else {
         return AggregateFunctionPtr(
-                new AggregateFunctionCollect<AggregateFunctionCollectListData<T>>(argument_types));
+                new AggregateFunctionCollect<AggregateFunctionCollectListData<T, HasLimit>,
+                                             HasLimit>(argument_type,
+                                                       std::forward<TArgs>(args)...));
     }
 }
 
-AggregateFunctionPtr create_aggregate_function_collect(const std::string& name,
-                                                       const DataTypes& argument_types,
-                                                       const bool result_is_nullable) {
-    if (argument_types.size() != 1) {
-        LOG(WARNING) << fmt::format("Illegal number {} of argument for aggregate function {}",
-                                    argument_types.size(), name);
-        return nullptr;
-    }
-
+template <typename HasLimit, typename... TArgs>
+AggregateFunctionPtr create_aggregate_function_collect_impl(const std::string& name,
+                                                            const DataTypePtr& argument_type,
+                                                            TArgs... args) {
     bool distinct = false;
     if (name == "collect_set") {
         distinct = true;
     }
 
-    WhichDataType type(argument_types[0]);
-    if (type.is_uint8()) {
-        return create_agg_function_collect<UInt8>(distinct, argument_types);
-    } else if (type.is_int8()) {
-        return create_agg_function_collect<Int8>(distinct, argument_types);
-    } else if (type.is_int16()) {
-        return create_agg_function_collect<Int16>(distinct, argument_types);
-    } else if (type.is_int32()) {
-        return create_agg_function_collect<Int32>(distinct, argument_types);
-    } else if (type.is_int64()) {
-        return create_agg_function_collect<Int64>(distinct, argument_types);
-    } else if (type.is_int128()) {
-        return create_agg_function_collect<Int128>(distinct, argument_types);
-    } else if (type.is_float32()) {
-        return create_agg_function_collect<Float32>(distinct, argument_types);
-    } else if (type.is_float64()) {
-        return create_agg_function_collect<Float64>(distinct, argument_types);
-    } else if (type.is_decimal32()) {
-        return create_agg_function_collect<Decimal32>(distinct, argument_types);
-    } else if (type.is_decimal64()) {
-        return create_agg_function_collect<Decimal64>(distinct, argument_types);
-    } else if (type.is_decimal128()) {
-        return create_agg_function_collect<Decimal128>(distinct, argument_types);
-    } else if (type.is_decimal128i()) {
-        return create_agg_function_collect<Decimal128I>(distinct, argument_types);
-    } else if (type.is_date()) {
-        return create_agg_function_collect<Int64>(distinct, argument_types);
-    } else if (type.is_date_time()) {
-        return create_agg_function_collect<Int64>(distinct, argument_types);
-    } else if (type.is_date_v2()) {
-        return create_agg_function_collect<UInt32>(distinct, argument_types);
-    } else if (type.is_date_time_v2()) {
-        return create_agg_function_collect<UInt64>(distinct, argument_types);
-    } else if (type.is_string()) {
-        return create_agg_function_collect<StringRef>(distinct, argument_types);
+    WhichDataType which(argument_type);
+#define DISPATCH(TYPE)                                                                 \
+    if (which.idx == TypeIndex::TYPE)                                                  \
+        return do_create_agg_function_collect<TYPE, HasLimit>(distinct, argument_type, \
+                                                              std::forward<TArgs>(args)...);
+    FOR_NUMERIC_TYPES(DISPATCH)
+    FOR_DECIMAL_TYPES(DISPATCH)
+#undef DISPATCH
+    if (which.is_date_or_datetime()) {
+        return do_create_agg_function_collect<Int64, HasLimit>(distinct, argument_type,
+                                                               std::forward<TArgs>(args)...);
+    } else if (which.is_date_v2()) {
+        return do_create_agg_function_collect<UInt32, HasLimit>(distinct, argument_type,
+                                                                std::forward<TArgs>(args)...);
+    } else if (which.is_date_time_v2()) {
+        return do_create_agg_function_collect<UInt64, HasLimit>(distinct, argument_type,
+                                                                std::forward<TArgs>(args)...);
+    } else if (which.is_string()) {
+        return do_create_agg_function_collect<StringRef, HasLimit>(distinct, argument_type,
+                                                                   std::forward<TArgs>(args)...);
     }
 
     LOG(WARNING) << fmt::format("unsupported input type {} for aggregate function {}",
-                                argument_types[0]->get_name(), name);
+                                argument_type->get_name(), name);
+    return nullptr;
+}
+
+AggregateFunctionPtr create_aggregate_function_collect(const std::string& name,
+                                                       const DataTypes& argument_types,
+                                                       const bool result_is_nullable) {
+    if (argument_types.size() == 1) {
+        return create_aggregate_function_collect_impl<std::false_type>(name, argument_types[0],
+                                                                       parameters);
+    }
+    if (argument_types.size() == 2) {
+        return create_aggregate_function_collect_impl<std::true_type>(name, argument_types[0],
+                                                                      parameters);

Review Comment:
   warning: use of undeclared identifier 'parameters' [clang-diagnostic-error]
   ```cpp
                                                                         parameters);
                                                                         ^
   ```
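
A minimal sketch of one way to resolve this diagnostic, assuming the intent is to hand a parameter list down to the templated helper. The diagnostic arises because `parameters` is used at the call site but is not declared in the enclosing function's signature. Every name below (`Params`, `Collector`, `make_collector`, `make_collector_impl`) is a hypothetical stand-in, not a Doris type or function; the sketch only shows the parameter being declared and forwarded, with `std::true_type`/`std::false_type` selecting the limited variant.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical stand-ins for illustration; none of these are Doris types.
using Params = std::vector<int64_t>;

struct Collector {
    bool distinct;   // true for the set-like variant
    bool has_limit;  // true when a max_size argument was supplied
    Params params;
};
using CollectorPtr = std::shared_ptr<Collector>;

// The tag type HasLimit selects the variant at compile time.
template <typename HasLimit>
CollectorPtr make_collector_impl(const std::string& name, const Params& params) {
    const bool distinct = (name == "collect_set");
    return std::make_shared<Collector>(Collector{distinct, HasLimit::value, params});
}

// `params` is part of the signature here, so both call sites below compile.
CollectorPtr make_collector(const std::string& name, std::size_t argument_count,
                            const Params& params) {
    if (argument_count == 1) {
        return make_collector_impl<std::false_type>(name, params);
    }
    if (argument_count == 2) {
        return make_collector_impl<std::true_type>(name, params);
    }
    return nullptr;  // unsupported number of arguments
}
```

Whether the real factory should receive the parameters through its signature or obtain them some other way is a design decision for the PR author; the sketch only shows the former.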
   



##########
be/src/vec/aggregate_functions/aggregate_function_collect.cpp:
##########
@@ -18,78 +18,88 @@
 #include "vec/aggregate_functions/aggregate_function_collect.h"
 
 #include "vec/aggregate_functions/aggregate_function_simple_factory.h"
+#include "vec/aggregate_functions/helpers.h"
 
 namespace doris::vectorized {
 
-template <typename T>
-AggregateFunctionPtr create_agg_function_collect(bool distinct, const DataTypes& argument_types) {
+#define FOR_DECIMAL_TYPES(M) \
+    M(Decimal32)             \
+    M(Decimal64)             \
+    M(Decimal128)            \
+    M(Decimal128I)
+
+template <typename T, typename HasLimit, typename... TArgs>
+AggregateFunctionPtr do_create_agg_function_collect(bool distinct, const DataTypePtr& argument_type,
+                                                    TArgs... args) {
     if (distinct) {
         return AggregateFunctionPtr(
-                new AggregateFunctionCollect<AggregateFunctionCollectSetData<T>>(argument_types));
+                new AggregateFunctionCollect<AggregateFunctionCollectSetData<T, HasLimit>,
+                                             HasLimit>(argument_type,
+                                                       std::forward<TArgs>(args)...));
     } else {
         return AggregateFunctionPtr(
-                new AggregateFunctionCollect<AggregateFunctionCollectListData<T>>(argument_types));
+                new AggregateFunctionCollect<AggregateFunctionCollectListData<T, HasLimit>,
+                                             HasLimit>(argument_type,
+                                                       std::forward<TArgs>(args)...));
     }
 }
 
-AggregateFunctionPtr create_aggregate_function_collect(const std::string& name,
-                                                       const DataTypes& argument_types,
-                                                       const bool result_is_nullable) {
-    if (argument_types.size() != 1) {
-        LOG(WARNING) << fmt::format("Illegal number {} of argument for aggregate function {}",
-                                    argument_types.size(), name);
-        return nullptr;
-    }
-
+template <typename HasLimit, typename... TArgs>
+AggregateFunctionPtr create_aggregate_function_collect_impl(const std::string& name,
+                                                            const DataTypePtr& argument_type,
+                                                            TArgs... args) {
     bool distinct = false;
     if (name == "collect_set") {
         distinct = true;
     }
 
-    WhichDataType type(argument_types[0]);
-    if (type.is_uint8()) {
-        return create_agg_function_collect<UInt8>(distinct, argument_types);
-    } else if (type.is_int8()) {
-        return create_agg_function_collect<Int8>(distinct, argument_types);
-    } else if (type.is_int16()) {
-        return create_agg_function_collect<Int16>(distinct, argument_types);
-    } else if (type.is_int32()) {
-        return create_agg_function_collect<Int32>(distinct, argument_types);
-    } else if (type.is_int64()) {
-        return create_agg_function_collect<Int64>(distinct, argument_types);
-    } else if (type.is_int128()) {
-        return create_agg_function_collect<Int128>(distinct, argument_types);
-    } else if (type.is_float32()) {
-        return create_agg_function_collect<Float32>(distinct, argument_types);
-    } else if (type.is_float64()) {
-        return create_agg_function_collect<Float64>(distinct, argument_types);
-    } else if (type.is_decimal32()) {
-        return create_agg_function_collect<Decimal32>(distinct, argument_types);
-    } else if (type.is_decimal64()) {
-        return create_agg_function_collect<Decimal64>(distinct, argument_types);
-    } else if (type.is_decimal128()) {
-        return create_agg_function_collect<Decimal128>(distinct, argument_types);
-    } else if (type.is_decimal128i()) {
-        return create_agg_function_collect<Decimal128I>(distinct, argument_types);
-    } else if (type.is_date()) {
-        return create_agg_function_collect<Int64>(distinct, argument_types);
-    } else if (type.is_date_time()) {
-        return create_agg_function_collect<Int64>(distinct, argument_types);
-    } else if (type.is_date_v2()) {
-        return create_agg_function_collect<UInt32>(distinct, argument_types);
-    } else if (type.is_date_time_v2()) {
-        return create_agg_function_collect<UInt64>(distinct, argument_types);
-    } else if (type.is_string()) {
-        return create_agg_function_collect<StringRef>(distinct, argument_types);
+    WhichDataType which(argument_type);
+#define DISPATCH(TYPE)                                                                 \
+    if (which.idx == TypeIndex::TYPE)                                                  \
+        return do_create_agg_function_collect<TYPE, HasLimit>(distinct, argument_type, \
+                                                              std::forward<TArgs>(args)...);
+    FOR_NUMERIC_TYPES(DISPATCH)
+    FOR_DECIMAL_TYPES(DISPATCH)
+#undef DISPATCH
+    if (which.is_date_or_datetime()) {
+        return do_create_agg_function_collect<Int64, HasLimit>(distinct, argument_type,
+                                                               std::forward<TArgs>(args)...);
+    } else if (which.is_date_v2()) {
+        return do_create_agg_function_collect<UInt32, HasLimit>(distinct, argument_type,
+                                                                std::forward<TArgs>(args)...);
+    } else if (which.is_date_time_v2()) {
+        return do_create_agg_function_collect<UInt64, HasLimit>(distinct, argument_type,
+                                                                std::forward<TArgs>(args)...);
+    } else if (which.is_string()) {
+        return do_create_agg_function_collect<StringRef, HasLimit>(distinct, argument_type,
+                                                                   std::forward<TArgs>(args)...);
     }
 
     LOG(WARNING) << fmt::format("unsupported input type {} for aggregate function {}",
-                                argument_types[0]->get_name(), name);
+                                argument_type->get_name(), name);
+    return nullptr;
+}
+
+AggregateFunctionPtr create_aggregate_function_collect(const std::string& name,
+                                                       const DataTypes& argument_types,
+                                                       const bool result_is_nullable) {
+    if (argument_types.size() == 1) {
+        return create_aggregate_function_collect_impl<std::false_type>(name, argument_types[0],
+                                                                       parameters);

Review Comment:
   warning: use of undeclared identifier 'parameters' [clang-diagnostic-error]
   ```cpp
                                                                          parameters);
                                                                          ^
   ```
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] TangSiyang2001 commented on a diff in pull request #15339: [enhancement](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "TangSiyang2001 (via GitHub)" <gi...@apache.org>.
TangSiyang2001 commented on code in PR #15339:
URL: https://github.com/apache/doris/pull/15339#discussion_r1112116359


##########
be/src/vec/aggregate_functions/aggregate_function_collect.h:
##########
@@ -202,27 +266,36 @@ class AggregateFunctionCollect final
     }
 
     DataTypePtr get_return_type() const override {
-        return std::make_shared<DataTypeArray>(make_nullable(_argument_type));
+        return std::make_shared<DataTypeArray>(make_nullable(return_type));
     }
 
+    bool allocates_memory_in_arena() const override { return ENABLE_ARENA; }
+
     void add(AggregateDataPtr __restrict place, const IColumn** columns, size_t row_num,
              Arena* arena) const override {
-        assert(!columns[0]->is_null_at(row_num));
-        if constexpr (alloc_memory_in_arena) {
-            this->data(place).add(*columns[0], row_num, arena);
+        auto& data = this->data(place);
+        if constexpr (HasLimit::value) {
+            data.max_size =
+                    (UInt64) static_cast<const ColumnInt32*>(columns[1])->get_element(row_num);

Review Comment:
   > Could you please specify it further? Thanks a lot.
   





[GitHub] [doris] TangSiyang2001 commented on a diff in pull request #15339: [Feature](aggregate-function) support funtion group_uniq_array

Posted by GitBox <gi...@apache.org>.
TangSiyang2001 commented on code in PR #15339:
URL: https://github.com/apache/doris/pull/15339#discussion_r1064237669


##########
be/test/vec/aggregate_functions/agg_group_array_test.cpp:
##########
@@ -0,0 +1,142 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <gtest/gtest.h>
+
+#include "gtest/gtest.h"
+#include "vec/aggregate_functions/aggregate_function_collect.h"
+#include "vec/aggregate_functions/aggregate_function_simple_factory.h"
+#include "vec/common/arena.h"
+#include "vec/core/field.h"
+#include "vec/data_types/data_type.h"
+#include "vec/data_types/data_type_date.h"
+#include "vec/data_types/data_type_date_time.h"
+#include "vec/data_types/data_type_decimal.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/data_types/data_type_string.h"
+
+namespace doris::vectorized {
+
+void register_aggregate_function_group_uniq_array(AggregateFunctionSimpleFactory& factory);
+
+class VAggGroupArrayTest : public testing::Test {
+private:
+    Arena _agg_arena_pool;
+
+public:
+    void SetUp() override {
+        AggregateFunctionSimpleFactory factory = AggregateFunctionSimpleFactory::instance();
+        register_aggregate_function_group_uniq_array(factory);
+    }
+
+    void TearDown() override {}
+
+    template <typename DataType>
+    void agg_group_uniq_array_add_elements(AggregateFunctionPtr agg_function,
+                                           AggregateDataPtr place, size_t input_nums) {
+        using FieldType = typename DataType::FieldType;
+        auto type = std::make_shared<DataType>();
+        auto input_col = type->create_column();
+        for (size_t i = 0; i < input_nums; ++i) {
+            if constexpr (std::is_same_v<DataType, DataTypeString>) {
+                auto item = std::string("item") + std::to_string(i);
+                input_col->insert_data(item.c_str(), item.size());
+            } else {
+                auto item = FieldType(static_cast<uint64_t>(i));
+                input_col->insert_data(reinterpret_cast<const char*>(&item), 0);
+            }
+        }
+        EXPECT_EQ(input_col->size(), input_nums);
+
+        const IColumn* column[1] = {input_col.get()};
+        for (int i = 0; i < input_col->size(); i++) {
+            agg_function->add(place, column, i, &_agg_arena_pool);
+        }
+    }
+
+    template <typename DataType>
+    void test_agg_group_uniq_array(size_t input_nums = 0) {
+        DataTypes data_types = {(DataTypePtr)std::make_shared<DataType>()};
+        LOG(INFO) << "test_agg_group_uniq_array for type"
+                  << "(" << data_types[0]->get_name() << ")";
+
+        Array array;
+        AggregateFunctionSimpleFactory factory = AggregateFunctionSimpleFactory::instance();
+        auto agg_function = factory.get("group_uniq_array", data_types, array);
+        EXPECT_NE(agg_function, nullptr);
+
+        std::unique_ptr<char[]> memory(new char[agg_function->size_of_data()]);
+        AggregateDataPtr place = memory.get();
+        agg_function->create(place);
+
+        agg_group_uniq_array_add_elements<DataType>(agg_function, place, input_nums);
+
+        ColumnString buf;
+        VectorBufferWriter buf_writer(buf);
+        agg_function->serialize(place, buf_writer);
+        buf_writer.commit();
+        VectorBufferReader buf_reader(buf.get_data_at(0));
+        agg_function->deserialize(place, buf_reader, &_agg_arena_pool);
+
+        std::unique_ptr<char[]> memory2(new char[agg_function->size_of_data()]);
+        AggregateDataPtr place2 = memory2.get();
+        agg_function->create(place2);
+
+        agg_group_uniq_array_add_elements<DataType>(agg_function, place2, input_nums);
+
+        agg_function->merge(place, place2, &_agg_arena_pool);
+        auto column_result = ColumnArray::create(data_types[0]->create_column());
+        agg_function->insert_result_into(place, *column_result);
+        EXPECT_EQ(column_result->size(), 1);
+        EXPECT_EQ(column_result->get_offsets()[0], input_nums);
+
+        auto column_result2 = ColumnArray::create(data_types[0]->create_column());
+        agg_function->insert_result_into(place2, *column_result2);
+        EXPECT_EQ(column_result2->size(), 1);
+        EXPECT_EQ(column_result->get_offsets()[0], input_nums);
+
+        LOG(INFO) << column_result->get_offsets()[0];
+        LOG(INFO) << column_result2->get_offsets()[0];

Review Comment:
   Added for debugging; I will remove them.





[GitHub] [doris] Yukang-Lian commented on a diff in pull request #15339: [Feature](aggregate-function) support funtion group_uniq_array

Posted by GitBox <gi...@apache.org>.
Yukang-Lian commented on code in PR #15339:
URL: https://github.com/apache/doris/pull/15339#discussion_r1064167762


##########
docs/en/docs/sql-manual/sql-functions/aggregate-functions/group_uniq_array.md:
##########
@@ -0,0 +1,69 @@
+---
+{
+    "title": "GROUP_UNIQ_ARRAY",
+    "language": "en"
+}
+---
+
+<!-- 
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+## GROUP_UNIQ_ARRAY
+### description
+#### Syntax
+
+`ARRAY<T> collect_set(expr[,max_size])`
+
+Creates an array from different argument values,with the optional max_size parameter limits the size of the resulting array to `max_size` elements.
+

Review Comment:
   zh-CN version too.





[GitHub] [doris] github-actions[bot] commented on pull request #15339: [enhancement](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #15339:
URL: https://github.com/apache/doris/pull/15339#issuecomment-1437247396

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] TangSiyang2001 commented on pull request #15339: [enhancement](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "TangSiyang2001 (via GitHub)" <gi...@apache.org>.
TangSiyang2001 commented on PR #15339:
URL: https://github.com/apache/doris/pull/15339#issuecomment-1437266948

   run p1




[GitHub] [doris] TangSiyang2001 commented on a diff in pull request #15339: [enhancement](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "TangSiyang2001 (via GitHub)" <gi...@apache.org>.
TangSiyang2001 commented on code in PR #15339:
URL: https://github.com/apache/doris/pull/15339#discussion_r1112109843


##########
be/src/vec/aggregate_functions/aggregate_function_collect.h:
##########
@@ -202,27 +266,36 @@ class AggregateFunctionCollect final
     }
 
     DataTypePtr get_return_type() const override {
-        return std::make_shared<DataTypeArray>(make_nullable(_argument_type));
+        return std::make_shared<DataTypeArray>(make_nullable(return_type));
     }
 
+    bool allocates_memory_in_arena() const override { return ENABLE_ARENA; }
+
     void add(AggregateDataPtr __restrict place, const IColumn** columns, size_t row_num,
              Arena* arena) const override {
-        assert(!columns[0]->is_null_at(row_num));
-        if constexpr (alloc_memory_in_arena) {
-            this->data(place).add(*columns[0], row_num, arena);
+        auto& data = this->data(place);
+        if constexpr (HasLimit::value) {
+            data.max_size =
+                    (UInt64) static_cast<const ColumnInt32*>(columns[1])->get_element(row_num);

Review Comment:
   Could you please specify it further? Thanks a lot.





[GitHub] [doris] TangSiyang2001 commented on a diff in pull request #15339: [feature-wip](aggregate-function) support funtion group_uniq_array

Posted by "TangSiyang2001 (via GitHub)" <gi...@apache.org>.
TangSiyang2001 commented on code in PR #15339:
URL: https://github.com/apache/doris/pull/15339#discussion_r1090134288


##########
docs/zh-CN/docs/sql-manual/sql-functions/aggregate-functions/group_uniq_array.md:
##########
@@ -0,0 +1,80 @@
+---
+{
+    "title": "GROUP_UNIQ_ARRAY",
+    "language": "zh-CN"
+}
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+## GROUP_UNIQ_ARRAY
+### description

Review Comment:
   Could we talk about it on WeChat? I just added you there, via @Yukang-Lian.





[GitHub] [doris] BiteTheDDDDt commented on pull request #15339: [enhancement](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "BiteTheDDDDt (via GitHub)" <gi...@apache.org>.
BiteTheDDDDt commented on PR #15339:
URL: https://github.com/apache/doris/pull/15339#issuecomment-1447821223

   Hi, does the test case in this PR need an `order by` to ensure that the results are always consistent?
   I'm running into a failure that looks sorting-related.
   
   ```
   2023-02-28 16:58:32.363 ERROR [suite-thread-3] (ScriptContext.groovy:121) - Run test_aggregate_collect in /doris/regression-test/suites/query_p0/sql_functions/aggregate_functions/test_aggregate_collect.groovy failed
   java.lang.IllegalStateException: Check tag 'select' failed:
   Check tag 'select' failed, line 1, CHAR result mismatch.
   Expect cell is: [1555555555, 3555555555, 9555555555, 255555555, 55555555555]
   But real is: [1555555555, 9555555555, 3555555555, 255555555, 55555555555]
   
   ```
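
The two arrays in the log above hold the same five values and differ only in the position of two elements, which is what one would expect if the aggregate gives no ordering guarantee. As a hedged illustration (not the regression framework's API), an order-insensitive comparison can be sketched as follows; `same_elements` is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: compares two arrays while ignoring element order.
// Both arguments are taken by value so the caller's data is left untouched.
bool same_elements(std::vector<int64_t> a, std::vector<int64_t> b) {
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    return a == b;
}
```

In a SQL regression case the same effect would usually come from making the output order deterministic in the query itself, for example by sorting the resulting array before it is compared.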




[GitHub] [doris] Yukang-Lian commented on a diff in pull request #15339: [enhancement](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "Yukang-Lian (via GitHub)" <gi...@apache.org>.
Yukang-Lian commented on code in PR #15339:
URL: https://github.com/apache/doris/pull/15339#discussion_r1107966438


##########
fe/fe-core/src/main/java/org/apache/doris/catalog/FunctionSet.java:
##########
@@ -33,6 +33,7 @@
 import com.google.common.collect.ImmutableSet;
 import com.google.common.collect.Lists;
 import com.google.common.collect.Maps;
+import java_cup.symbol;

Review Comment:
   useless import?



##########
be/src/vec/aggregate_functions/aggregate_function_collect.h:
##########
@@ -177,21 +243,31 @@ struct AggregateFunctionCollectListData<StringRef> {
 
     void insert_result_into(IColumn& to) const {
         auto& to_str = assert_cast<ColVecType&>(to);
-        to_str.insert_range_from(*data, 0, data->size());
+        to_str.insert_range_from(*data, 0, size());
     }
 };
 
-template <typename Data>
-class AggregateFunctionCollect final
-        : public IAggregateFunctionDataHelper<Data, AggregateFunctionCollect<Data>> {
-public:
-    static constexpr bool alloc_memory_in_arena =
-            std::is_same_v<Data, AggregateFunctionCollectSetData<StringRef>>;
+template <typename Data, typename HasLimit>
+class AggregateFunctionCollect
+        : public IAggregateFunctionDataHelper<Data, AggregateFunctionCollect<Data, HasLimit>> {
+    using GenericType = AggregateFunctionCollectSetData<StringRef>;
+
+    static constexpr bool HAS_LIMIT = HasLimit::value;
+    static constexpr bool ENABLE_ARENA = std::is_same_v<Data, GenericType>;
 
-    AggregateFunctionCollect(const DataTypes& argument_types_)
-            : IAggregateFunctionDataHelper<Data, AggregateFunctionCollect<Data>>(argument_types_,
-                                                                                 {}),
-              _argument_type(argument_types_[0]) {}
+public:
+    AggregateFunctionCollect(const DataTypePtr& argument_type, const Array& parameters_,
+                             UInt64 max_size_ = std::numeric_limits<UInt64>::max())
+            : IAggregateFunctionDataHelper<Data, AggregateFunctionCollect<Data, HasLimit>>(
+                      {argument_type}, parameters_),
+              return_type(argument_type) {}
+
+    AggregateFunctionCollect(const DataTypePtr& argument_type, const Array& parameters_,

Review Comment:
   useless constructor?





[GitHub] [doris] TangSiyang2001 commented on a diff in pull request #15339: [enhancement](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "TangSiyang2001 (via GitHub)" <gi...@apache.org>.
TangSiyang2001 commented on code in PR #15339:
URL: https://github.com/apache/doris/pull/15339#discussion_r1108012802


##########
be/src/vec/aggregate_functions/aggregate_function_collect.h:
##########
@@ -203,27 +279,36 @@ class AggregateFunctionCollect final
     }
 
     DataTypePtr get_return_type() const override {
-        return std::make_shared<DataTypeArray>(make_nullable(_argument_type));
+        return std::make_shared<DataTypeArray>(make_nullable(return_type));
     }
 
+    bool allocates_memory_in_arena() const override { return ENABLE_ARENA; }
+
     void add(AggregateDataPtr __restrict place, const IColumn** columns, size_t row_num,
              Arena* arena) const override {
-        assert(!columns[0]->is_null_at(row_num));
-        if constexpr (alloc_memory_in_arena) {
-            this->data(place).add(*columns[0], row_num, arena);
+        auto& data = this->data(place);
+        if constexpr (HAS_LIMIT) {
+            data.max_size =
+                    (UInt64) static_cast<const ColumnInt32*>(columns[1])->get_element(row_num);
+            if (data.size() >= data.max_size) {
+                return;
+            }
+        }
+        if constexpr (ENABLE_ARENA) {
+            data.add(*columns[0], row_num, arena);
         } else {
-            this->data(place).add(*columns[0], row_num);
+            data.add(*columns[0], row_num);
         }
     }
 
-    void reset(AggregateDataPtr place) const override { this->data(place).reset(); }
-
     void merge(AggregateDataPtr __restrict place, ConstAggregateDataPtr rhs,
                Arena* arena) const override {
-        if constexpr (alloc_memory_in_arena) {
-            this->data(place).merge(this->data(rhs), arena);
+        auto& data = this->data(place);
+        auto& rhs_data = this->data(rhs);
+        if constexpr (ENABLE_ARENA) {
+            data.merge(rhs_data, HAS_LIMIT, arena);

Review Comment:
   That does seem more graceful; I'll try to improve it this way.





[GitHub] [doris] TangSiyang2001 commented on a diff in pull request #15339: [enhancement](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "TangSiyang2001 (via GitHub)" <gi...@apache.org>.
TangSiyang2001 commented on code in PR #15339:
URL: https://github.com/apache/doris/pull/15339#discussion_r1108010044


##########
fe/fe-core/src/main/java/org/apache/doris/catalog/FunctionSet.java:
##########
@@ -33,6 +33,7 @@
 import com.google.common.collect.ImmutableSet;
 import com.google.common.collect.Lists;
 import com.google.common.collect.Maps;
+import java_cup.symbol;

Review Comment:
   Yes, I'll remove it.





[GitHub] [doris] github-actions[bot] commented on pull request #15339: [enhancement-wip](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #15339:
URL: https://github.com/apache/doris/pull/15339#issuecomment-1430984055

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] github-actions[bot] commented on pull request #15339: [Feature](aggregate-function) support funtion group_uniq_array

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #15339:
URL: https://github.com/apache/doris/pull/15339#issuecomment-1369512607

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] Yukang-Lian commented on a diff in pull request #15339: [Feature](aggregate-function) support funtion group_uniq_array

Posted by GitBox <gi...@apache.org>.
Yukang-Lian commented on code in PR #15339:
URL: https://github.com/apache/doris/pull/15339#discussion_r1064169253


##########
be/test/vec/aggregate_functions/agg_group_array_test.cpp:
##########
@@ -0,0 +1,142 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <gtest/gtest.h>
+
+#include "gtest/gtest.h"
+#include "vec/aggregate_functions/aggregate_function_collect.h"
+#include "vec/aggregate_functions/aggregate_function_simple_factory.h"
+#include "vec/common/arena.h"
+#include "vec/core/field.h"
+#include "vec/data_types/data_type.h"
+#include "vec/data_types/data_type_date.h"
+#include "vec/data_types/data_type_date_time.h"
+#include "vec/data_types/data_type_decimal.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/data_types/data_type_string.h"
+
+namespace doris::vectorized {
+
+void register_aggregate_function_group_uniq_array(AggregateFunctionSimpleFactory& factory);
+
+class VAggGroupArrayTest : public testing::Test {
+private:
+    Arena _agg_arena_pool;
+
+public:
+    void SetUp() override {
+        AggregateFunctionSimpleFactory factory = AggregateFunctionSimpleFactory::instance();
+        register_aggregate_function_group_uniq_array(factory);
+    }
+
+    void TearDown() override {}
+
+    template <typename DataType>
+    void agg_group_uniq_array_add_elements(AggregateFunctionPtr agg_function,
+                                           AggregateDataPtr place, size_t input_nums) {
+        using FieldType = typename DataType::FieldType;
+        auto type = std::make_shared<DataType>();
+        auto input_col = type->create_column();
+        for (size_t i = 0; i < input_nums; ++i) {
+            if constexpr (std::is_same_v<DataType, DataTypeString>) {
+                auto item = std::string("item") + std::to_string(i);
+                input_col->insert_data(item.c_str(), item.size());
+            } else {
+                auto item = FieldType(static_cast<uint64_t>(i));
+                input_col->insert_data(reinterpret_cast<const char*>(&item), 0);
+            }
+        }
+        EXPECT_EQ(input_col->size(), input_nums);
+
+        const IColumn* column[1] = {input_col.get()};
+        for (int i = 0; i < input_col->size(); i++) {
+            agg_function->add(place, column, i, &_agg_arena_pool);
+        }
+    }
+
+    template <typename DataType>
+    void test_agg_group_uniq_array(size_t input_nums = 0) {
+        DataTypes data_types = {(DataTypePtr)std::make_shared<DataType>()};
+        LOG(INFO) << "test_agg_group_uniq_array for type"
+                  << "(" << data_types[0]->get_name() << ")";
+
+        Array array;
+        AggregateFunctionSimpleFactory factory = AggregateFunctionSimpleFactory::instance();
+        auto agg_function = factory.get("group_uniq_array", data_types, array);
+        EXPECT_NE(agg_function, nullptr);
+
+        std::unique_ptr<char[]> memory(new char[agg_function->size_of_data()]);
+        AggregateDataPtr place = memory.get();
+        agg_function->create(place);
+
+        agg_group_uniq_array_add_elements<DataType>(agg_function, place, input_nums);
+
+        ColumnString buf;
+        VectorBufferWriter buf_writer(buf);
+        agg_function->serialize(place, buf_writer);
+        buf_writer.commit();
+        VectorBufferReader buf_reader(buf.get_data_at(0));
+        agg_function->deserialize(place, buf_reader, &_agg_arena_pool);
+
+        std::unique_ptr<char[]> memory2(new char[agg_function->size_of_data()]);
+        AggregateDataPtr place2 = memory2.get();
+        agg_function->create(place2);
+
+        agg_group_uniq_array_add_elements<DataType>(agg_function, place2, input_nums);
+
+        agg_function->merge(place, place2, &_agg_arena_pool);
+        auto column_result = ColumnArray::create(data_types[0]->create_column());
+        agg_function->insert_result_into(place, *column_result);
+        EXPECT_EQ(column_result->size(), 1);
+        EXPECT_EQ(column_result->get_offsets()[0], input_nums);
+
+        auto column_result2 = ColumnArray::create(data_types[0]->create_column());
+        agg_function->insert_result_into(place2, *column_result2);
+        EXPECT_EQ(column_result2->size(), 1);
+        EXPECT_EQ(column_result->get_offsets()[0], input_nums);
+
+        LOG(INFO) << column_result->get_offsets()[0];
+        LOG(INFO) << column_result2->get_offsets()[0];

Review Comment:
   Are these two `LOG` statements still needed?





[GitHub] [doris] Yukang-Lian commented on a diff in pull request #15339: [Feature](aggregate-function) support funtion group_uniq_array

Posted by GitBox <gi...@apache.org>.
Yukang-Lian commented on code in PR #15339:
URL: https://github.com/apache/doris/pull/15339#discussion_r1064168397


##########
fe/fe-core/src/main/java/org/apache/doris/catalog/FunctionSet.java:
##########
@@ -2609,14 +2634,33 @@ private void initAggregateBuiltins() {
                             .createBuiltin("topn_weighted", Lists.newArrayList(t, Type.BIGINT, Type.INT, Type.INT),
                                     new ArrayType(t), t,
                                     "", "", "", "", "", true, false, true, true));
+
             addBuiltin(AggregateFunction.createBuiltin(HIST, Lists.newArrayList(t), Type.VARCHAR, t,
                     "", "", "", "", "", true, false, true, true));
+
             addBuiltin(AggregateFunction.createBuiltin(HISTOGRAM, Lists.newArrayList(t), Type.VARCHAR, t,
                     "", "", "", "", "", true, false, true, true));
+
             addBuiltin(AggregateFunction.createBuiltin(HIST, Lists.newArrayList(t, Type.DOUBLE, Type.INT), Type.VARCHAR, t,
                                     "", "", "", "", "", true, false, true, true));
-            addBuiltin(AggregateFunction.createBuiltin(HISTOGRAM, Lists.newArrayList(t, Type.DOUBLE, Type.INT), Type.VARCHAR, t,
+
+            addBuiltin(AggregateFunction.createBuiltin(HISTOGRAM, Lists.newArrayList(t, Type.DOUBLE, Type.INT),
+                    Type.VARCHAR, t,
+                    "", "", "", "", "", true, false, true, true));
+
+            addBuiltin(AggregateFunction.createBuiltin(HISTOGRAM, Lists.newArrayList(t, Type.DOUBLE, Type.INT),
+                    Type.VARCHAR, t,
                     "", "", "", "", "", true, false, true, true));
+
+
+            addBuiltin(AggregateFunction.createBuiltin(GROUP_UNIQ_ARRAY, Lists.newArrayList(t), new ArrayType(t), t,
+                    "", "", "", "", "", true, false, true, true));
+
+            addBuiltin(AggregateFunction.createBuiltin(GROUP_UNIQ_ARRAY, Lists.newArrayList(t), new ArrayType(t), t,
+                    "", "", "", "", "", true, false, true, true));
+            addBuiltin(
+                    AggregateFunction.createBuiltin(GROUP_UNIQ_ARRAY, Lists.newArrayList(t, Type.INT), new ArrayType(t),
+                            t, "", "", "", "", "", true, false, true, true));

Review Comment:
   Keep this part and revert the other changes.





[GitHub] [doris] TangSiyang2001 commented on a diff in pull request #15339: [enhancement](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "TangSiyang2001 (via GitHub)" <gi...@apache.org>.
TangSiyang2001 commented on code in PR #15339:
URL: https://github.com/apache/doris/pull/15339#discussion_r1108012802


##########
be/src/vec/aggregate_functions/aggregate_function_collect.h:
##########
@@ -203,27 +279,36 @@ class AggregateFunctionCollect final
     }
 
     DataTypePtr get_return_type() const override {
-        return std::make_shared<DataTypeArray>(make_nullable(_argument_type));
+        return std::make_shared<DataTypeArray>(make_nullable(return_type));
     }
 
+    bool allocates_memory_in_arena() const override { return ENABLE_ARENA; }
+
     void add(AggregateDataPtr __restrict place, const IColumn** columns, size_t row_num,
              Arena* arena) const override {
-        assert(!columns[0]->is_null_at(row_num));
-        if constexpr (alloc_memory_in_arena) {
-            this->data(place).add(*columns[0], row_num, arena);
+        auto& data = this->data(place);
+        if constexpr (HAS_LIMIT) {
+            data.max_size =
+                    (UInt64) static_cast<const ColumnInt32*>(columns[1])->get_element(row_num);
+            if (data.size() >= data.max_size) {
+                return;
+            }
+        }
+        if constexpr (ENABLE_ARENA) {
+            data.add(*columns[0], row_num, arena);
         } else {
-            this->data(place).add(*columns[0], row_num);
+            data.add(*columns[0], row_num);
         }
     }
 
-    void reset(AggregateDataPtr place) const override { this->data(place).reset(); }
-
     void merge(AggregateDataPtr __restrict place, ConstAggregateDataPtr rhs,
                Arena* arena) const override {
-        if constexpr (alloc_memory_in_arena) {
-            this->data(place).merge(this->data(rhs), arena);
+        auto& data = this->data(place);
+        auto& rhs_data = this->data(rhs);
+        if constexpr (ENABLE_ARENA) {
+            data.merge(rhs_data, HAS_LIMIT, arena);

Review Comment:
   That does seem more graceful; I'll try to improve the code this way.





[GitHub] [doris] github-actions[bot] commented on pull request #15339: [enhancement](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #15339:
URL: https://github.com/apache/doris/pull/15339#issuecomment-1434004770

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] github-actions[bot] commented on pull request #15339: [enhancement-wip](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #15339:
URL: https://github.com/apache/doris/pull/15339#issuecomment-1430955427

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] github-actions[bot] commented on pull request #15339: [enhancement](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #15339:
URL: https://github.com/apache/doris/pull/15339#issuecomment-1433111101

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] github-actions[bot] commented on pull request #15339: [enhancement](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #15339:
URL: https://github.com/apache/doris/pull/15339#issuecomment-1434021990

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] TangSiyang2001 commented on a diff in pull request #15339: [enhancement](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "TangSiyang2001 (via GitHub)" <gi...@apache.org>.
TangSiyang2001 commented on code in PR #15339:
URL: https://github.com/apache/doris/pull/15339#discussion_r1111497723


##########
docs/en/docs/sql-manual/sql-functions/aggregate-functions/collect_list.md:
##########
@@ -30,9 +30,8 @@ under the License.
 
 `ARRAY<T> collect_list(expr)`
 
-Returns an array consisting of all values in expr within the group.
-The order of elements in the array is non-deterministic. NULL values are excluded.
-
+Returns an array consisting of all values in expr within the group, and ,with the optional `max_size` parameter limits the size of the resulting array to `max_size` elements.The order of elements in the array is non-deterministic. NULL values are excluded.
+It has an alias `group_array`.

Review Comment:
   I understand now; I'll modify it.





[GitHub] [doris] TangSiyang2001 commented on a diff in pull request #15339: [enhancement](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "TangSiyang2001 (via GitHub)" <gi...@apache.org>.
TangSiyang2001 commented on code in PR #15339:
URL: https://github.com/apache/doris/pull/15339#discussion_r1112110316


##########
be/src/vec/aggregate_functions/aggregate_function_collect.h:
##########
@@ -31,122 +33,180 @@
 
 namespace doris::vectorized {
 
-template <typename T>
+template <typename T, typename HasLimit>
 struct AggregateFunctionCollectSetData {
     using ElementType = T;
     using ColVecType = ColumnVectorOrDecimal<ElementType>;
     using ElementNativeType = typename NativeType<T>::Type;
+    using SelfType = AggregateFunctionCollectSetData;
     using Set = HashSetWithStackMemory<ElementNativeType, DefaultHash<ElementNativeType>, 4>;
-    Set set;
+    Set data_set;
+    UInt64 max_size;
+
+    size_t size() const { return data_set.size(); }
 
     void add(const IColumn& column, size_t row_num) {
-        const auto& vec = assert_cast<const ColVecType&>(column).get_data();
-        set.insert(vec[row_num]);
+        data_set.insert(assert_cast<const ColVecType&>(column).get_data()[row_num]);
+    }
+
+    void merge(const SelfType& rhs) {
+        if constexpr (HasLimit::value) {
+            for (auto& rhs_elem : rhs.data_set) {
+                if (size() >= max_size) {
+                    return;
+                }
+                data_set.insert(rhs_elem.get_value());
+            }
+        } else {
+            data_set.merge(rhs.data_set);
+        }
     }
-    void merge(const AggregateFunctionCollectSetData& rhs) { set.merge(rhs.set); }
-    void write(BufferWritable& buf) const { set.write(buf); }
-    void read(BufferReadable& buf) { set.read(buf); }
-    void reset() { set.clear(); }
+
+    void write(BufferWritable& buf) const { data_set.write(buf); }
+
+    void read(BufferReadable& buf) { data_set.read(buf); }
+
     void insert_result_into(IColumn& to) const {
         auto& vec = assert_cast<ColVecType&>(to).get_data();
-        vec.reserve(set.size());
-        for (auto item : set) {
+        vec.reserve(size());
+        for (auto item : data_set) {
             vec.push_back(item.key);
         }
     }
+
+    void reset() { data_set.clear(); }
 };
 
-template <>
-struct AggregateFunctionCollectSetData<StringRef> {
+template <typename HasLimit>
+struct AggregateFunctionCollectSetData<StringRef, HasLimit> {
     using ElementType = StringRef;
     using ColVecType = ColumnString;
+    using SelfType = AggregateFunctionCollectSetData<ElementType, HasLimit>;
     using Set = HashSetWithSavedHashWithStackMemory<ElementType, DefaultHash<ElementType>, 4>;
-    Set set;
+    Set data_set;
+    UInt64 max_size;

Review Comment:
   OK, I will change it that way.





[GitHub] [doris] HappenLee commented on a diff in pull request #15339: [enhancement](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "HappenLee (via GitHub)" <gi...@apache.org>.
HappenLee commented on code in PR #15339:
URL: https://github.com/apache/doris/pull/15339#discussion_r1111555787


##########
be/src/vec/aggregate_functions/aggregate_function_collect.h:
##########
@@ -31,122 +33,180 @@
 
 namespace doris::vectorized {
 
-template <typename T>
+template <typename T, typename HasLimit>
 struct AggregateFunctionCollectSetData {
     using ElementType = T;
     using ColVecType = ColumnVectorOrDecimal<ElementType>;
     using ElementNativeType = typename NativeType<T>::Type;
+    using SelfType = AggregateFunctionCollectSetData;
     using Set = HashSetWithStackMemory<ElementNativeType, DefaultHash<ElementNativeType>, 4>;
-    Set set;
+    Set data_set;
+    UInt64 max_size;
+
+    size_t size() const { return data_set.size(); }
 
     void add(const IColumn& column, size_t row_num) {
-        const auto& vec = assert_cast<const ColVecType&>(column).get_data();
-        set.insert(vec[row_num]);
+        data_set.insert(assert_cast<const ColVecType&>(column).get_data()[row_num]);
+    }
+
+    void merge(const SelfType& rhs) {
+        if constexpr (HasLimit::value) {
+            for (auto& rhs_elem : rhs.data_set) {
+                if (size() >= max_size) {
+                    return;
+                }
+                data_set.insert(rhs_elem.get_value());
+            }
+        } else {
+            data_set.merge(rhs.data_set);
+        }
     }
-    void merge(const AggregateFunctionCollectSetData& rhs) { set.merge(rhs.set); }
-    void write(BufferWritable& buf) const { set.write(buf); }
-    void read(BufferReadable& buf) { set.read(buf); }
-    void reset() { set.clear(); }
+
+    void write(BufferWritable& buf) const { data_set.write(buf); }
+
+    void read(BufferReadable& buf) { data_set.read(buf); }
+
     void insert_result_into(IColumn& to) const {
         auto& vec = assert_cast<ColVecType&>(to).get_data();
-        vec.reserve(set.size());
-        for (auto item : set) {
+        vec.reserve(size());
+        for (auto item : data_set) {
             vec.push_back(item.key);
         }
     }
+
+    void reset() { data_set.clear(); }
 };
 
-template <>
-struct AggregateFunctionCollectSetData<StringRef> {
+template <typename HasLimit>
+struct AggregateFunctionCollectSetData<StringRef, HasLimit> {
     using ElementType = StringRef;
     using ColVecType = ColumnString;
+    using SelfType = AggregateFunctionCollectSetData<ElementType, HasLimit>;
     using Set = HashSetWithSavedHashWithStackMemory<ElementType, DefaultHash<ElementType>, 4>;
-    Set set;
+    Set data_set;
+    UInt64 max_size;

Review Comment:
   Better to use int64, because Doris does not support unsigned bigint. Use -1 to mean "not initialized", and set the value only once.
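
   A minimal sketch of what this suggestion could look like, assuming a simplified state type. `BoundedSetState`, `init_limit` and `full` are hypothetical names for illustration and are not part of the PR or of Doris; `std::unordered_set<int>` stands in for the real hash set.

   ```cpp
   #include <cassert>
   #include <cstdint>
   #include <unordered_set>

   // Hypothetical stand-in for the set-based aggregate state in the diff above.
   struct BoundedSetState {
       std::unordered_set<int> values;
       int64_t max_size = -1; // -1 means "limit not initialized yet"

       // Record the limit the first time it is seen; later calls are no-ops,
       // so the value is set only once instead of on every add().
       void init_limit(int64_t limit) {
           assert(limit >= 0);
           if (max_size == -1) {
               max_size = limit;
           }
       }

       // A state without a limit (max_size == -1) is never full.
       bool full() const {
           return max_size >= 0 && static_cast<int64_t>(values.size()) >= max_size;
       }

       void add(int v) {
           if (!full()) {
               values.insert(v);
           }
       }

       void merge(const BoundedSetState& rhs) {
           // Adopt the other side's limit if this side never saw one.
           if (max_size == -1) {
               max_size = rhs.max_size;
           }
           for (int v : rhs.values) {
               if (full()) {
                   return;
               }
               values.insert(v);
           }
       }
   };
   ```

   With a signed sentinel, the "has a limit" check and the "is full" check share one field, and the per-row write to `max_size` in `add` is no longer needed. Whether the limit should also be serialized with the state is left open here.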





[GitHub] [doris] github-actions[bot] commented on a diff in pull request #15339: [Feature](aggregate-function) support funtion group_uniq_array

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on code in PR #15339:
URL: https://github.com/apache/doris/pull/15339#discussion_r1063508277


##########
be/test/vec/aggregate_functions/agg_group_array_test.cpp:
##########
@@ -0,0 +1,122 @@
+#include <gtest/gtest.h>
+
+#include "gtest/gtest.h"
+#include "vec/aggregate_functions/aggregate_function_simple_factory.h"
+#include "vec/common/arena.h"
+#include "vec/core/field.h"
+#include "vec/data_types/data_type.h"
+#include "vec/data_types/data_type_date.h"
+#include "vec/data_types/data_type_decimal.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/data_types/data_type_string.h"
+#include "vec/data_types/data_type_date_time.h"
+
+namespace doris::vectorized {
+
+void register_aggregate_function_group_uniq_array(AggregateFunctionSimpleFactory& factory);
+
+class VAggGroupArrayTest : public testing::Test {
+private:
+    Arena _agg_arena_pool;
+
+public:
+    void SetUp() override {
+        AggregateFunctionSimpleFactory factory = AggregateFunctionSimpleFactory::instance();
+        register_aggregate_function_group_uniq_array(factory);
+    }
+
+    void TearDown() override {}
+
+    template <typename DataType>
+    void agg_group_uniq_array_add_elements(AggregateFunctionPtr agg_function,
+                                           AggregateDataPtr place, size_t input_nums) {
+        using FieldType = typename DataType::FieldType;
+        auto type = std::make_shared<DataType>();
+        auto input_col = type->create_column();
+        for (size_t i = 0; i < input_nums; ++i) {
+            if constexpr (std::is_same_v<DataType, DataTypeString>) {
+                auto item = std::string("item") + std::to_string(i);
+                input_col->insert_data(item.c_str(), item.size());
+            } else {
+                auto item = FieldType(static_cast<uint64_t>(i));
+                input_col->insert_data(reinterpret_cast<const char*>(&item), 0);
+            }
+        }
+        EXPECT_EQ(input_col->size(), input_nums);
+
+        const IColumn* column[1] = {input_col.get()};
+        for (int i = 0; i < input_col->size(); i++) {
+            agg_function->add(place, column, i, &_agg_arena_pool);
+        }
+    }
+
+    template<typename DataType>
+    void test_agg_group_uniq_array(size_t input_nums = 0){
+        DataTypes data_types = {(DataTypePtr)std::make_shared<DataType>()};
+        LOG(INFO) << "test_agg_group_uniq_array for type"
+                  << "(" << data_types[0]->get_name() << ")";
+
+        Array array;
+        AggregateFunctionSimpleFactory factory = AggregateFunctionSimpleFactory::instance();
+        auto agg_function = factory.get("group_uniq_array", data_types, array);
+        EXPECT_NE(agg_function, nullptr);
+
+        std::unique_ptr<char[]> memory(new char[agg_function->size_of_data()]);
+        AggregateDataPtr place = memory.get();
+        agg_function->create(place);
+
+        agg_group_uniq_array_add_elements<DataType>(agg_function, place, input_nums);
+        
+        ColumnString buf;

Review Comment:
   warning: calling a private constructor of class 'doris::vectorized::ColumnString' [clang-diagnostic-error]
   ```cpp
           ColumnString buf;
                        ^
   ```
   **be/src/vec/columns/column_string.h:75:** declared private here
   ```cpp
       ColumnString() = default;
       ^
   ```
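
The diagnostic means that `ColumnString` cannot be default-constructed from the test code, because its constructor is private. Classes that hide their constructors usually expose a static factory instead. The sketch below shows that pattern with a hypothetical `DemoColumn` class (not a Doris type); whether `ColumnString` offers a `create()` factory of this shape is an assumption to verify against `column_string.h`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a column class whose constructor is private.
class DemoColumn {
public:
    // Static factory: the only way for outside code to obtain an instance.
    // std::make_unique cannot reach the private constructor, so `new` is used.
    static std::unique_ptr<DemoColumn> create() {
        return std::unique_ptr<DemoColumn>(new DemoColumn());
    }

    void insert(std::string value) { _data.push_back(std::move(value)); }
    std::size_t size() const { return _data.size(); }

private:
    // Writing `DemoColumn buf;` outside the class fails to compile.
    DemoColumn() = default;
    std::vector<std::string> _data;
};
```

With this shape, the test would hold the buffer through the pointer returned by `DemoColumn::create()` rather than declaring a local object.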
   





[GitHub] [doris] Yukang-Lian commented on a diff in pull request #15339: [Feature](aggregate-function) support funtion group_uniq_array

Posted by GitBox <gi...@apache.org>.
Yukang-Lian commented on code in PR #15339:
URL: https://github.com/apache/doris/pull/15339#discussion_r1064168494


##########
regression-test/suites/query_p0/sql_functions/aggregate_functions/test_aggregate_group_array.groovy:
##########
@@ -0,0 +1,261 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+suite("test_aggregate_group_array"){
+    sql "set enable_vectorized_engine = true"
+
+    def tableName = "group_uniq_array_test"
+    def tableCTAS1 = "group_uniq_array_test_ctas1"
+    def tableCTAS2 = "group_uniq_array_test_ctas2"
+
+    sql "DROP TABLE IF EXISTS ${tableName}"
+    sql "DROP TABLE IF EXISTS ${tableCTAS1}"
+    sql "DROP TABLE IF EXISTS ${tableCTAS2}"
+
+    sql """
+        CREATE TABLE IF NOT EXISTS ${tableName} (
+	        c_id INT,
+            c_bool BOOLEAN,
+            c_tinyint TINYINT,
+            c_smallint SMALLINT,
+            c_int INT,
+            c_bigint BIGINT,
+            c_largeint LARGEINT,
+            c_float FLOAT,
+            c_double DOUBLE,
+            c_decimal DECIMAL(9, 2),
+            c_char CHAR,
+            c_varchar VARCHAR(10),
+            c_string STRING,
+            c_date DATE,
+            c_datev2 DATEV2,
+            c_date_time DATETIME,
+            c_date_timev2 DATETIMEV2(6),
+            c_string_not_null VARCHAR(10) NOT NULL
+	    )
+	    DISTRIBUTED BY HASH(c_int) BUCKETS 1
+	    PROPERTIES (
+	      "replication_num" = "1"
+	    )
+    """
+
+    sql """
+        INSERT INTO ${tableName} values
+            (1, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
+            NULL, NULL, NULL, NULL, 'not null'),
+            (1, false, 10, 20, 30, 4444444444444, 55555555555, 0.1, 0.222, 3333.33, 'c', 'varchar1', 'string1',
+            '2022-12-01', '2022-12-01', '2022-12-01 22:23:23', '2022-12-01 22:23:24.999999', 'not null')
+    """
+
+    sql """
+        INSERT INTO ${tableName} values
+            (1, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
+            NULL, NULL, NULL, NULL, 'not null'),
+            (1, false, 11, 21, 33, 4444444444444, 55555555555, 0.1, 0.222, 3333.33, 'c', 'varchar1', 'string1',
+            '2022-12-01', '2022-12-01', '2022-12-01 22:23:23', '2022-12-01 22:23:24.999999', 'not null')
+    """
+
+    sql """
+        INSERT INTO ${tableName} values
+            (1, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
+            NULL, NULL, NULL, NULL, 'not null'),
+            (1, true, 11, 12, 13, 1444444444444, 1555555555, 1.1, 1.222, 13333.33, 'd', 'varchar2', 'string2',
+            '2022-12-02', '2022-12-02', '2022-12-02 22:23:23', '2022-12-02 22:23:24.999999', 'not null')
+    """
+
+    sql """
+        INSERT INTO ${tableName} values
+            (2, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
+            NULL, NULL, NULL, NULL, 'not null'),
+            (2, false, 21, 22, 23, 2444444444444, 255555555, 2.1, 2.222, 23333.33, 'f', 'varchar3', 'string3',
+            '2022-12-03', '2022-12-03', '2022-12-03 22:23:23', '2022-12-03 22:23:24.999999', 'not null')
+    """
+
+    sql """
+        INSERT INTO ${tableName} values
+            (2, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
+            NULL, NULL, NULL, NULL, 'not null'),
+            (2, true, 31, 32, 33, 3444444444444, 3555555555, 3.1, 3.222, 33333.33, 'l', 'varchar3', 'string3',
+            '2022-12-03', '2022-12-03', '2022-12-03 22:23:23', '2022-12-03 22:23:24.999999', 'not null')
+    """
+
+    sql """
+        INSERT INTO ${tableName} values
+            (2, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
+            NULL, NULL, NULL, NULL, 'not null'),
+            (2, false, 10, 20, 30, 944444444444, 9555555555, 9.1, 9.222, 93333.33, 'p', 'varchar9', 'string9',
+            '2022-12-09', '2022-12-09', '2022-12-09 22:23:23', '2022-12-09 22:23:24.999999', 'not null')
+    """
+
+    qt_select """
+        SELECT
+            group_uniq_array(c_bool),
+            group_uniq_array(c_tinyint),
+            group_uniq_array(c_smallint),
+            group_uniq_array(c_int),
+            group_uniq_array(c_bigint),
+            group_uniq_array(c_largeint),
+            group_uniq_array(c_float),
+            group_uniq_array(c_double),
+            group_uniq_array(c_decimal),
+            group_uniq_array(c_char),
+            group_uniq_array(c_varchar),
+            group_uniq_array(c_string),
+            group_uniq_array(c_date),
+            group_uniq_array(c_datev2),
+            group_uniq_array(c_date_time),
+            group_uniq_array(c_date_timev2),
+            group_uniq_array(c_string_not_null)
+        FROM
+            ${tableName}
+    """
+
+    qt_select """
+        SELECT
+            group_uniq_array(c_bool,1),
+            group_uniq_array(c_tinyint,1),
+            group_uniq_array(c_smallint,1),
+            group_uniq_array(c_int,1),
+            group_uniq_array(c_bigint,1),
+            group_uniq_array(c_largeint,1),
+            group_uniq_array(c_float,1),
+            group_uniq_array(c_double,1),
+            group_uniq_array(c_decimal,1),
+            group_uniq_array(c_char,1),
+            group_uniq_array(c_varchar,1),
+            group_uniq_array(c_string,1),
+            group_uniq_array(c_date,1),
+            group_uniq_array(c_datev2,1),
+            group_uniq_array(c_date_time,1),
+            group_uniq_array(c_date_timev2,1),
+            group_uniq_array(c_string_not_null,1)
+        FROM
+            ${tableName}
+    """
+
+    qt_select """
+        SELECT
+            group_uniq_array(c_bool),
+            group_uniq_array(c_tinyint),
+            group_uniq_array(c_smallint),
+            group_uniq_array(c_int),
+            group_uniq_array(c_bigint),
+            group_uniq_array(c_largeint),
+            group_uniq_array(c_float),
+            group_uniq_array(c_double),
+            group_uniq_array(c_decimal),
+            group_uniq_array(c_char),
+            group_uniq_array(c_varchar),
+            group_uniq_array(c_string),
+            group_uniq_array(c_date),
+            group_uniq_array(c_datev2),
+            group_uniq_array(c_date_time),
+            group_uniq_array(c_date_timev2),
+            group_uniq_array(c_string_not_null)
+        FROM
+            ${tableName}
+        GROUP BY
+            c_id
+        ORDER BY
+            c_id
+    """
+
+    qt_select """
+        SELECT
+            group_uniq_array(c_bool,1),
+            group_uniq_array(c_tinyint,1),
+            group_uniq_array(c_smallint,1),
+            group_uniq_array(c_int,1),
+            group_uniq_array(c_bigint,1),
+            group_uniq_array(c_largeint,1),
+            group_uniq_array(c_float,1),
+            group_uniq_array(c_double,1),
+            group_uniq_array(c_decimal,1),
+            group_uniq_array(c_char,1),
+            group_uniq_array(c_varchar,1),
+            group_uniq_array(c_string,1),
+            group_uniq_array(c_date,1),
+            group_uniq_array(c_datev2,1),
+            group_uniq_array(c_date_time,1),
+            group_uniq_array(c_date_timev2,1),
+            group_uniq_array(c_string_not_null,1)
+        FROM
+            ${tableName}
+        GROUP BY
+            c_id
+        ORDER BY
+            c_id
+    """
+
+    sql """

Review Comment:
   Why do we need this test and the one below?





[GitHub] [doris] Yukang-Lian commented on a diff in pull request #15339: [Feature](aggregate-function) support funtion group_uniq_array

Posted by GitBox <gi...@apache.org>.
Yukang-Lian commented on code in PR #15339:
URL: https://github.com/apache/doris/pull/15339#discussion_r1064168746


##########
regression-test/suites/query_p0/sql_functions/aggregate_functions/test_aggregate_group_array.groovy:
##########
@@ -0,0 +1,261 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+suite("test_aggregate_group_array"){
+    sql "set enable_vectorized_engine = true"
+
+    def tableName = "group_uniq_array_test"
+    def tableCTAS1 = "group_uniq_array_test_ctas1"
+    def tableCTAS2 = "group_uniq_array_test_ctas2"
+
+    sql "DROP TABLE IF EXISTS ${tableName}"
+    sql "DROP TABLE IF EXISTS ${tableCTAS1}"
+    sql "DROP TABLE IF EXISTS ${tableCTAS2}"
+
+    sql """
+        CREATE TABLE IF NOT EXISTS ${tableName} (
+	        c_id INT,
+            c_bool BOOLEAN,
+            c_tinyint TINYINT,
+            c_smallint SMALLINT,
+            c_int INT,
+            c_bigint BIGINT,
+            c_largeint LARGEINT,
+            c_float FLOAT,
+            c_double DOUBLE,
+            c_decimal DECIMAL(9, 2),
+            c_char CHAR,
+            c_varchar VARCHAR(10),
+            c_string STRING,
+            c_date DATE,
+            c_datev2 DATEV2,
+            c_date_time DATETIME,
+            c_date_timev2 DATETIMEV2(6),
+            c_string_not_null VARCHAR(10) NOT NULL
+	    )
+	    DISTRIBUTED BY HASH(c_int) BUCKETS 1
+	    PROPERTIES (
+	      "replication_num" = "1"
+	    )
+    """
+
+    sql """
+        INSERT INTO ${tableName} values
+            (1, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
+            NULL, NULL, NULL, NULL, 'not null'),
+            (1, false, 10, 20, 30, 4444444444444, 55555555555, 0.1, 0.222, 3333.33, 'c', 'varchar1', 'string1',
+            '2022-12-01', '2022-12-01', '2022-12-01 22:23:23', '2022-12-01 22:23:24.999999', 'not null')
+    """
+
+    sql """
+        INSERT INTO ${tableName} values
+            (1, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
+            NULL, NULL, NULL, NULL, 'not null'),
+            (1, false, 11, 21, 33, 4444444444444, 55555555555, 0.1, 0.222, 3333.33, 'c', 'varchar1', 'string1',
+            '2022-12-01', '2022-12-01', '2022-12-01 22:23:23', '2022-12-01 22:23:24.999999', 'not null')
+    """
+
+    sql """
+        INSERT INTO ${tableName} values
+            (1, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
+            NULL, NULL, NULL, NULL, 'not null'),
+            (1, true, 11, 12, 13, 1444444444444, 1555555555, 1.1, 1.222, 13333.33, 'd', 'varchar2', 'string2',
+            '2022-12-02', '2022-12-02', '2022-12-02 22:23:23', '2022-12-02 22:23:24.999999', 'not null')
+    """
+
+    sql """
+        INSERT INTO ${tableName} values
+            (2, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
+            NULL, NULL, NULL, NULL, 'not null'),
+            (2, false, 21, 22, 23, 2444444444444, 255555555, 2.1, 2.222, 23333.33, 'f', 'varchar3', 'string3',
+            '2022-12-03', '2022-12-03', '2022-12-03 22:23:23', '2022-12-03 22:23:24.999999', 'not null')
+    """
+
+    sql """
+        INSERT INTO ${tableName} values
+            (2, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
+            NULL, NULL, NULL, NULL, 'not null'),
+            (2, true, 31, 32, 33, 3444444444444, 3555555555, 3.1, 3.222, 33333.33, 'l', 'varchar3', 'string3',
+            '2022-12-03', '2022-12-03', '2022-12-03 22:23:23', '2022-12-03 22:23:24.999999', 'not null')
+    """
+
+    sql """
+        INSERT INTO ${tableName} values
+            (2, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
+            NULL, NULL, NULL, NULL, 'not null'),
+            (2, false, 10, 20, 30, 944444444444, 9555555555, 9.1, 9.222, 93333.33, 'p', 'varchar9', 'string9',
+            '2022-12-09', '2022-12-09', '2022-12-09 22:23:23', '2022-12-09 22:23:24.999999', 'not null')
+    """
+

Review Comment:
   I think it would be clearer to run a `select * from ${tableName}` query after all the `insert` statements.





[GitHub] [doris] hello-stephen commented on pull request #15339: [Feature](aggregate-function) support funtion group_uniq_array

Posted by GitBox <gi...@apache.org>.
hello-stephen commented on PR #15339:
URL: https://github.com/apache/doris/pull/15339#issuecomment-1368432413

   TeamCity pipeline, clickbench performance test result:
    the sum of best hot time: 35 seconds
    load time: 639 seconds
    storage size: 17123141162 Bytes
    https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230101122524_clickbench_pr_72329.html




[GitHub] [doris] github-actions[bot] commented on pull request #15339: [Feature](aggregate-function) support funtion group_uniq_array

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #15339:
URL: https://github.com/apache/doris/pull/15339#issuecomment-1369522160

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] TangSiyang2001 commented on a diff in pull request #15339: [enhancement](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "TangSiyang2001 (via GitHub)" <gi...@apache.org>.
TangSiyang2001 commented on code in PR #15339:
URL: https://github.com/apache/doris/pull/15339#discussion_r1112107754


##########
be/src/vec/aggregate_functions/aggregate_function_collect.h:
##########
@@ -31,122 +33,180 @@
 
 namespace doris::vectorized {
 
-template <typename T>
+template <typename T, typename HasLimit>
 struct AggregateFunctionCollectSetData {
     using ElementType = T;
     using ColVecType = ColumnVectorOrDecimal<ElementType>;
     using ElementNativeType = typename NativeType<T>::Type;
+    using SelfType = AggregateFunctionCollectSetData;
     using Set = HashSetWithStackMemory<ElementNativeType, DefaultHash<ElementNativeType>, 4>;
-    Set set;
+    Set data_set;
+    UInt64 max_size;
+
+    size_t size() const { return data_set.size(); }
 
     void add(const IColumn& column, size_t row_num) {
-        const auto& vec = assert_cast<const ColVecType&>(column).get_data();
-        set.insert(vec[row_num]);
+        data_set.insert(assert_cast<const ColVecType&>(column).get_data()[row_num]);
+    }
+
+    void merge(const SelfType& rhs) {
+        if constexpr (HasLimit::value) {
+            for (auto& rhs_elem : rhs.data_set) {
+                if (size() >= max_size) {
+                    return;
+                }
+                data_set.insert(rhs_elem.get_value());
+            }
+        } else {
+            data_set.merge(rhs.data_set);
+        }
     }
-    void merge(const AggregateFunctionCollectSetData& rhs) { set.merge(rhs.set); }
-    void write(BufferWritable& buf) const { set.write(buf); }
-    void read(BufferReadable& buf) { set.read(buf); }
-    void reset() { set.clear(); }
+
+    void write(BufferWritable& buf) const { data_set.write(buf); }
+
+    void read(BufferReadable& buf) { data_set.read(buf); }
+
     void insert_result_into(IColumn& to) const {
         auto& vec = assert_cast<ColVecType&>(to).get_data();
-        vec.reserve(set.size());
-        for (auto item : set) {
+        vec.reserve(size());
+        for (auto item : data_set) {

Review Comment:
   Received.





[GitHub] [doris] TangSiyang2001 commented on pull request #15339: [enhancement](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "TangSiyang2001 (via GitHub)" <gi...@apache.org>.
TangSiyang2001 commented on PR #15339:
URL: https://github.com/apache/doris/pull/15339#issuecomment-1437242528

   run buildall




[GitHub] [doris] HappenLee commented on a diff in pull request #15339: [enhancement](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "HappenLee (via GitHub)" <gi...@apache.org>.
HappenLee commented on code in PR #15339:
URL: https://github.com/apache/doris/pull/15339#discussion_r1111551936


##########
be/src/vec/aggregate_functions/aggregate_function_collect.h:
##########
@@ -202,27 +266,36 @@ class AggregateFunctionCollect final
     }
 
     DataTypePtr get_return_type() const override {
-        return std::make_shared<DataTypeArray>(make_nullable(_argument_type));
+        return std::make_shared<DataTypeArray>(make_nullable(return_type));
     }
 
+    bool allocates_memory_in_arena() const override { return ENABLE_ARENA; }
+
     void add(AggregateDataPtr __restrict place, const IColumn** columns, size_t row_num,
              Arena* arena) const override {
-        assert(!columns[0]->is_null_at(row_num));
-        if constexpr (alloc_memory_in_arena) {
-            this->data(place).add(*columns[0], row_num, arena);
+        auto& data = this->data(place);
+        if constexpr (HasLimit::value) {
+            data.max_size =
+                    (UInt64) static_cast<const ColumnInt32*>(columns[1])->get_element(row_num);

Review Comment:
   It seems this column must be a const column?
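
One way to read this question: the limit argument is a literal, so it holds the same value on every row and could be read once instead of on each `add` call. Below is a minimal sketch of that idea, using hypothetical stand-in types (`ConstInt32Sketch`, `LimitedSetSketch`) rather than Doris's column classes. How Doris actually represents constant arguments is not shown in this thread, so treat the types as assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>

// Hypothetical model of a constant column: one stored value, many logical rows.
struct ConstInt32Sketch {
    std::int32_t value;
    std::size_t rows;
    // Every row yields the same value, so the row index is ignored.
    std::int32_t get(std::size_t /*row*/) const { return value; }
};

// Hypothetical aggregate state that caches the limit after the first row.
struct LimitedSetSketch {
    std::unordered_set<int> data_set;
    std::uint64_t max_size = 0;
    bool limit_known = false;

    void add(int elem, const ConstInt32Sketch& limit_col, std::size_t row) {
        if (!limit_known) {
            // Read the limit once; later rows reuse the cached value.
            max_size = static_cast<std::uint64_t>(limit_col.get(row));
            limit_known = true;
        }
        if (data_set.size() >= max_size) {
            return; // cap reached, ignore further elements
        }
        data_set.insert(elem);
    }
};
```

The cached flag is only one possible design; passing the limit in at construction time would avoid the per-row branch entirely.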





[GitHub] [doris] TangSiyang2001 commented on a diff in pull request #15339: [enhancement](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "TangSiyang2001 (via GitHub)" <gi...@apache.org>.
TangSiyang2001 commented on code in PR #15339:
URL: https://github.com/apache/doris/pull/15339#discussion_r1112109843


##########
be/src/vec/aggregate_functions/aggregate_function_collect.h:
##########
@@ -202,27 +266,36 @@ class AggregateFunctionCollect final
     }
 
     DataTypePtr get_return_type() const override {
-        return std::make_shared<DataTypeArray>(make_nullable(_argument_type));
+        return std::make_shared<DataTypeArray>(make_nullable(return_type));
     }
 
+    bool allocates_memory_in_arena() const override { return ENABLE_ARENA; }
+
     void add(AggregateDataPtr __restrict place, const IColumn** columns, size_t row_num,
              Arena* arena) const override {
-        assert(!columns[0]->is_null_at(row_num));
-        if constexpr (alloc_memory_in_arena) {
-            this->data(place).add(*columns[0], row_num, arena);
+        auto& data = this->data(place);
+        if constexpr (HasLimit::value) {
+            data.max_size =
+                    (UInt64) static_cast<const ColumnInt32*>(columns[1])->get_element(row_num);

Review Comment:
   Could you please elaborate? Thanks a lot.





[GitHub] [doris] TangSiyang2001 commented on a diff in pull request #15339: [enhancement](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "TangSiyang2001 (via GitHub)" <gi...@apache.org>.
TangSiyang2001 commented on code in PR #15339:
URL: https://github.com/apache/doris/pull/15339#discussion_r1109628442


##########
be/src/vec/aggregate_functions/aggregate_function_collect.h:
##########
@@ -49,18 +49,16 @@ struct AggregateFunctionCollectSetData {
         data_set.insert(assert_cast<const ColVecType&>(column).get_data()[row_num]);
     }
 
-    void merge(const SelfType& rhs) { data_set.merge(rhs.data_set); }
-
-    void merge(const SelfType& rhs, bool has_limit) {
-        if (!has_limit) {
-            merge(rhs);
-            return;
-        }
-        for (auto& rhs_elem : rhs.data_set) {
-            if (size() >= max_size) {
-                return;
+    void merge(const SelfType& rhs) {
+        if constexpr (HasLimit::value) {
+            data_set.merge(rhs.data_set);
+        } else {

Review Comment:
   OK, I'll check it out.





[GitHub] [doris] github-actions[bot] commented on pull request #15339: [enhancement](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #15339:
URL: https://github.com/apache/doris/pull/15339#issuecomment-1434607988

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] github-actions[bot] commented on pull request #15339: [enhancement](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #15339:
URL: https://github.com/apache/doris/pull/15339#issuecomment-1439925342

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] TangSiyang2001 commented on pull request #15339: [enhancement](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "TangSiyang2001 (via GitHub)" <gi...@apache.org>.
TangSiyang2001 commented on PR #15339:
URL: https://github.com/apache/doris/pull/15339#issuecomment-1439922337

   run buildall




[GitHub] [doris] zhangstar333 commented on a diff in pull request #15339: [enhancement](aggregate-function) enhance aggregate funtion collect and add group_array aliases

Posted by "zhangstar333 (via GitHub)" <gi...@apache.org>.
zhangstar333 commented on code in PR #15339:
URL: https://github.com/apache/doris/pull/15339#discussion_r1107976938


##########
be/src/vec/aggregate_functions/aggregate_function_collect.h:
##########
@@ -203,27 +279,36 @@ class AggregateFunctionCollect final
     }
 
     DataTypePtr get_return_type() const override {
-        return std::make_shared<DataTypeArray>(make_nullable(_argument_type));
+        return std::make_shared<DataTypeArray>(make_nullable(return_type));
     }
 
+    bool allocates_memory_in_arena() const override { return ENABLE_ARENA; }
+
     void add(AggregateDataPtr __restrict place, const IColumn** columns, size_t row_num,
              Arena* arena) const override {
-        assert(!columns[0]->is_null_at(row_num));
-        if constexpr (alloc_memory_in_arena) {
-            this->data(place).add(*columns[0], row_num, arena);
+        auto& data = this->data(place);
+        if constexpr (HAS_LIMIT) {
+            data.max_size =
+                    (UInt64) static_cast<const ColumnInt32*>(columns[1])->get_element(row_num);
+            if (data.size() >= data.max_size) {
+                return;
+            }
+        }
+        if constexpr (ENABLE_ARENA) {
+            data.add(*columns[0], row_num, arena);
         } else {
-            this->data(place).add(*columns[0], row_num);
+            data.add(*columns[0], row_num);
         }
     }
 
-    void reset(AggregateDataPtr place) const override { this->data(place).reset(); }
-
     void merge(AggregateDataPtr __restrict place, ConstAggregateDataPtr rhs,
                Arena* arena) const override {
-        if constexpr (alloc_memory_in_arena) {
-            this->data(place).merge(this->data(rhs), arena);
+        auto& data = this->data(place);
+        auto& rhs_data = this->data(rhs);
+        if constexpr (ENABLE_ARENA) {
+            data.merge(rhs_data, HAS_LIMIT, arena);

Review Comment:
   Could `HAS_LIMIT` be passed to the data set as a template parameter, so that the merge can branch with `if constexpr (has_limit) { ... }`?
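
The suggestion can be sketched as follows: the limit flag becomes a type parameter of the data struct, and `merge` picks its branch at compile time. This is a simplified illustration, not the PR's code; `CollectSetSketch` is a hypothetical name and `std::unordered_set` stands in for the hash set used in the real implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <unordered_set>

// Hypothetical simplified aggregate state. HasLimit is expected to be
// std::true_type or std::false_type.
template <typename T, typename HasLimit>
struct CollectSetSketch {
    std::unordered_set<T> data_set;
    std::uint64_t max_size = 0; // only meaningful when HasLimit::value is true

    std::size_t size() const { return data_set.size(); }

    void merge(const CollectSetSketch& rhs) {
        if constexpr (HasLimit::value) {
            // Limited variant: copy elements one by one and stop at the cap.
            for (const auto& elem : rhs.data_set) {
                if (size() >= max_size) {
                    return;
                }
                data_set.insert(elem);
            }
        } else {
            // Unlimited variant: take everything from the other state.
            data_set.insert(rhs.data_set.begin(), rhs.data_set.end());
        }
    }
};
```

Because the branch is resolved per instantiation, the unlimited variant carries no runtime check, and callers no longer pass a `bool` into `merge`.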


