Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/07/09 12:09:33 UTC

[GitHub] [doris] carlvinhust2012 opened a new pull request, #10388: [feature-wip] (array-type) add the array_distinct function

carlvinhust2012 opened a new pull request, #10388:
URL: https://github.com/apache/doris/pull/10388

   # Proposed changes
   Issue Number: close #10052
   1. Add the array_distinct function for the array type, following Spark syntax.
   2. The function can be run in SQL as follows:
   
   MySQL [(none)]> set enable_array_type=true;
   
   MySQL [(none)]> set enable_vectorized_engine=true;
   
   MySQL [(none)]> select id, c_array, array_distinct(c_array) from example_db.array_test;
   +------+-----------------------------+---------------------------+
   | id   | c_array                     | array_distinct(`c_array`) |
   +------+-----------------------------+---------------------------+
   |    1 | [1, 2, 3, 4, 5]             | [1, 2, 3, 4, 5]           |
   |    2 | [6, 7, 8]                   | [6, 7, 8]                 |
   |    3 | []                          | []                        |
   |    4 | NULL                        | NULL                      |
   |    5 | [1, 2, 3, 4, 5, 4, 3, 2, 1] | [1, 2, 3, 4, 5]           |
   |    6 | [1, 2, 3, NULL]             | [1, 2, 3, NULL]           |
   |    7 | [1, 2, 3, NULL, NULL]       | [1, 2, 3, NULL, NULL]     |
   +------+-----------------------------+---------------------------+
   7 rows in set (0.005 sec)
   
   MySQL [(none)]> select id, c_array, array_distinct(c_array) from example_db.array_test01;
   +------+--------------------------+---------------------------+
   | id   | c_array                  | array_distinct(`c_array`) |
   +------+--------------------------+---------------------------+
   |    1 | [a, b, c, d, e]          | [a, b, c, d, e]           |
   |    2 | [f, g, h]                | [f, g, h]                 |
   |    3 | []                       | []                        |
   |    3 | [NULL]                   | [NULL]                    |
   |    5 | [a, b, c, d, e, a, b, c] | [a, b, c, d, e]           |
   |    6 | NULL                     | NULL                      |
   |    7 | [a, b, NULL]             | [a, b, NULL]              |
   |    7 | [a, b, NULL, NULL]       | [a, b, NULL, NULL]        |
   +------+--------------------------+---------------------------+
   8 rows in set (0.009 sec)
   
   3. 'BufferWritable' is defined in 'vec/common/string_buffer.hpp', so 'io_helper.h' should include that header file.
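
The result tables in item 2 suggest the following semantics: within each array, repeated non-NULL values are dropped while first occurrences keep their order, and NULL elements pass through unchanged. Below is a minimal C++ sketch of that behavior, assuming a flattened data / null-map / offsets layout. `FlatIntArrays` and `array_distinct_sketch` are hypothetical names used for illustration only, not Doris APIs, and rows that are NULL as a whole are not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for an array column: the elements of all rows are
// stored back to back in `data`, and offsets[i] is the end position of row i.
struct FlatIntArrays {
    std::vector<int32_t> data;
    std::vector<uint8_t> nulls;   // 1 marks a NULL element; same length as data
    std::vector<size_t> offsets;  // cumulative end position of each row
};

// Removes duplicate non-NULL values within each row, keeping first occurrences
// in order. NULL elements are copied through unchanged, matching the tables.
FlatIntArrays array_distinct_sketch(const FlatIntArrays& src) {
    assert(src.data.size() == src.nulls.size());
    FlatIntArrays dest;
    std::unordered_set<int32_t> seen;
    size_t prev = 0;
    for (size_t end : src.offsets) {
        seen.clear();  // duplicates are tracked per row, not across rows
        for (size_t j = prev; j < end; ++j) {
            if (src.nulls[j]) {
                dest.data.push_back(0);  // placeholder value behind the NULL flag
                dest.nulls.push_back(1);
            } else if (seen.insert(src.data[j]).second) {
                dest.data.push_back(src.data[j]);
                dest.nulls.push_back(0);
            }
        }
        dest.offsets.push_back(dest.data.size());
        prev = end;
    }
    return dest;
}
```

For example, the rows `[1, 2, 3, 4, 5, 4, 3, 2, 1]`, `[]`, and `[1, 2, 3, NULL, NULL]` stored with offsets `{9, 9, 14}` come back with offsets `{5, 5, 10}`.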
   
   ## Problem Summary:
   
   Describe the overview of changes.
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: (No)
   5. Has unit tests been added: (No)
   6. Has document been added or modified: (No)
   7. Does it need to update dependencies: (No)
   8. Are there any changes that cannot be rolled back: (No)
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] cambyzju commented on a diff in pull request #10388: [feature-wip] (array-type) add the array_distinct function

Posted by GitBox <gi...@apache.org>.
cambyzju commented on code in PR #10388:
URL: https://github.com/apache/doris/pull/10388#discussion_r913592111


##########
docs/en/docs/sql-manual/sql-functions/array-functions/array_distinct.md:
##########
@@ -0,0 +1,79 @@
+---
+{
+    "title": "array_distinct",
+    "language": "en"
+}
+---
+
+<!-- 
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+## array_distinct
+
+### description
+
+#### Syntax
+
+```
+ARRAY<T> array_distinct(ARRAY<T> arr)
+```
+
+Returns the array with duplicate values removed.
+Returns NULL if the input is NULL.
+
+### notice
+
+`Only supported in vectorized engine`
+
+### example
+
+```
+mysql> set enable_vectorized_engine=true;
+
+mysql> select k1, k2, array_distinct(k2) from array_test;
++------+-----------------------------+---------------------------+
+| k1   | k2                          | array_distinct(k2)        |
++------+-----------------------------+---------------------------+
+| 1    | [1, 2, 3, 4, 5]             | [1, 2, 3, 4, 5]           |
+| 2    | [6, 7, 8]                   | [6, 7, 8]                 |
+| 3    | []                          | []                        |
+| 4    | NULL                        | NULL                      |
+| 5    | [1, 2, 3, 4, 5, 4, 3, 2, 1] | [1, 2, 3, 4, 5]           |
+| 6    | [1, 2, 3, NULL]             | [1, 2, 3, NULL]           |
+| 7    | [1, 2, 3, NULL, NULL]       | [1, 2, 3, NULL, NULL]     |
++------+-----------------------------+---------------------------+
+
+mysql> select k1, k2, array_distinct(k2) from array_test01;
++------+--------------------------+---------------------------+
+| k1   | k2                       | array_distinct(k2)        |
++------+--------------------------+---------------------------+
+| 1    | [a, b, c, d, e]          | [a, b, c, d, e]           |
+| 2    | [f, g, h]                | [f, g, h]                 |
+| 3    | []                       | []                        |
+| 4    | [NULL]                   | [NULL]                    |
+| 5    | [a, b, c, d, e, a, b, c] | [a, b, c, d, e]           |
+| 6    | NULL                     | NULL                      |
+| 7    | [a, b, NULL]             | [a, b, NULL]              |
+| 8    | [a, b, NULL, NULL]       | [a, b, NULL, NULL]        |
++------+--------------------------+---------------------------+
+```
+
+### keywords
+
+ARRAY, DISTINCT

Review Comment:
   The keywords should include ARRAY_DISTINCT.





[GitHub] [doris] cambyzju commented on a diff in pull request #10388: [feature-wip] (array-type) add the array_distinct function

Posted by GitBox <gi...@apache.org>.
cambyzju commented on code in PR #10388:
URL: https://github.com/apache/doris/pull/10388#discussion_r913589670


##########
be/src/vec/functions/array/function_array_distinct.h:
##########
@@ -0,0 +1,268 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+// This file is copied from
+// https://github.com/ClickHouse/ClickHouse/blob/master/src/Functions/array/arrayDistinct.cpp
+// and modified by Doris
+#pragma once
+
+#include "vec/columns/column_array.h"
+#include "vec/columns/column_const.h"
+#include "vec/common/hash_table/hash_set.h"
+#include "vec/common/hash_table/hash_table.h"
+#include "vec/common/sip_hash.h"
+#include "vec/data_types/data_type_array.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/functions/function.h"
+#include "vec/functions/function_helpers.h"
+#include "vec/io/io_helper.h"
+
+namespace doris::vectorized {
+
+class FunctionArrayDistinct : public IFunction {
+public:
+    static constexpr auto name = "array_distinct";
+    static FunctionPtr create() { return std::make_shared<FunctionArrayDistinct>(); }
+    using NullMapType = PaddedPODArray<UInt8>;
+
+    /// Get function name.
+    String get_name() const override { return name; }
+
+    bool is_variadic() const override { return false; }
+
+    size_t get_number_of_arguments() const override { return 1; }
+
+    DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {
+        DCHECK(is_array(arguments[0]))
+                << "first argument for function: " << name << " should be DataTypeArray"
+                << " and arguments[0] is " << arguments[0]->get_name();
+        return arguments[0];
+    }
+
+    Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
+                        size_t result, size_t input_rows_count) override {
+        ColumnPtr src_column =
+                block.get_by_position(arguments[0]).column->convert_to_full_column_if_const();
+        const auto& src_column_array = check_and_get_column<ColumnArray>(*src_column);
+        if (!src_column_array) {
+            return Status::RuntimeError(
+                    fmt::format("unsupported types for function {}({})", get_name(),
+                                block.get_by_position(arguments[0]).type->get_name()));
+        }
+        const auto& src_offsets = src_column_array->get_offsets();
+        const auto& src_nested_column = src_column_array->get_data();
+
+        DataTypePtr src_column_type = block.get_by_position(arguments[0]).type;
+        auto nested_type = assert_cast<const DataTypeArray&>(*src_column_type).get_nested_type();
+        auto dest_column_ptr = ColumnArray::create(nested_type->create_column(),
+                                                   ColumnArray::ColumnOffsets::create());
+        IColumn& dest_nested_column = dest_column_ptr->get_data();
+        ColumnArray::Offsets& dest_offsets = dest_column_ptr->get_offsets();
+        dest_nested_column.reserve(src_nested_column.size());
+        dest_offsets.reserve(input_rows_count);
+
+        const IColumn* src_inner_column = nullptr;
+        const NullMapType* src_null_map = nullptr;
+        const ColumnNullable* src_nested_nullable_col = nullptr;
+        if (src_nested_column.is_nullable()) {
+            src_nested_nullable_col = check_and_get_column<ColumnNullable>(src_nested_column);
+            src_inner_column = src_nested_nullable_col->get_nested_column_ptr();
+            src_null_map = &src_nested_nullable_col->get_null_map_column().get_data();
+        } else {
+            src_inner_column = src_column_array->get_data_ptr();
+        }
+
+        IColumn* dest_inner_column = nullptr;
+        NullMapType* dest_null_map = nullptr;
+        ColumnNullable* dest_nested_nullable_col = nullptr;
+        if (dest_nested_column.is_nullable()) {
+            dest_nested_nullable_col = reinterpret_cast<ColumnNullable*>(&dest_nested_column);
+            dest_inner_column = &dest_nested_nullable_col->get_nested_column();
+            dest_null_map = &dest_nested_nullable_col->get_null_map_column().get_data();
+        } else {
+            dest_inner_column = &dest_nested_column;
+        }
+
+        auto res_val = _execute_by_type(*src_inner_column, src_offsets, *dest_inner_column,
+                                        dest_offsets, src_null_map, dest_null_map, nested_type);
+        if (!res_val) {
+            return Status::RuntimeError(
+                    fmt::format("execute failed or unsupported types for function {}({})",
+                                get_name(), block.get_by_position(arguments[0]).type->get_name()));
+        }
+
+        block.replace_by_position(result, std::move(dest_column_ptr));
+        return Status::OK();
+    }
+
+private:
+    // Note: Here initially allocate a piece of memory for 2^5 = 32 elements.
+    static constexpr size_t INITIAL_SIZE_DEGREE = 5;
+
+    template <typename ColumnType>
+    bool _execute_number(const IColumn& src_column, const ColumnArray::Offsets& src_offsets,
+                         IColumn& dest_column, ColumnArray::Offsets& dest_offsets,
+                         const NullMapType* src_null_map, NullMapType* dest_null_map) {
+        using NestType = typename ColumnType::value_type;
+        using ElementNativeType = typename NativeType<NestType>::Type;
+
+        const ColumnType* src_data_concrete = check_and_get_column<ColumnType>(&src_column);
+        if (!src_data_concrete) {
+            return false;
+        }
+        const PaddedPODArray<NestType>& src_datas = src_data_concrete->get_data();
+
+        ColumnType& dest_data_concrete = reinterpret_cast<ColumnType&>(dest_column);
+        PaddedPODArray<NestType>& dest_datas = dest_data_concrete.get_data();
+
+        using Set = HashSetWithStackMemory<ElementNativeType, DefaultHash<ElementNativeType>,
+                                           INITIAL_SIZE_DEGREE>;
+        Set set;
+
+        ColumnArray::Offset prev_src_offset = 0;
+        ColumnArray::Offset res_offset = 0;
+
+        for (auto curr_src_offset : src_offsets) {
+            set.clear();
+            size_t null_size = 0;
+            for (ColumnArray::Offset j = prev_src_offset; j < curr_src_offset; ++j) {
+                if (src_null_map && (*src_null_map)[j]) {
+                    if (dest_null_map) {
+                        (*dest_null_map).push_back(true);
+                        // Note: here we need to add an element which will not use for output
+                        // because we expand the value of each offset
+                        dest_datas.push_back(NestType());
+                        null_size++;
+                    }
+                    continue;
+                }
+
+                if (!set.find(src_datas[j])) {
+                    set.insert(src_datas[j]);
+                    dest_datas.push_back(src_datas[j]);
+                    if (dest_null_map) {
+                        (*dest_null_map).push_back(false);
+                    }
+                }
+            }
+
+            res_offset += set.size() + null_size;
+            dest_offsets.push_back(res_offset);
+            prev_src_offset = curr_src_offset;
+        }
+
+        return true;
+    }
+
+    bool _execute_string(const IColumn& src_column, const ColumnArray::Offsets& src_offsets,
+                         IColumn& dest_column, ColumnArray::Offsets& dest_offsets,
+                         const NullMapType* src_null_map, NullMapType* dest_null_map) {
+        const ColumnString* src_data_concrete = check_and_get_column<ColumnString>(&src_column);
+        if (!src_data_concrete) {
+            return false;
+        }
+
+        ColumnString& dest_column_string = reinterpret_cast<ColumnString&>(dest_column);
+        ColumnString::Chars& column_string_chars = dest_column_string.get_chars();
+        ColumnString::Offsets& column_string_offsets = dest_column_string.get_offsets();
+        column_string_chars.reserve(src_column.size());
+        column_string_offsets.reserve(src_offsets.size());
+
+        using Set = HashSetWithStackMemory<StringRef, DefaultHash<StringRef>, INITIAL_SIZE_DEGREE>;
+        Set set;
+
+        ColumnArray::Offset prev_src_offset = 0;
+        ColumnArray::Offset res_offset = 0;
+
+        for (auto curr_src_offset : src_offsets) {
+            set.clear();
+            size_t null_size = 0;
+            for (ColumnArray::Offset j = prev_src_offset; j < curr_src_offset; ++j) {
+                if (src_null_map && (*src_null_map)[j]) {
+                    if (dest_null_map) {
+                        // Note: here we need to update the offset of ColumnString
+                        column_string_offsets.push_back(column_string_offsets.back());
+                        (*dest_null_map).push_back(true);
+                        null_size++;
+                    }
+                    continue;
+                }
+
+                StringRef src_str_ref = src_data_concrete->get_data_at(j);
+                if (!set.find(src_str_ref)) {
+                    set.insert(src_str_ref);
+                    dest_column_string.insert_data(src_str_ref.data, src_str_ref.size);

Review Comment:
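   Consider appending directly here instead of calling insert_data: push the new end offset onto column_string_offsets, then resize column_string_chars and memcpy the bytes.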
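
The suggestion above can be sketched in C++ roughly as follows. This is a simplified illustration under stated assumptions: `FlatStrings`, `append_string`, `append_empty`, and `string_at` are hypothetical names, and the real `ColumnString` layout may differ (for instance in whether a terminator byte is stored per string).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for a string column: the bytes of all strings are
// stored back to back in `chars`, and offsets[i] is the end of string i.
struct FlatStrings {
    std::vector<char> chars;
    std::vector<size_t> offsets;
};

// Appends one string by growing `chars` once and copying the bytes in place.
void append_string(FlatStrings& col, const char* data, size_t size) {
    const size_t old_size = col.chars.size();
    col.chars.resize(old_size + size);
    if (size > 0) {
        std::memcpy(col.chars.data() + old_size, data, size);
    }
    col.offsets.push_back(old_size + size);
}

// Appends an empty slot (for example behind a NULL flag) by repeating the
// previous end offset; the first slot starts at 0.
void append_empty(FlatStrings& col) {
    col.offsets.push_back(col.offsets.empty() ? 0 : col.offsets.back());
}

// Reads string i back out, mainly to check the layout.
std::string string_at(const FlatStrings& col, size_t i) {
    assert(i < col.offsets.size());
    const size_t begin = (i == 0) ? 0 : col.offsets[i - 1];
    return std::string(col.chars.begin() + begin, col.chars.begin() + col.offsets[i]);
}
```

Appending `"ab"`, an empty slot, and `"cde"` in that order leaves the offsets at `{2, 2, 5}`. Writing through the two buffers directly keeps the offset bookkeeping for NULL slots and for real strings in one place.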





[GitHub] [doris] eldenmoon commented on a diff in pull request #10388: [feature-wip] (array-type) add the array_distinct function

Posted by GitBox <gi...@apache.org>.
eldenmoon commented on code in PR #10388:
URL: https://github.com/apache/doris/pull/10388#discussion_r906845462


##########
gensrc/script/doris_builtins_functions.py:
##########
@@ -160,6 +160,18 @@
     [['array_position'], 'BIGINT', ['ARRAY_STRING', 'STRING'], '', '', '', 'vec', ''],
 
     [['cardinality', 'size'], 'BIGINT', ['ARRAY'], '', '', '', 'vec', ''],
+    [['array_distinct'], 'ARRAY_TINYINT',   ['ARRAY_TINYINT'], '', '', '', 'vec', ''],

Review Comment:
   Sorry, it's still not working for me: errors still occur if you use an array function in a `where` condition, e.g.:
   ```
   mysql> create table test_array_string (k1 INT, k2 INT, k3 array<string>) ENGINE=olap DUPLICATE KEY(k1, k2) PARTITION BY RANGE (k1) (partition `p1` values less than ("1000"), partition `p2` values less than ("2000"),partition `p3` values less than ("3000"))  DISTRIBUTED BY HASH(k2) BUCKETS 3 PROPERTIES("replication_num" = "1");
   Query OK, 0 rows affected (0.02 sec)
   
   mysql> insert into test_array_string  values(1, 2, ["a", "b", "c"]),(1, 2, ["a", "b"]),(1, 2, ["a", "xxqwdqw", "c"]),(1, 2, ["a", "b"]),(1, 2, ["a", "b", "c"]),(1, 2, ["a", "b", "c"]),(1, 2, ["a", "b", "c"]),(1, 2, ["a", "b"]),(1, 2, ["a", "b", "c", "d", "e"]),(1, 2, ["a", "b", "c"]), (1, 2, ["a", "b", "cdsdasd"]),(1, 2, [ "nishidabenzhu123"]),(1, 2, ["a", "b", "c"]),(1, 2, ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]), (1, 2, ["a", "cdsdasd"]), (1, 2, ["cdsdasd", "lihngyu2", "3", "4", "5"]);
   Query OK, 16 rows affected (0.04 sec)
   {'label':'insert_c3ab610af58b4b3a-83d3cc32cf9f1709', 'status':'VISIBLE', 'txnId':'4'}
   
   mysql> set enable_vectorized_engine = true;
   Query OK, 0 rows affected (0.01 sec)
   
   mysql> set enable_array_type = true;
   Query OK, 0 rows affected (0.00 sec)
   
   mysql> select size(k3) from test_array_string where size(k3) != 3;
   **ERROR 1105 (HY000): errCode = 2, detailMessage = Function size is not implemented.**
   
   mysql> select size(k3) from test_array_string;
   +------------+
   | size(`k3`) |
   +------------+
   |          5 |
   |          2 |
   |         10 |
   |          3 |
   |          1 |
   |          3 |
   |          3 |
   |          5 |
   |          2 |
   |          3 |
   |          3 |
   |          3 |
   |          2 |
   |          3 |
   |          2 |
   |          3 |
   +------------+
   16 rows in set (0.01 sec)
   ```





[GitHub] [doris] cambyzju commented on a diff in pull request #10388: [feature-wip] (array-type) add the array_distinct function

Posted by GitBox <gi...@apache.org>.
cambyzju commented on code in PR #10388:
URL: https://github.com/apache/doris/pull/10388#discussion_r906978667


##########
gensrc/script/doris_builtins_functions.py:
##########
@@ -160,6 +160,18 @@
     [['array_position'], 'BIGINT', ['ARRAY_STRING', 'STRING'], '', '', '', 'vec', ''],
 
     [['cardinality', 'size'], 'BIGINT', ['ARRAY'], '', '', '', 'vec', ''],
+    [['array_distinct'], 'ARRAY_TINYINT',   ['ARRAY_TINYINT'], '', '', '', 'vec', ''],

Review Comment:
   Got it, I will try to reproduce and fix it.





[GitHub] [doris] cambyzju commented on a diff in pull request #10388: [feature-wip] (array-type) add the array_distinct function

Posted by GitBox <gi...@apache.org>.
cambyzju commented on code in PR #10388:
URL: https://github.com/apache/doris/pull/10388#discussion_r905768340


##########
be/src/vec/functions/array/function_array_distinct.h:
##########
@@ -0,0 +1,294 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+// This file is copied from
+// https://github.com/ClickHouse/ClickHouse/blob/master/src/Functions/array/arrayDistinct.cpp
+// and modified by Doris
+#pragma once
+
+#include "vec/columns/column_array.h"
+#include "vec/columns/column_const.h"
+#include "vec/common/hash_table/hash_set.h"
+#include "vec/common/hash_table/hash_table.h"
+#include "vec/common/sip_hash.h"
+#include "vec/data_types/data_type_array.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/functions/function.h"
+#include "vec/functions/function_helpers.h"
+#include "vec/io/io_helper.h"
+
+namespace doris::vectorized {
+
+class FunctionArrayDistinct : public IFunction {
+public:
+    static constexpr auto name = "array_distinct";
+    static FunctionPtr create() { return std::make_shared<FunctionArrayDistinct>(); }
+
+    /// Get function name.
+    String get_name() const override { return name; }
+
+    bool is_variadic() const override { return false; }
+
+    size_t get_number_of_arguments() const override { return 1; }
+
+    DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {
+        DCHECK(is_array(arguments[0]))
+                << "first argument for function: " << name << " should be DataTypeArray";

Review Comment:
   Print arguments[0] when the DCHECK fails.





[GitHub] [doris] carlvinhust2012 commented on a diff in pull request #10388: [feature-wip] (array-type) add the array_distinct function

Posted by GitBox <gi...@apache.org>.
carlvinhust2012 commented on code in PR #10388:
URL: https://github.com/apache/doris/pull/10388#discussion_r905782103


##########
be/src/vec/functions/array/function_array_distinct.h:
##########
@@ -0,0 +1,294 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+// This file is copied from
+// https://github.com/ClickHouse/ClickHouse/blob/master/src/Functions/array/arrayDistinct.cpp
+// and modified by Doris
+#pragma once
+
+#include "vec/columns/column_array.h"
+#include "vec/columns/column_const.h"
+#include "vec/common/hash_table/hash_set.h"
+#include "vec/common/hash_table/hash_table.h"
+#include "vec/common/sip_hash.h"
+#include "vec/data_types/data_type_array.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/functions/function.h"
+#include "vec/functions/function_helpers.h"
+#include "vec/io/io_helper.h"
+
+namespace doris::vectorized {
+
+class FunctionArrayDistinct : public IFunction {
+public:
+    static constexpr auto name = "array_distinct";
+    static FunctionPtr create() { return std::make_shared<FunctionArrayDistinct>(); }
+
+    /// Get function name.
+    String get_name() const override { return name; }
+
+    bool is_variadic() const override { return false; }
+
+    size_t get_number_of_arguments() const override { return 1; }
+
+    DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {
+        DCHECK(is_array(arguments[0]))
+                << "first argument for function: " << name << " should be DataTypeArray";
+        return make_nullable(
+                check_and_get_data_type<DataTypeArray>(arguments[0].get())->get_nested_type());
+    }
+
+    Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
+                        size_t result, size_t input_rows_count) override {
+        ColumnPtr src_column =
+                block.get_by_position(arguments[0]).column->convert_to_full_column_if_const();
+        const auto& src_column_array = check_and_get_column<ColumnArray>(*src_column);
+        if (!src_column_array) {
+            return Status::RuntimeError(
+                    fmt::format("unsupported types for function {}({})", get_name(),
+                                block.get_by_position(arguments[0]).type->get_name()));
+        }
+        const auto& src_offsets = src_column_array->get_offsets();
+        const auto& src_column_data = src_column_array->get_data();
+
+        DataTypePtr src_column_type = remove_nullable(block.get_by_position(arguments[0]).type);
+        auto nested_type = assert_cast<const DataTypeArray&>(*src_column_type).get_nested_type();
+        auto dest_column_ptr = ColumnArray::create(nested_type->create_column(),
+                                                   ColumnArray::ColumnOffsets::create());
+        IColumn& dest_column = dest_column_ptr->get_data();
+        ColumnArray::Offsets& dest_offsets = dest_column_ptr->get_offsets();
+
+        const IColumn* src_inner_column = nullptr;
+        const ColumnNullable* src_nullable_column =
+                check_and_get_column<ColumnNullable>(src_column_data);
+        if (src_nullable_column) {
+            src_inner_column = src_nullable_column->get_nested_column_ptr();
+        } else {
+            src_inner_column = src_column_array->get_data_ptr();
+        }
+
+        IColumn* dest_nested_column = nullptr;
+        ColumnNullable* dest_nullable_column = reinterpret_cast<ColumnNullable*>(&dest_column);
+        if (dest_nullable_column) {
+            dest_nested_column = dest_nullable_column->get_nested_column_ptr();
+        } else {
+            dest_nested_column = &dest_column;
+        }
+
+        auto res_val =
+                _execute_by_type(*src_inner_column, src_offsets, *dest_nested_column, dest_offsets,
+                                 src_nullable_column, dest_nullable_column, nested_type);
+        if (!res_val) {
+            return Status::RuntimeError(
+                    fmt::format("execute failed or unsupported types for function {}({})",
+                                get_name(), block.get_by_position(arguments[0]).type->get_name()));
+        }
+
+        block.replace_by_position(result, std::move(dest_column_ptr));
+        return Status::OK();
+    }
+
+private:
+    // Note: Here initially allocate a piece of memory for 2^5 = 32 elements.
+    static constexpr size_t INITIAL_SIZE_DEGREE = 5;
+
+    template <typename T>
+    bool _execute_number(const IColumn& src_column, const ColumnArray::Offsets& src_offsets,
+                         IColumn& dest_column, ColumnArray::Offsets& dest_offsets,
+                         const ColumnNullable* src_nullable_col,
+                         ColumnNullable* dest_nullable_col) {
+        const ColumnVector<T>* src_data_concrete =
+                check_and_get_column<ColumnVector<T>>(&src_column);
+        if (!src_data_concrete) {
+            return false;
+        }
+        const PaddedPODArray<T>& src_datas = src_data_concrete->get_data();
+
+        ColumnVector<T>& dest_data_concrete = reinterpret_cast<ColumnVector<T>&>(dest_column);
+        PaddedPODArray<T>& dest_datas = dest_data_concrete.get_data();
+
+        const PaddedPODArray<UInt8>* src_null_map = nullptr;
+        if (src_nullable_col) {
+            src_null_map = &src_nullable_col->get_null_map_column().get_data();
+        }
+
+        PaddedPODArray<UInt8>* dest_null_map = nullptr;
+        if (dest_nullable_col) {
+            dest_null_map = &dest_nullable_col->get_null_map_column().get_data();
+        }
+
+        // using Set = HashSetWithSavedHashWithStackMemory<T, DefaultHash<T>, INITIAL_SIZE_DEGREE>;
+        using Set = HashSetWithSavedHashWithStackMemory<UInt128, UInt128TrivialHash,
+                                                        INITIAL_SIZE_DEGREE>;
+        Set set;
+
+        ColumnArray::Offset prev_src_offset = 0;
+        ColumnArray::Offset res_offset = 0;
+
+        for (auto curr_src_offset : src_offsets) {
+            set.clear();
+            size_t null_size = 0;
+            for (ColumnArray::Offset j = prev_src_offset; j < curr_src_offset; ++j) {
+                if (src_nullable_col && (*src_null_map)[j]) {
+                    if (dest_nullable_col && dest_null_map) {
+                        (*dest_null_map).push_back(true);
+                        // Note: push a placeholder element that is never output, so the
+                        // nested data stays aligned with the null map and the offsets.
+                        dest_datas.push_back(-1);
+                        null_size++;
+                    }
+                    continue;
+                }
+
+                UInt128 hash;
+                SipHash hash_function;
+                src_column.update_hash_with_value(j, hash_function);
+                hash_function.get128(reinterpret_cast<char*>(&hash));
+
+                if (!set.find(hash)) {
+                    set.insert(hash);
+                    dest_datas.push_back(src_datas[j]);
+                    if (dest_nullable_col && dest_null_map) {
+                        (*dest_null_map).push_back(false);
+                    }
+                }
+            }
+
+            res_offset += set.size() + null_size;
+            dest_offsets.push_back(res_offset);
+            prev_src_offset = curr_src_offset;
+        }
+
+        return true;
+    }
+
+    bool _execute_string(const IColumn& src_column, const ColumnArray::Offsets& src_offsets,
+                         IColumn& dest_column, ColumnArray::Offsets& dest_offsets,
+                         const ColumnNullable* src_nullable_col,
+                         ColumnNullable* dest_nullable_col) {
+        const ColumnString* src_data_concrete = check_and_get_column<ColumnString>(&src_column);
+        if (!src_data_concrete) {
+            return false;
+        }
+
+        ColumnString& dest_column_string = reinterpret_cast<ColumnString&>(dest_column);
+        ColumnString::Offsets& column_string_offsets = dest_column_string.get_offsets();
+
+        const PaddedPODArray<UInt8>* src_null_map = nullptr;
+        if (src_nullable_col) {
+            src_null_map = &src_nullable_col->get_null_map_column().get_data();
+        }
+
+        PaddedPODArray<UInt8>* dest_null_map = nullptr;
+        if (dest_nullable_col) {
+            dest_null_map = &dest_nullable_col->get_null_map_column().get_data();
+        }
+
+        using Set =
+                HashSetWithSavedHashWithStackMemory<StringRef, StringRefHash, INITIAL_SIZE_DEGREE>;
+        Set set;
+
+        ColumnArray::Offset prev_src_offset = 0;
+        ColumnArray::Offset res_offset = 0;
+
+        for (auto curr_src_offset : src_offsets) {
+            set.clear();
+
+            size_t null_size = 0;
+            for (ColumnArray::Offset j = prev_src_offset; j < curr_src_offset; ++j) {
+                if (src_nullable_col && (*src_null_map)[j]) {
+                    if (dest_nullable_col && dest_null_map) {
+                        column_string_offsets.push_back(column_string_offsets.back());
+                        (*dest_null_map).push_back(true);
+                        null_size++;
+                    }
+                    continue;
+                }
+                StringRef src_str_ref = src_data_concrete->get_data_at(j);
+
+                if (!set.find(src_str_ref)) {
+                    set.insert(src_str_ref);
+                    dest_column_string.insert_data(src_str_ref.data, src_str_ref.size);
+                    if (dest_nullable_col && dest_null_map) {
+                        (*dest_null_map).push_back(false);
+                    }
+                }
+            }
+
+            res_offset += set.size() + null_size;
+            dest_offsets.push_back(res_offset);
+            prev_src_offset = curr_src_offset;
+        }
+        return true;
+    }
+
+    bool _execute_by_type(const IColumn& src_column, const ColumnArray::Offsets& src_offsets,
+                          IColumn& dest_column, ColumnArray::Offsets& dest_offsets,
+                          const ColumnNullable* src_nullable_col, ColumnNullable* dest_nullable_col,
+                          DataTypePtr& nested_type) {
+        bool res = false;
+        WhichDataType which(remove_nullable(nested_type)->get_type_id());
+        if (which.idx == TypeIndex::UInt8) {
+            res = _execute_number<UInt8>(src_column, src_offsets, dest_column, dest_offsets,
+                                         src_nullable_col, dest_nullable_col);
+        } else if (which.idx == TypeIndex::UInt16) {

Review Comment:
   > We do not support unsigned data types, except UInt8 for Boolean.
   
   OK, I will remove it later.
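
   For readers skimming the hunk above, the per-row loop can be sketched outside of Doris as follows. This is a minimal illustration, not the PR's code: `FlatArray` and `distinct_per_row` are hypothetical names, `std::unordered_set` stands in for the stack-allocated hash set, and the handling of NULL elements (each one copied through with a placeholder value) is an assumption taken from the sample output in the PR description.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for an array column: the elements of all rows are stored
// back to back, offsets[i] is the end position of row i, and null_map[j] != 0
// marks element j as NULL.
struct FlatArray {
    std::vector<int32_t> data;
    std::vector<uint8_t> null_map;
    std::vector<size_t> offsets;
};

// Removes duplicate non-NULL elements within each row, keeping first occurrences.
FlatArray distinct_per_row(const FlatArray& src) {
    FlatArray dest;
    std::unordered_set<int32_t> seen;
    size_t prev = 0;
    for (size_t end : src.offsets) {
        seen.clear(); // duplicates are tracked per row, not per column
        for (size_t j = prev; j < end; ++j) {
            if (src.null_map[j]) {
                // NULL elements are copied through; the value is only a placeholder
                // that keeps data and null_map the same length.
                dest.data.push_back(0);
                dest.null_map.push_back(1);
                continue;
            }
            if (seen.insert(src.data[j]).second) {
                dest.data.push_back(src.data[j]);
                dest.null_map.push_back(0);
            }
        }
        dest.offsets.push_back(dest.data.size());
        prev = end;
    }
    return dest;
}
```

   Recording `dest.data.size()` as the row's end offset is equivalent to adding the number of distinct values plus the number of NULLs to the previous offset, which is what the quoted loop does with `set.size() + null_size`.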



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] cambyzju commented on a diff in pull request #10388: [feature-wip] (array-type) add the array_distinct function

Posted by GitBox <gi...@apache.org>.
cambyzju commented on code in PR #10388:
URL: https://github.com/apache/doris/pull/10388#discussion_r905768743


##########
be/src/vec/functions/array/function_array_distinct.h:
##########
@@ -0,0 +1,294 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+// This file is copied from
+// https://github.com/ClickHouse/ClickHouse/blob/master/src/Functions/array/arrayDistinct.cpp
+// and modified by Doris
+#pragma once
+
+#include "vec/columns/column_array.h"
+#include "vec/columns/column_const.h"
+#include "vec/common/hash_table/hash_set.h"
+#include "vec/common/hash_table/hash_table.h"
+#include "vec/common/sip_hash.h"
+#include "vec/data_types/data_type_array.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/functions/function.h"
+#include "vec/functions/function_helpers.h"
+#include "vec/io/io_helper.h"
+
+namespace doris::vectorized {
+
+class FunctionArrayDistinct : public IFunction {
+public:
+    static constexpr auto name = "array_distinct";
+    static FunctionPtr create() { return std::make_shared<FunctionArrayDistinct>(); }
+
+    /// Get function name.
+    String get_name() const override { return name; }
+
+    bool is_variadic() const override { return false; }
+
+    size_t get_number_of_arguments() const override { return 1; }
+
+    DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {
+        DCHECK(is_array(arguments[0]))
+                << "first argument for function: " << name << " should be DataTypeArray";
+        return make_nullable(

Review Comment:
   The output data type should be the same as the input data type.





[GitHub] [doris] eldenmoon commented on a diff in pull request #10388: [feature-wip] (array-type) add the array_distinct function

Posted by GitBox <gi...@apache.org>.
eldenmoon commented on code in PR #10388:
URL: https://github.com/apache/doris/pull/10388#discussion_r905863574


##########
gensrc/script/doris_builtins_functions.py:
##########
@@ -160,6 +160,18 @@
     [['array_position'], 'BIGINT', ['ARRAY_STRING', 'STRING'], '', '', '', 'vec', ''],
 
     [['cardinality', 'size'], 'BIGINT', ['ARRAY'], '', '', '', 'vec', ''],
+    [['array_distinct'], 'ARRAY_TINYINT',   ['ARRAY_TINYINT'], '', '', '', 'vec', ''],

Review Comment:
   Why not add a symbol to the functions?





[GitHub] [doris] cambyzju commented on a diff in pull request #10388: [feature-wip] (array-type) add the array_distinct function

Posted by GitBox <gi...@apache.org>.
cambyzju commented on code in PR #10388:
URL: https://github.com/apache/doris/pull/10388#discussion_r913582421


##########
be/src/vec/functions/array/function_array_distinct.h:
##########
@@ -0,0 +1,268 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+// This file is copied from
+// https://github.com/ClickHouse/ClickHouse/blob/master/src/Functions/array/arrayDistinct.cpp
+// and modified by Doris
+#pragma once
+
+#include "vec/columns/column_array.h"
+#include "vec/columns/column_const.h"
+#include "vec/common/hash_table/hash_set.h"
+#include "vec/common/hash_table/hash_table.h"
+#include "vec/common/sip_hash.h"
+#include "vec/data_types/data_type_array.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/functions/function.h"
+#include "vec/functions/function_helpers.h"
+#include "vec/io/io_helper.h"
+
+namespace doris::vectorized {
+
+class FunctionArrayDistinct : public IFunction {
+public:
+    static constexpr auto name = "array_distinct";
+    static FunctionPtr create() { return std::make_shared<FunctionArrayDistinct>(); }
+    using NullMapType = PaddedPODArray<UInt8>;
+
+    /// Get function name.
+    String get_name() const override { return name; }
+
+    bool is_variadic() const override { return false; }
+
+    size_t get_number_of_arguments() const override { return 1; }
+
+    DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {
+        DCHECK(is_array(arguments[0]))
+                << "first argument for function: " << name << " should be DataTypeArray"
+                << " and arguments[0] is " << arguments[0]->get_name();
+        return arguments[0];
+    }
+
+    Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
+                        size_t result, size_t input_rows_count) override {
+        ColumnPtr src_column =
+                block.get_by_position(arguments[0]).column->convert_to_full_column_if_const();
+        const auto& src_column_array = check_and_get_column<ColumnArray>(*src_column);
+        if (!src_column_array) {
+            return Status::RuntimeError(
+                    fmt::format("unsupported types for function {}({})", get_name(),
+                                block.get_by_position(arguments[0]).type->get_name()));
+        }
+        const auto& src_offsets = src_column_array->get_offsets();
+        const auto& src_nested_column = src_column_array->get_data();
+
+        DataTypePtr src_column_type = block.get_by_position(arguments[0]).type;
+        auto nested_type = assert_cast<const DataTypeArray&>(*src_column_type).get_nested_type();
+        auto dest_column_ptr = ColumnArray::create(nested_type->create_column(),
+                                                   ColumnArray::ColumnOffsets::create());
+        IColumn& dest_nested_column = dest_column_ptr->get_data();
+        ColumnArray::Offsets& dest_offsets = dest_column_ptr->get_offsets();
+        dest_nested_column.reserve(src_nested_column.size());
+        dest_offsets.reserve(input_rows_count);
+
+        const IColumn* src_inner_column = nullptr;
+        const NullMapType* src_null_map = nullptr;
+        const ColumnNullable* src_nested_nullable_col = nullptr;
+        if (src_nested_column.is_nullable()) {
+            src_nested_nullable_col = check_and_get_column<ColumnNullable>(src_nested_column);
+            src_inner_column = src_nested_nullable_col->get_nested_column_ptr();
+            src_null_map = &src_nested_nullable_col->get_null_map_column().get_data();
+        } else {
+            src_inner_column = src_column_array->get_data_ptr();
+        }
+
+        IColumn* dest_inner_column = nullptr;
+        NullMapType* dest_null_map = nullptr;
+        ColumnNullable* dest_nested_nullable_col = nullptr;
+        if (dest_nested_column.is_nullable()) {
+            dest_nested_nullable_col = reinterpret_cast<ColumnNullable*>(&dest_nested_column);
+            dest_inner_column = &dest_nested_nullable_col->get_nested_column();
+            dest_null_map = &dest_nested_nullable_col->get_null_map_column().get_data();
+        } else {
+            dest_inner_column = &dest_nested_column;
+        }
+
+        auto res_val = _execute_by_type(*src_inner_column, src_offsets, *dest_inner_column,
+                                        dest_offsets, src_null_map, dest_null_map, nested_type);
+        if (!res_val) {
+            return Status::RuntimeError(
+                    fmt::format("execute failed or unsupported types for function {}({})",
+                                get_name(), block.get_by_position(arguments[0]).type->get_name()));
+        }
+
+        block.replace_by_position(result, std::move(dest_column_ptr));
+        return Status::OK();
+    }
+
+private:
+    // Note: the hash set initially allocates stack memory for 2^5 = 32 elements.
+    static constexpr size_t INITIAL_SIZE_DEGREE = 5;
+
+    template <typename ColumnType>
+    bool _execute_number(const IColumn& src_column, const ColumnArray::Offsets& src_offsets,
+                         IColumn& dest_column, ColumnArray::Offsets& dest_offsets,
+                         const NullMapType* src_null_map, NullMapType* dest_null_map) {
+        using NestType = typename ColumnType::value_type;
+        using ElementNativeType = typename NativeType<NestType>::Type;
+
+        const ColumnType* src_data_concrete = check_and_get_column<ColumnType>(&src_column);
+        if (!src_data_concrete) {
+            return false;
+        }
+        const PaddedPODArray<NestType>& src_datas = src_data_concrete->get_data();
+
+        ColumnType& dest_data_concrete = reinterpret_cast<ColumnType&>(dest_column);
+        PaddedPODArray<NestType>& dest_datas = dest_data_concrete.get_data();
+
+        using Set = HashSetWithStackMemory<ElementNativeType, DefaultHash<ElementNativeType>,
+                                           INITIAL_SIZE_DEGREE>;
+        Set set;
+
+        ColumnArray::Offset prev_src_offset = 0;
+        ColumnArray::Offset res_offset = 0;
+
+        for (auto curr_src_offset : src_offsets) {
+            set.clear();
+            size_t null_size = 0;
+            for (ColumnArray::Offset j = prev_src_offset; j < curr_src_offset; ++j) {
+                if (src_null_map && (*src_null_map)[j]) {
+                    if (dest_null_map) {
+                        (*dest_null_map).push_back(true);
+                        // Note: push a placeholder element that is never output, so the
+                        // nested data stays aligned with the null map and the offsets.
+                        dest_datas.push_back(NestType());
+                        null_size++;
+                    }
+                    continue;
+                }
+
+                if (!set.find(src_datas[j])) {
+                    set.insert(src_datas[j]);
+                    dest_datas.push_back(src_datas[j]);
+                    if (dest_null_map) {
+                        (*dest_null_map).push_back(false);
+                    }
+                }
+            }
+
+            res_offset += set.size() + null_size;
+            dest_offsets.push_back(res_offset);
+            prev_src_offset = curr_src_offset;
+        }
+
+        return true;
+    }
+
+    bool _execute_string(const IColumn& src_column, const ColumnArray::Offsets& src_offsets,
+                         IColumn& dest_column, ColumnArray::Offsets& dest_offsets,
+                         const NullMapType* src_null_map, NullMapType* dest_null_map) {
+        const ColumnString* src_data_concrete = check_and_get_column<ColumnString>(&src_column);
+        if (!src_data_concrete) {
+            return false;
+        }
+
+        ColumnString& dest_column_string = reinterpret_cast<ColumnString&>(dest_column);
+        ColumnString::Chars& column_string_chars = dest_column_string.get_chars();
+        ColumnString::Offsets& column_string_offsets = dest_column_string.get_offsets();
+        column_string_chars.reserve(src_column.size());
+        column_string_offsets.reserve(src_offsets.size());

Review Comment:
   The ColumnString has already been reserved.





[GitHub] [doris] xy720 merged pull request #10388: [feature-wip] (array-type) add the array_distinct function

Posted by GitBox <gi...@apache.org>.
xy720 merged PR #10388:
URL: https://github.com/apache/doris/pull/10388




[GitHub] [doris] cambyzju commented on a diff in pull request #10388: [feature-wip] (array-type) add the array_distinct function

Posted by GitBox <gi...@apache.org>.
cambyzju commented on code in PR #10388:
URL: https://github.com/apache/doris/pull/10388#discussion_r905907255


##########
gensrc/script/doris_builtins_functions.py:
##########
@@ -160,6 +160,18 @@
     [['array_position'], 'BIGINT', ['ARRAY_STRING', 'STRING'], '', '', '', 'vec', ''],
 
     [['cardinality', 'size'], 'BIGINT', ['ARRAY'], '', '', '', 'vec', ''],
+    [['array_distinct'], 'ARRAY_TINYINT',   ['ARRAY_TINYINT'], '', '', '', 'vec', ''],

Review Comment:
   Array functions are only supported on the vec engine, so we do not need to add symbols for them.





[GitHub] [doris] eldenmoon commented on a diff in pull request #10388: [feature-wip] (array-type) add the array_distinct function

Posted by GitBox <gi...@apache.org>.
eldenmoon commented on code in PR #10388:
URL: https://github.com/apache/doris/pull/10388#discussion_r905977203


##########
gensrc/script/doris_builtins_functions.py:
##########
@@ -160,6 +160,18 @@
     [['array_position'], 'BIGINT', ['ARRAY_STRING', 'STRING'], '', '', '', 'vec', ''],
 
     [['cardinality', 'size'], 'BIGINT', ['ARRAY'], '', '', '', 'vec', ''],
+    [['array_distinct'], 'ARRAY_TINYINT',   ['ARRAY_TINYINT'], '', '', '', 'vec', ''],

Review Comment:
   But the SQL may fail if the symbol is not added. You could try it; this is an issue I encountered.





[GitHub] [doris] cambyzju commented on a diff in pull request #10388: [feature-wip] (array-type) add the array_distinct function

Posted by GitBox <gi...@apache.org>.
cambyzju commented on code in PR #10388:
URL: https://github.com/apache/doris/pull/10388#discussion_r911789159


##########
be/src/vec/functions/array/function_array_distinct.h:
##########
@@ -0,0 +1,297 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+// This file is copied from
+// https://github.com/ClickHouse/ClickHouse/blob/master/src/Functions/array/arrayDistinct.cpp
+// and modified by Doris
+#pragma once
+
+#include "vec/columns/column_array.h"
+#include "vec/columns/column_const.h"
+#include "vec/common/hash_table/hash_set.h"
+#include "vec/common/hash_table/hash_table.h"
+#include "vec/common/sip_hash.h"
+#include "vec/data_types/data_type_array.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/functions/function.h"
+#include "vec/functions/function_helpers.h"
+#include "vec/io/io_helper.h"
+
+namespace doris::vectorized {
+
+class FunctionArrayDistinct : public IFunction {
+public:
+    static constexpr auto name = "array_distinct";
+    static FunctionPtr create() { return std::make_shared<FunctionArrayDistinct>(); }
+
+    /// Get function name.
+    String get_name() const override { return name; }
+
+    bool is_variadic() const override { return false; }
+
+    size_t get_number_of_arguments() const override { return 1; }
+
+    DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {
+        DCHECK(is_array(arguments[0]))
+                << "first argument for function: " << name << " should be DataTypeArray"
+                << " and arguments[0] is " << arguments[0]->get_name();
+        return arguments[0];
+    }
+
+    Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
+                        size_t result, size_t input_rows_count) override {
+        ColumnPtr src_column =
+                block.get_by_position(arguments[0]).column->convert_to_full_column_if_const();
+        const auto& src_column_array = check_and_get_column<ColumnArray>(*src_column);
+        if (!src_column_array) {
+            return Status::RuntimeError(
+                    fmt::format("unsupported types for function {}({})", get_name(),
+                                block.get_by_position(arguments[0]).type->get_name()));
+        }
+        const auto& src_offsets = src_column_array->get_offsets();
+        const auto& src_nested_column = src_column_array->get_data();
+
+        DataTypePtr src_column_type = remove_nullable(block.get_by_position(arguments[0]).type);
+        auto nested_type = assert_cast<const DataTypeArray&>(*src_column_type).get_nested_type();
+        auto dest_column_ptr = ColumnArray::create(nested_type->create_column(),
+                                                   ColumnArray::ColumnOffsets::create());
+        IColumn& dest_nested_column = dest_column_ptr->get_data();
+        ColumnArray::Offsets& dest_offsets = dest_column_ptr->get_offsets();
+        dest_nested_column.reserve(src_nested_column.size());
+        dest_offsets.reserve(input_rows_count);
+
+        const IColumn* src_inner_column = nullptr;
+        const ColumnNullable* src_nested_nullable_col = nullptr;
+        if (src_nested_column.is_nullable()) {
+            src_nested_nullable_col = check_and_get_column<ColumnNullable>(src_nested_column);
+            src_inner_column = src_nested_nullable_col->get_nested_column_ptr();
+        } else {
+            src_inner_column = src_column_array->get_data_ptr();
+        }
+
+        IColumn* dest_inner_column = nullptr;
+        ColumnNullable* dest_nested_nullable_col = nullptr;
+        if (dest_nested_column.is_nullable()) {
+            dest_nested_nullable_col = reinterpret_cast<ColumnNullable*>(&dest_nested_column);
+            dest_inner_column = &dest_nested_nullable_col->get_nested_column();
+        } else {
+            dest_inner_column = &dest_nested_column;
+        }
+
+        auto res_val =
+                _execute_by_type(*src_inner_column, src_offsets, *dest_inner_column, dest_offsets,
+                                 src_nested_nullable_col, dest_nested_nullable_col, nested_type);
+        if (!res_val) {
+            return Status::RuntimeError(
+                    fmt::format("execute failed or unsupported types for function {}({})",
+                                get_name(), block.get_by_position(arguments[0]).type->get_name()));
+        }
+
+        block.replace_by_position(result, std::move(dest_column_ptr));
+        return Status::OK();
+    }
+
+private:
+    // Note: the hash set initially allocates stack memory for 2^5 = 32 elements.
+    static constexpr size_t INITIAL_SIZE_DEGREE = 5;
+
+    template <typename ColumnType, typename NestType>

Review Comment:
   Only one of ColumnType and NestType is needed as a template parameter:
   given ColumnType, derive the other with: using NestType = typename ColumnType::value_type
   given NestType, derive the other with: using ColumnType = ColumnVector<NestType>
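
   The two alternatives can be sketched in isolation as follows. This is only an illustration under stated assumptions: `ToyColumnVector`, `first_or_default` and `first_or_default_by_nest` are hypothetical names invented for the example, and the toy column is not the Doris `ColumnVector` class.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a numeric column class; not the Doris ColumnVector.
template <typename T>
struct ToyColumnVector {
    using value_type = T;
    std::vector<T> data;
};

// Alternative 1: the column type is the template parameter and the element
// type is derived from it.
template <typename ColumnType>
typename ColumnType::value_type first_or_default(const ColumnType& col) {
    using NestType = typename ColumnType::value_type;
    return col.data.empty() ? NestType() : col.data.front();
}

// Alternative 2: the element type is the template parameter and the column
// type is derived from it.
template <typename NestType>
NestType first_or_default_by_nest(const ToyColumnVector<NestType>& col) {
    using ColumnType = ToyColumnVector<NestType>;
    const ColumnType& concrete = col;
    return concrete.data.empty() ? NestType() : concrete.data.front();
}
```

   Either form keeps the two types consistent by construction, whereas passing both as independent template parameters allows a mismatched pair to be instantiated.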





[GitHub] [doris] cambyzju commented on a diff in pull request #10388: [feature-wip] (array-type) add the array_distinct function

Posted by GitBox <gi...@apache.org>.
cambyzju commented on code in PR #10388:
URL: https://github.com/apache/doris/pull/10388#discussion_r911792467


##########
be/src/vec/functions/array/function_array_distinct.h:
##########
@@ -0,0 +1,297 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+// This file is copied from
+// https://github.com/ClickHouse/ClickHouse/blob/master/src/Functions/array/arrayDistinct.cpp
+// and modified by Doris
+#pragma once
+
+#include "vec/columns/column_array.h"
+#include "vec/columns/column_const.h"
+#include "vec/common/hash_table/hash_set.h"
+#include "vec/common/hash_table/hash_table.h"
+#include "vec/common/sip_hash.h"
+#include "vec/data_types/data_type_array.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/functions/function.h"
+#include "vec/functions/function_helpers.h"
+#include "vec/io/io_helper.h"
+
+namespace doris::vectorized {
+
+class FunctionArrayDistinct : public IFunction {
+public:
+    static constexpr auto name = "array_distinct";
+    static FunctionPtr create() { return std::make_shared<FunctionArrayDistinct>(); }
+
+    /// Get function name.
+    String get_name() const override { return name; }
+
+    bool is_variadic() const override { return false; }
+
+    size_t get_number_of_arguments() const override { return 1; }
+
+    DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {
+        DCHECK(is_array(arguments[0]))
+                << "first argument for function: " << name << " should be DataTypeArray"
+                << " and arguments[0] is " << arguments[0]->get_name();
+        return arguments[0];
+    }
+
+    Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
+                        size_t result, size_t input_rows_count) override {
+        ColumnPtr src_column =
+                block.get_by_position(arguments[0]).column->convert_to_full_column_if_const();
+        const auto& src_column_array = check_and_get_column<ColumnArray>(*src_column);
+        if (!src_column_array) {
+            return Status::RuntimeError(
+                    fmt::format("unsupported types for function {}({})", get_name(),
+                                block.get_by_position(arguments[0]).type->get_name()));
+        }
+        const auto& src_offsets = src_column_array->get_offsets();
+        const auto& src_nested_column = src_column_array->get_data();
+
+        DataTypePtr src_column_type = remove_nullable(block.get_by_position(arguments[0]).type);
+        auto nested_type = assert_cast<const DataTypeArray&>(*src_column_type).get_nested_type();
+        auto dest_column_ptr = ColumnArray::create(nested_type->create_column(),
+                                                   ColumnArray::ColumnOffsets::create());
+        IColumn& dest_nested_column = dest_column_ptr->get_data();
+        ColumnArray::Offsets& dest_offsets = dest_column_ptr->get_offsets();
+        dest_nested_column.reserve(src_nested_column.size());
+        dest_offsets.reserve(input_rows_count);
+
+        const IColumn* src_inner_column = nullptr;
+        const ColumnNullable* src_nested_nullable_col = nullptr;
+        if (src_nested_column.is_nullable()) {
+            src_nested_nullable_col = check_and_get_column<ColumnNullable>(src_nested_column);
+            src_inner_column = src_nested_nullable_col->get_nested_column_ptr();
+        } else {
+            src_inner_column = src_column_array->get_data_ptr();
+        }
+
+        IColumn* dest_inner_column = nullptr;
+        ColumnNullable* dest_nested_nullable_col = nullptr;
+        if (dest_nested_column.is_nullable()) {
+            dest_nested_nullable_col = reinterpret_cast<ColumnNullable*>(&dest_nested_column);
+            dest_inner_column = &dest_nested_nullable_col->get_nested_column();
+        } else {
+            dest_inner_column = &dest_nested_column;
+        }
+
+        auto res_val =
+                _execute_by_type(*src_inner_column, src_offsets, *dest_inner_column, dest_offsets,
+                                 src_nested_nullable_col, dest_nested_nullable_col, nested_type);
+        if (!res_val) {
+            return Status::RuntimeError(
+                    fmt::format("execute failed or unsupported types for function {}({})",
+                                get_name(), block.get_by_position(arguments[0]).type->get_name()));
+        }
+
+        block.replace_by_position(result, std::move(dest_column_ptr));
+        return Status::OK();
+    }
+
+private:
+    // Note: Here initially allocate a piece of memory for 2^5 = 32 elements.
+    static constexpr size_t INITIAL_SIZE_DEGREE = 5;
+
+    template <typename ColumnType, typename NestType>
+    bool _execute_number(const IColumn& src_column, const ColumnArray::Offsets& src_offsets,
+                         IColumn& dest_column, ColumnArray::Offsets& dest_offsets,
+                         const ColumnNullable* src_nullable_col,
+                         ColumnNullable* dest_nullable_col) {
+        const ColumnType* src_data_concrete = check_and_get_column<ColumnType>(&src_column);
+        if (!src_data_concrete) {
+            return false;
+        }
+        const PaddedPODArray<NestType>& src_datas = src_data_concrete->get_data();
+
+        ColumnType& dest_data_concrete = reinterpret_cast<ColumnType&>(dest_column);
+        PaddedPODArray<NestType>& dest_datas = dest_data_concrete.get_data();
+
+        const PaddedPODArray<UInt8>* src_null_map = nullptr;
+        if (src_nullable_col) {
+            src_null_map = &src_nullable_col->get_null_map_column().get_data();
+        }
+
+        PaddedPODArray<UInt8>* dest_null_map = nullptr;
+        if (dest_nullable_col) {
+            dest_null_map = &dest_nullable_col->get_null_map_column().get_data();
+        }
+
+        using Set = HashSetWithSavedHashWithStackMemory<UInt128, UInt128TrivialHash,

Review Comment:
   Is UInt128 used as the set key for all types?
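
For context on the question above: the quoted code keys the set on a 128-bit SipHash of each element, whatever the element type. A minimal standalone sketch of the alternative, a set keyed directly on the element type, is shown below. It is an illustration under stated assumptions, not the PR's implementation: `FlatArrayColumn` and `array_distinct_sketch` are hypothetical names, and `std::unordered_set` stands in for Doris's `HashSetWithSavedHashWithStackMemory`. NULL elements are passed through without deduplication, matching the sample output in the PR description (`[1, 2, 3, NULL, NULL]` stays unchanged).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical flattened array column (not a Doris type):
// offsets[i] is the end position of row i inside `data`.
template <typename T>
struct FlatArrayColumn {
    std::vector<T> data;
    std::vector<std::uint8_t> null_map; // 1 marks a NULL element; same length as data
    std::vector<std::size_t> offsets;
};

// Removes duplicate non-NULL elements inside each row, keeping first occurrences.
// Assumption: the set is keyed on T itself rather than on a 128-bit hash.
template <typename T>
FlatArrayColumn<T> array_distinct_sketch(const FlatArrayColumn<T>& src) {
    assert(src.null_map.size() == src.data.size());

    FlatArrayColumn<T> dest;
    dest.data.reserve(src.data.size());
    dest.null_map.reserve(src.data.size());
    dest.offsets.reserve(src.offsets.size());

    std::unordered_set<T> seen;
    std::size_t prev = 0;
    for (std::size_t end : src.offsets) {
        seen.clear(); // distinctness is per row, not per column
        for (std::size_t j = prev; j < end; ++j) {
            if (src.null_map[j]) {
                // NULLs are copied through with a placeholder value.
                dest.data.push_back(T{});
                dest.null_map.push_back(1);
                continue;
            }
            if (seen.insert(src.data[j]).second) {
                dest.data.push_back(src.data[j]);
                dest.null_map.push_back(0);
            }
        }
        dest.offsets.push_back(dest.data.size());
        prev = end;
    }
    return dest;
}
```

Keying on the element type avoids hashing small integers into 128 bits, at the cost of one set instantiation per type; whether that matters in practice would need measurement.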



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] cambyzju commented on a diff in pull request #10388: [feature-wip] (array-type) add the array_distinct function

Posted by GitBox <gi...@apache.org>.
cambyzju commented on code in PR #10388:
URL: https://github.com/apache/doris/pull/10388#discussion_r913574823


##########
be/src/vec/functions/array/function_array_distinct.h:
##########
@@ -0,0 +1,268 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+// This file is copied from
+// https://github.com/ClickHouse/ClickHouse/blob/master/src/Functions/array/arrayDistinct.cpp
+// and modified by Doris
+#pragma once
+
+#include "vec/columns/column_array.h"
+#include "vec/columns/column_const.h"
+#include "vec/common/hash_table/hash_set.h"
+#include "vec/common/hash_table/hash_table.h"
+#include "vec/common/sip_hash.h"
+#include "vec/data_types/data_type_array.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/functions/function.h"
+#include "vec/functions/function_helpers.h"
+#include "vec/io/io_helper.h"
+
+namespace doris::vectorized {
+
+class FunctionArrayDistinct : public IFunction {
+public:
+    static constexpr auto name = "array_distinct";
+    static FunctionPtr create() { return std::make_shared<FunctionArrayDistinct>(); }
+    using NullMapType = PaddedPODArray<UInt8>;
+
+    /// Get function name.
+    String get_name() const override { return name; }
+
+    bool is_variadic() const override { return false; }
+
+    size_t get_number_of_arguments() const override { return 1; }
+
+    DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {
+        DCHECK(is_array(arguments[0]))
+                << "first argument for function: " << name << " should be DataTypeArray"
+                << " and arguments[0] is " << arguments[0]->get_name();
+        return arguments[0];
+    }
+
+    Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
+                        size_t result, size_t input_rows_count) override {
+        ColumnPtr src_column =
+                block.get_by_position(arguments[0]).column->convert_to_full_column_if_const();
+        const auto& src_column_array = check_and_get_column<ColumnArray>(*src_column);
+        if (!src_column_array) {
+            return Status::RuntimeError(
+                    fmt::format("unsupported types for function {}({})", get_name(),
+                                block.get_by_position(arguments[0]).type->get_name()));
+        }
+        const auto& src_offsets = src_column_array->get_offsets();
+        const auto& src_nested_column = src_column_array->get_data();
+
+        DataTypePtr src_column_type = block.get_by_position(arguments[0]).type;
+        auto nested_type = assert_cast<const DataTypeArray&>(*src_column_type).get_nested_type();
+        auto dest_column_ptr = ColumnArray::create(nested_type->create_column(),
+                                                   ColumnArray::ColumnOffsets::create());
+        IColumn& dest_nested_column = dest_column_ptr->get_data();
+        ColumnArray::Offsets& dest_offsets = dest_column_ptr->get_offsets();
+        dest_nested_column.reserve(src_nested_column.size());
+        dest_offsets.reserve(input_rows_count);
+
+        const IColumn* src_inner_column = nullptr;
+        const NullMapType* src_null_map = nullptr;
+        const ColumnNullable* src_nested_nullable_col = nullptr;
+        if (src_nested_column.is_nullable()) {
+            src_nested_nullable_col = check_and_get_column<ColumnNullable>(src_nested_column);
+            src_inner_column = src_nested_nullable_col->get_nested_column_ptr();
+            src_null_map = &src_nested_nullable_col->get_null_map_column().get_data();
+        } else {
+            src_inner_column = src_column_array->get_data_ptr();
+        }
+
+        IColumn* dest_inner_column = nullptr;
+        NullMapType* dest_null_map = nullptr;
+        ColumnNullable* dest_nested_nullable_col = nullptr;
+        if (dest_nested_column.is_nullable()) {
+            dest_nested_nullable_col = reinterpret_cast<ColumnNullable*>(&dest_nested_column);
+            dest_inner_column = &dest_nested_nullable_col->get_nested_column();
+            dest_null_map = &dest_nested_nullable_col->get_null_map_column().get_data();
+        } else {
+            dest_inner_column = &dest_nested_column;
+        }
+
+        auto res_val = _execute_by_type(*src_inner_column, src_offsets, *dest_inner_column,
+                                        dest_offsets, src_null_map, dest_null_map, nested_type);
+        if (!res_val) {
+            return Status::RuntimeError(
+                    fmt::format("execute failed or unsupported types for function {}({})",
+                                get_name(), block.get_by_position(arguments[0]).type->get_name()));
+        }
+
+        block.replace_by_position(result, std::move(dest_column_ptr));
+        return Status::OK();
+    }
+
+private:
+    // Note: Here initially allocate a piece of memory for 2^5 = 32 elements.
+    static constexpr size_t INITIAL_SIZE_DEGREE = 5;
+
+    template <typename ColumnType>
+    bool _execute_number(const IColumn& src_column, const ColumnArray::Offsets& src_offsets,
+                         IColumn& dest_column, ColumnArray::Offsets& dest_offsets,
+                         const NullMapType* src_null_map, NullMapType* dest_null_map) {
+        using NestType = typename ColumnType::value_type;
+        using ElementNativeType = typename NativeType<NestType>::Type;
+
+        const ColumnType* src_data_concrete = check_and_get_column<ColumnType>(&src_column);

Review Comment:
   Use reinterpret_cast here.
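
The suggestion above replaces a checked downcast with an unchecked one, presumably because the caller has already dispatched on the type index. A minimal standalone sketch of the difference follows, under stated assumptions: `ColumnBase`, `IntColumn`, `StringColumnStub`, `checked_cast`, and `unchecked_cast` are hypothetical names, not Doris APIs. The sketch uses `static_cast` rather than `reinterpret_cast`, since that is the standard-conforming unchecked downcast within a polymorphic hierarchy.

```cpp
#include <cassert>
#include <vector>

// Hypothetical column hierarchy, for illustration only.
struct ColumnBase {
    virtual ~ColumnBase() = default;
};
struct IntColumn : ColumnBase {
    std::vector<int> values;
};
struct StringColumnStub : ColumnBase {};

// Checked downcast: returns nullptr when the dynamic type does not match,
// so the caller has to handle the failure path.
template <typename Derived>
const Derived* checked_cast(const ColumnBase& col) {
    return dynamic_cast<const Derived*>(&col);
}

// Unchecked downcast: the caller guarantees the dynamic type
// (for example after a type-index dispatch). The guarantee is only
// verified in debug builds, where assert is active.
template <typename Derived>
const Derived& unchecked_cast(const ColumnBase& col) {
    assert(dynamic_cast<const Derived*>(&col) != nullptr);
    return static_cast<const Derived&>(col);
}
```

The unchecked form removes a branch from the hot path, but a wrong guarantee becomes undefined behavior in release builds instead of a reported error.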





[GitHub] [doris] carlvinhust2012 commented on a diff in pull request #10388: [feature-wip] (array-type) add the array_distinct function

Posted by GitBox <gi...@apache.org>.
carlvinhust2012 commented on code in PR #10388:
URL: https://github.com/apache/doris/pull/10388#discussion_r905782619


##########
be/src/vec/functions/array/function_array_distinct.h:
##########
@@ -0,0 +1,294 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+// This file is copied from
+// https://github.com/ClickHouse/ClickHouse/blob/master/src/Functions/array/arrayDistinct.cpp
+// and modified by Doris
+#pragma once
+
+#include "vec/columns/column_array.h"
+#include "vec/columns/column_const.h"
+#include "vec/common/hash_table/hash_set.h"
+#include "vec/common/hash_table/hash_table.h"
+#include "vec/common/sip_hash.h"
+#include "vec/data_types/data_type_array.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/functions/function.h"
+#include "vec/functions/function_helpers.h"
+#include "vec/io/io_helper.h"
+
+namespace doris::vectorized {
+
+class FunctionArrayDistinct : public IFunction {
+public:
+    static constexpr auto name = "array_distinct";
+    static FunctionPtr create() { return std::make_shared<FunctionArrayDistinct>(); }
+
+    /// Get function name.
+    String get_name() const override { return name; }
+
+    bool is_variadic() const override { return false; }
+
+    size_t get_number_of_arguments() const override { return 1; }
+
+    DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {
+        DCHECK(is_array(arguments[0]))
+                << "first argument for function: " << name << " should be DataTypeArray";
+        return make_nullable(
+                check_and_get_data_type<DataTypeArray>(arguments[0].get())->get_nested_type());
+    }
+
+    Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
+                        size_t result, size_t input_rows_count) override {
+        ColumnPtr src_column =
+                block.get_by_position(arguments[0]).column->convert_to_full_column_if_const();
+        const auto& src_column_array = check_and_get_column<ColumnArray>(*src_column);
+        if (!src_column_array) {
+            return Status::RuntimeError(
+                    fmt::format("unsupported types for function {}({})", get_name(),
+                                block.get_by_position(arguments[0]).type->get_name()));
+        }
+        const auto& src_offsets = src_column_array->get_offsets();
+        const auto& src_column_data = src_column_array->get_data();
+
+        DataTypePtr src_column_type = remove_nullable(block.get_by_position(arguments[0]).type);
+        auto nested_type = assert_cast<const DataTypeArray&>(*src_column_type).get_nested_type();
+        auto dest_column_ptr = ColumnArray::create(nested_type->create_column(),
+                                                   ColumnArray::ColumnOffsets::create());
+        IColumn& dest_column = dest_column_ptr->get_data();
+        ColumnArray::Offsets& dest_offsets = dest_column_ptr->get_offsets();
+
+        const IColumn* src_inner_column = nullptr;
+        const ColumnNullable* src_nullable_column =
+                check_and_get_column<ColumnNullable>(src_column_data);
+        if (src_nullable_column) {
+            src_inner_column = src_nullable_column->get_nested_column_ptr();
+        } else {
+            src_inner_column = src_column_array->get_data_ptr();
+        }
+
+        IColumn* dest_nested_column = nullptr;
+        ColumnNullable* dest_nullable_column = reinterpret_cast<ColumnNullable*>(&dest_column);
+        if (dest_nullable_column) {
+            dest_nested_column = dest_nullable_column->get_nested_column_ptr();
+        } else {
+            dest_nested_column = &dest_column;
+        }
+
+        auto res_val =
+                _execute_by_type(*src_inner_column, src_offsets, *dest_nested_column, dest_offsets,
+                                 src_nullable_column, dest_nullable_column, nested_type);
+        if (!res_val) {
+            return Status::RuntimeError(
+                    fmt::format("execute failed or unsupported types for function {}({})",
+                                get_name(), block.get_by_position(arguments[0]).type->get_name()));
+        }
+
+        block.replace_by_position(result, std::move(dest_column_ptr));
+        return Status::OK();
+    }
+
+private:
+    // Note: Here initially allocate a piece of memory for 2^5 = 32 elements.
+    static constexpr size_t INITIAL_SIZE_DEGREE = 5;
+
+    template <typename T>
+    bool _execute_number(const IColumn& src_column, const ColumnArray::Offsets& src_offsets,
+                         IColumn& dest_column, ColumnArray::Offsets& dest_offsets,
+                         const ColumnNullable* src_nullable_col,
+                         ColumnNullable* dest_nullable_col) {
+        const ColumnVector<T>* src_data_concrete =
+                check_and_get_column<ColumnVector<T>>(&src_column);
+        if (!src_data_concrete) {
+            return false;
+        }
+        const PaddedPODArray<T>& src_datas = src_data_concrete->get_data();
+
+        ColumnVector<T>& dest_data_concrete = reinterpret_cast<ColumnVector<T>&>(dest_column);
+        PaddedPODArray<T>& dest_datas = dest_data_concrete.get_data();
+
+        const PaddedPODArray<UInt8>* src_null_map = nullptr;
+        if (src_nullable_col) {
+            src_null_map = &src_nullable_col->get_null_map_column().get_data();
+        }
+
+        PaddedPODArray<UInt8>* dest_null_map = nullptr;
+        if (dest_nullable_col) {
+            dest_null_map = &dest_nullable_col->get_null_map_column().get_data();
+        }
+
+        // using Set = HashSetWithSavedHashWithStackMemory<T, DefaultHash<T>, INITIAL_SIZE_DEGREE>;
+        using Set = HashSetWithSavedHashWithStackMemory<UInt128, UInt128TrivialHash,
+                                                        INITIAL_SIZE_DEGREE>;
+        Set set;
+
+        ColumnArray::Offset prev_src_offset = 0;
+        ColumnArray::Offset res_offset = 0;
+
+        for (auto curr_src_offset : src_offsets) {
+            set.clear();
+            size_t null_size = 0;
+            for (ColumnArray::Offset j = prev_src_offset; j < curr_src_offset; ++j) {
+                if (src_nullable_col && (*src_null_map)[j]) {
+                    if (dest_nullable_col && dest_null_map) {
+                        (*dest_null_map).push_back(true);
+                        // Note: here we need to add an element which will not use for output
+                        // because we expand the value of each offset
+                        dest_datas.push_back(-1);
+                        null_size++;
+                    }
+                    continue;
+                }
+
+                UInt128 hash;
+                SipHash hash_function;
+                src_column.update_hash_with_value(j, hash_function);
+                hash_function.get128(reinterpret_cast<char*>(&hash));
+
+                if (!set.find(hash)) {
+                    set.insert(hash);
+                    dest_datas.push_back(src_datas[j]);
+                    if (dest_nullable_col && dest_null_map) {
+                        (*dest_null_map).push_back(false);
+                    }
+                }
+            }
+
+            res_offset += set.size() + null_size;
+            dest_offsets.push_back(res_offset);
+            prev_src_offset = curr_src_offset;
+        }
+
+        return true;
+    }
+
+    bool _execute_string(const IColumn& src_column, const ColumnArray::Offsets& src_offsets,
+                         IColumn& dest_column, ColumnArray::Offsets& dest_offsets,
+                         const ColumnNullable* src_nullable_col,
+                         ColumnNullable* dest_nullable_col) {
+        const ColumnString* src_data_concrete = check_and_get_column<ColumnString>(&src_column);
+        if (!src_data_concrete) {
+            return false;
+        }
+
+        ColumnString& dest_column_string = reinterpret_cast<ColumnString&>(dest_column);
+        ColumnString::Offsets& column_string_offsets = dest_column_string.get_offsets();
+
+        const PaddedPODArray<UInt8>* src_null_map = nullptr;
+        if (src_nullable_col) {
+            src_null_map = &src_nullable_col->get_null_map_column().get_data();
+        }
+
+        PaddedPODArray<UInt8>* dest_null_map = nullptr;
+        if (dest_nullable_col) {
+            dest_null_map = &dest_nullable_col->get_null_map_column().get_data();
+        }
+
+        using Set =
+                HashSetWithSavedHashWithStackMemory<StringRef, StringRefHash, INITIAL_SIZE_DEGREE>;
+        Set set;
+
+        ColumnArray::Offset prev_src_offset = 0;
+        ColumnArray::Offset res_offset = 0;
+
+        for (auto curr_src_offset : src_offsets) {
+            set.clear();
+
+            size_t null_size = 0;
+            for (ColumnArray::Offset j = prev_src_offset; j < curr_src_offset; ++j) {
+                if (src_nullable_col && (*src_null_map)[j]) {
+                    if (dest_nullable_col && dest_null_map) {
+                        column_string_offsets.push_back(column_string_offsets.back());
+                        (*dest_null_map).push_back(true);
+                        null_size++;
+                    }
+                    continue;
+                }
+                StringRef src_str_ref = src_data_concrete->get_data_at(j);
+
+                if (!set.find(src_str_ref)) {
+                    set.insert(src_str_ref);
+                    dest_column_string.insert_data(src_str_ref.data, src_str_ref.size);
+                    if (dest_nullable_col && dest_null_map) {
+                        (*dest_null_map).push_back(false);
+                    }
+                }
+            }
+
+            res_offset += set.size() + null_size;
+            dest_offsets.push_back(res_offset);
+            prev_src_offset = curr_src_offset;
+        }
+        return true;
+    }
+
+    bool _execute_by_type(const IColumn& src_column, const ColumnArray::Offsets& src_offsets,
+                          IColumn& dest_column, ColumnArray::Offsets& dest_offsets,
+                          const ColumnNullable* src_nullable_col, ColumnNullable* dest_nullable_col,
+                          DataTypePtr& nested_type) {
+        bool res = false;
+        WhichDataType which(remove_nullable(nested_type)->get_type_id());
+        if (which.idx == TypeIndex::UInt8) {
+            res = _execute_number<UInt8>(src_column, src_offsets, dest_column, dest_offsets,
+                                         src_nullable_col, dest_nullable_col);
+        } else if (which.idx == TypeIndex::UInt16) {
+            res = _execute_number<UInt16>(src_column, src_offsets, dest_column, dest_offsets,
+                                          src_nullable_col, dest_nullable_col);
+        } else if (which.idx == TypeIndex::UInt32) {
+            res = _execute_number<UInt32>(src_column, src_offsets, dest_column, dest_offsets,
+                                          src_nullable_col, dest_nullable_col);
+        } else if (which.idx == TypeIndex::UInt64) {
+            res = _execute_number<UInt64>(src_column, src_offsets, dest_column, dest_offsets,
+                                          src_nullable_col, dest_nullable_col);
+        } else if (which.idx == TypeIndex::UInt128) {
+            res = _execute_number<UInt128>(src_column, src_offsets, dest_column, dest_offsets,
+                                           src_nullable_col, dest_nullable_col);
+        } else if (which.idx == TypeIndex::Int8) {
+            res = _execute_number<Int8>(src_column, src_offsets, dest_column, dest_offsets,
+                                        src_nullable_col, dest_nullable_col);
+        } else if (which.idx == TypeIndex::Int16) {
+            res = _execute_number<Int16>(src_column, src_offsets, dest_column, dest_offsets,
+                                         src_nullable_col, dest_nullable_col);
+        } else if (which.idx == TypeIndex::Int32) {
+            res = _execute_number<Int32>(src_column, src_offsets, dest_column, dest_offsets,
+                                         src_nullable_col, dest_nullable_col);
+        } else if (which.idx == TypeIndex::Int64) {
+            res = _execute_number<Int64>(src_column, src_offsets, dest_column, dest_offsets,
+                                         src_nullable_col, dest_nullable_col);
+        } else if (which.idx == TypeIndex::Int128) {
+            res = _execute_number<Int128>(src_column, src_offsets, dest_column, dest_offsets,
+                                          src_nullable_col, dest_nullable_col);
+        } else if (which.idx == TypeIndex::Float32) {
+            res = _execute_number<Float32>(src_column, src_offsets, dest_column, dest_offsets,
+                                           src_nullable_col, dest_nullable_col);
+        } else if (which.idx == TypeIndex::Float64) {
+            res = _execute_number<Float64>(src_column, src_offsets, dest_column, dest_offsets,
+                                           src_nullable_col, dest_nullable_col);
+        } else if (which.idx == TypeIndex::Date) {
+            res = _execute_number<Date>(src_column, src_offsets, dest_column, dest_offsets,
+                                        src_nullable_col, dest_nullable_col);
+        } else if (which.idx == TypeIndex::DateTime) {

Review Comment:
   > need to add decimal support
   
   Yes, I am adding decimal support.





[GitHub] [doris] carlvinhust2012 commented on a diff in pull request #10388: [feature-wip] (array-type) add the array_distinct function

Posted by GitBox <gi...@apache.org>.
carlvinhust2012 commented on code in PR #10388:
URL: https://github.com/apache/doris/pull/10388#discussion_r905780909


##########
be/src/vec/functions/array/function_array_distinct.h:
##########
@@ -0,0 +1,294 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+// This file is copied from
+// https://github.com/ClickHouse/ClickHouse/blob/master/src/Functions/array/arrayDistinct.cpp
+// and modified by Doris
+#pragma once
+
+#include "vec/columns/column_array.h"
+#include "vec/columns/column_const.h"
+#include "vec/common/hash_table/hash_set.h"
+#include "vec/common/hash_table/hash_table.h"
+#include "vec/common/sip_hash.h"
+#include "vec/data_types/data_type_array.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/functions/function.h"
+#include "vec/functions/function_helpers.h"
+#include "vec/io/io_helper.h"
+
+namespace doris::vectorized {
+
+class FunctionArrayDistinct : public IFunction {
+public:
+    static constexpr auto name = "array_distinct";
+    static FunctionPtr create() { return std::make_shared<FunctionArrayDistinct>(); }
+
+    /// Get function name.
+    String get_name() const override { return name; }
+
+    bool is_variadic() const override { return false; }
+
+    size_t get_number_of_arguments() const override { return 1; }
+
+    DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {
+        DCHECK(is_array(arguments[0]))
+                << "first argument for function: " << name << " should be DataTypeArray";
+        return make_nullable(
+                check_and_get_data_type<DataTypeArray>(arguments[0].get())->get_nested_type());
+    }
+
+    Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
+                        size_t result, size_t input_rows_count) override {
+        ColumnPtr src_column =
+                block.get_by_position(arguments[0]).column->convert_to_full_column_if_const();
+        const auto& src_column_array = check_and_get_column<ColumnArray>(*src_column);
+        if (!src_column_array) {
+            return Status::RuntimeError(
+                    fmt::format("unsupported types for function {}({})", get_name(),
+                                block.get_by_position(arguments[0]).type->get_name()));
+        }
+        const auto& src_offsets = src_column_array->get_offsets();
+        const auto& src_column_data = src_column_array->get_data();
+
+        DataTypePtr src_column_type = remove_nullable(block.get_by_position(arguments[0]).type);
+        auto nested_type = assert_cast<const DataTypeArray&>(*src_column_type).get_nested_type();
+        auto dest_column_ptr = ColumnArray::create(nested_type->create_column(),
+                                                   ColumnArray::ColumnOffsets::create());
+        IColumn& dest_column = dest_column_ptr->get_data();

Review Comment:
   > 
   
   Yes, dest_nested_column can be used here as well.





[GitHub] [doris] carlvinhust2012 commented on a diff in pull request #10388: [feature-wip] (array-type) add the array_distinct function

Posted by GitBox <gi...@apache.org>.
carlvinhust2012 commented on code in PR #10388:
URL: https://github.com/apache/doris/pull/10388#discussion_r905779669


##########
be/src/vec/functions/array/function_array_distinct.h:
##########
@@ -0,0 +1,294 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+// This file is copied from
+// https://github.com/ClickHouse/ClickHouse/blob/master/src/Functions/array/arrayDistinct.cpp
+// and modified by Doris
+#pragma once
+
+#include "vec/columns/column_array.h"
+#include "vec/columns/column_const.h"
+#include "vec/common/hash_table/hash_set.h"
+#include "vec/common/hash_table/hash_table.h"
+#include "vec/common/sip_hash.h"
+#include "vec/data_types/data_type_array.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/functions/function.h"
+#include "vec/functions/function_helpers.h"
+#include "vec/io/io_helper.h"
+
+namespace doris::vectorized {
+
+class FunctionArrayDistinct : public IFunction {
+public:
+    static constexpr auto name = "array_distinct";
+    static FunctionPtr create() { return std::make_shared<FunctionArrayDistinct>(); }
+
+    /// Get function name.
+    String get_name() const override { return name; }
+
+    bool is_variadic() const override { return false; }
+
+    size_t get_number_of_arguments() const override { return 1; }
+
+    DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {
+        DCHECK(is_array(arguments[0]))
+                << "first argument for function: " << name << " should be DataTypeArray";
+        return make_nullable(

Review Comment:
   OK, make_nullable is not needed here.





[GitHub] [doris] cambyzju commented on a diff in pull request #10388: [feature-wip] (array-type) add the array_distinct function

Posted by GitBox <gi...@apache.org>.
cambyzju commented on code in PR #10388:
URL: https://github.com/apache/doris/pull/10388#discussion_r911724771


##########
be/src/vec/functions/array/function_array_distinct.h:
##########
@@ -0,0 +1,296 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+// This file is copied from
+// https://github.com/ClickHouse/ClickHouse/blob/master/src/Functions/array/arrayDistinct.cpp
+// and modified by Doris
+#pragma once
+
+#include "vec/columns/column_array.h"
+#include "vec/columns/column_const.h"
+#include "vec/common/hash_table/hash_set.h"
+#include "vec/common/hash_table/hash_table.h"
+#include "vec/common/sip_hash.h"
+#include "vec/data_types/data_type_array.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/functions/function.h"
+#include "vec/functions/function_helpers.h"
+#include "vec/io/io_helper.h"
+
+namespace doris::vectorized {
+
+class FunctionArrayDistinct : public IFunction {
+public:
+    static constexpr auto name = "array_distinct";
+    static FunctionPtr create() { return std::make_shared<FunctionArrayDistinct>(); }
+
+    /// Get function name.
+    String get_name() const override { return name; }
+
+    bool is_variadic() const override { return false; }
+
+    size_t get_number_of_arguments() const override { return 1; }
+
+    DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {
+        DCHECK(is_array(arguments[0]))
+                << "first argument for function: " << name << " should be DataTypeArray"
+                << " and arguments[0] is " << arguments[0].get()->get_name();

Review Comment:
   ```suggestion
                   << " and arguments[0] is " << arguments[0]->get_name();
   ```
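
Note on the suggestion above: a minimal sketch of why the two spellings are interchangeable, assuming DataTypePtr behaves like a std::shared_ptr. ToyType, name_via_get and name_via_arrow are hypothetical names for illustration only and are not part of Doris.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for a data type class; this is not the Doris IDataType.
struct ToyType {
    std::string get_name() const { return "Array(Int32)"; }
};

// Spelling used in the PR: fetch the raw pointer first, then call the member.
inline std::string name_via_get(const std::shared_ptr<const ToyType>& t) {
    assert(t != nullptr);
    return t.get()->get_name();
}

// Spelling from the suggestion: shared_ptr::operator-> forwards to the same object.
inline std::string name_via_arrow(const std::shared_ptr<const ToyType>& t) {
    assert(t != nullptr);
    return t->get_name();
}
```

Both functions call the same member on the same object, so the shorter form only drops the redundant get().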





[GitHub] [doris] BiteTheDDDDt closed pull request #10388: [feature-wip] (array-type) add the array_distinct function

Posted by GitBox <gi...@apache.org>.
BiteTheDDDDt closed pull request #10388: [feature-wip] (array-type) add the array_distinct function
URL: https://github.com/apache/doris/pull/10388




[GitHub] [doris] eldenmoon commented on a diff in pull request #10388: [feature-wip] (array-type) add the array_distinct function

Posted by GitBox <gi...@apache.org>.
eldenmoon commented on code in PR #10388:
URL: https://github.com/apache/doris/pull/10388#discussion_r905977203


##########
gensrc/script/doris_builtins_functions.py:
##########
@@ -160,6 +160,18 @@
     [['array_position'], 'BIGINT', ['ARRAY_STRING', 'STRING'], '', '', '', 'vec', ''],
 
     [['cardinality', 'size'], 'BIGINT', ['ARRAY'], '', '', '', 'vec', ''],
+    [['array_distinct'], 'ARRAY_TINYINT',   ['ARRAY_TINYINT'], '', '', '', 'vec', ''],

Review Comment:
   However, the SQL may fail if the symbol is not added. You could try this; here is the issue I encountered:
   ```
   mysql> set enable_vectorized_engine = true;
   Query OK, 0 rows affected (0.05 sec)
   mysql> select size(id) from github where element_at(id, 1) = 1 ;
   ERROR 1105 (HY000): errCode = 2, detailMessage = Function element_at is not implemented.
   ```
   There are no errors without the `where` condition.





[GitHub] [doris] cambyzju commented on a diff in pull request #10388: [feature-wip] (array-type) add the array_distinct function

Posted by GitBox <gi...@apache.org>.
cambyzju commented on code in PR #10388:
URL: https://github.com/apache/doris/pull/10388#discussion_r905773204


##########
be/src/vec/functions/array/function_array_distinct.h:
##########
@@ -0,0 +1,294 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+// This file is copied from
+// https://github.com/ClickHouse/ClickHouse/blob/master/src/Functions/array/arrayDistinct.cpp
+// and modified by Doris
+#pragma once
+
+#include "vec/columns/column_array.h"
+#include "vec/columns/column_const.h"
+#include "vec/common/hash_table/hash_set.h"
+#include "vec/common/hash_table/hash_table.h"
+#include "vec/common/sip_hash.h"
+#include "vec/data_types/data_type_array.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/functions/function.h"
+#include "vec/functions/function_helpers.h"
+#include "vec/io/io_helper.h"
+
+namespace doris::vectorized {
+
+class FunctionArrayDistinct : public IFunction {
+public:
+    static constexpr auto name = "array_distinct";
+    static FunctionPtr create() { return std::make_shared<FunctionArrayDistinct>(); }
+
+    /// Get function name.
+    String get_name() const override { return name; }
+
+    bool is_variadic() const override { return false; }
+
+    size_t get_number_of_arguments() const override { return 1; }
+
+    DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {
+        DCHECK(is_array(arguments[0]))
+                << "first argument for function: " << name << " should be DataTypeArray";
+        return make_nullable(
+                check_and_get_data_type<DataTypeArray>(arguments[0].get())->get_nested_type());
+    }
+
+    Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
+                        size_t result, size_t input_rows_count) override {
+        ColumnPtr src_column =
+                block.get_by_position(arguments[0]).column->convert_to_full_column_if_const();
+        const auto& src_column_array = check_and_get_column<ColumnArray>(*src_column);
+        if (!src_column_array) {
+            return Status::RuntimeError(
+                    fmt::format("unsupported types for function {}({})", get_name(),
+                                block.get_by_position(arguments[0]).type->get_name()));
+        }
+        const auto& src_offsets = src_column_array->get_offsets();
+        const auto& src_column_data = src_column_array->get_data();
+
+        DataTypePtr src_column_type = remove_nullable(block.get_by_position(arguments[0]).type);
+        auto nested_type = assert_cast<const DataTypeArray&>(*src_column_type).get_nested_type();
+        auto dest_column_ptr = ColumnArray::create(nested_type->create_column(),
+                                                   ColumnArray::ColumnOffsets::create());
+        IColumn& dest_column = dest_column_ptr->get_data();
+        ColumnArray::Offsets& dest_offsets = dest_column_ptr->get_offsets();
+
+        const IColumn* src_inner_column = nullptr;
+        const ColumnNullable* src_nullable_column =
+                check_and_get_column<ColumnNullable>(src_column_data);
+        if (src_nullable_column) {
+            src_inner_column = src_nullable_column->get_nested_column_ptr();
+        } else {
+            src_inner_column = src_column_array->get_data_ptr();
+        }
+
+        IColumn* dest_nested_column = nullptr;
+        ColumnNullable* dest_nullable_column = reinterpret_cast<ColumnNullable*>(&dest_column);
+        if (dest_nullable_column) {
+            dest_nested_column = dest_nullable_column->get_nested_column_ptr();
+        } else {
+            dest_nested_column = &dest_column;
+        }
+
+        auto res_val =
+                _execute_by_type(*src_inner_column, src_offsets, *dest_nested_column, dest_offsets,
+                                 src_nullable_column, dest_nullable_column, nested_type);
+        if (!res_val) {
+            return Status::RuntimeError(
+                    fmt::format("execute failed or unsupported types for function {}({})",
+                                get_name(), block.get_by_position(arguments[0]).type->get_name()));
+        }
+
+        block.replace_by_position(result, std::move(dest_column_ptr));
+        return Status::OK();
+    }
+
+private:
+    // Note: Here initially allocate a piece of memory for 2^5 = 32 elements.
+    static constexpr size_t INITIAL_SIZE_DEGREE = 5;
+
+    template <typename T>
+    bool _execute_number(const IColumn& src_column, const ColumnArray::Offsets& src_offsets,
+                         IColumn& dest_column, ColumnArray::Offsets& dest_offsets,
+                         const ColumnNullable* src_nullable_col,
+                         ColumnNullable* dest_nullable_col) {
+        const ColumnVector<T>* src_data_concrete =
+                check_and_get_column<ColumnVector<T>>(&src_column);
+        if (!src_data_concrete) {
+            return false;
+        }
+        const PaddedPODArray<T>& src_datas = src_data_concrete->get_data();
+
+        ColumnVector<T>& dest_data_concrete = reinterpret_cast<ColumnVector<T>&>(dest_column);
+        PaddedPODArray<T>& dest_datas = dest_data_concrete.get_data();
+
+        const PaddedPODArray<UInt8>* src_null_map = nullptr;
+        if (src_nullable_col) {
+            src_null_map = &src_nullable_col->get_null_map_column().get_data();
+        }
+
+        PaddedPODArray<UInt8>* dest_null_map = nullptr;
+        if (dest_nullable_col) {
+            dest_null_map = &dest_nullable_col->get_null_map_column().get_data();
+        }
+
+        // using Set = HashSetWithSavedHashWithStackMemory<T, DefaultHash<T>, INITIAL_SIZE_DEGREE>;
+        using Set = HashSetWithSavedHashWithStackMemory<UInt128, UInt128TrivialHash,
+                                                        INITIAL_SIZE_DEGREE>;
+        Set set;
+
+        ColumnArray::Offset prev_src_offset = 0;
+        ColumnArray::Offset res_offset = 0;
+
+        for (auto curr_src_offset : src_offsets) {
+            set.clear();
+            size_t null_size = 0;
+            for (ColumnArray::Offset j = prev_src_offset; j < curr_src_offset; ++j) {
+                if (src_nullable_col && (*src_null_map)[j]) {
+                    if (dest_nullable_col && dest_null_map) {
+                        (*dest_null_map).push_back(true);
+                        // Note: here we need to add an element which will not use for output
+                        // because we expand the value of each offset
+                        dest_datas.push_back(-1);
+                        null_size++;
+                    }
+                    continue;
+                }
+
+                UInt128 hash;
+                SipHash hash_function;
+                src_column.update_hash_with_value(j, hash_function);
+                hash_function.get128(reinterpret_cast<char*>(&hash));
+
+                if (!set.find(hash)) {
+                    set.insert(hash);
+                    dest_datas.push_back(src_datas[j]);
+                    if (dest_nullable_col && dest_null_map) {
+                        (*dest_null_map).push_back(false);
+                    }
+                }
+            }
+
+            res_offset += set.size() + null_size;
+            dest_offsets.push_back(res_offset);
+            prev_src_offset = curr_src_offset;
+        }
+
+        return true;
+    }
+
+    bool _execute_string(const IColumn& src_column, const ColumnArray::Offsets& src_offsets,
+                         IColumn& dest_column, ColumnArray::Offsets& dest_offsets,
+                         const ColumnNullable* src_nullable_col,
+                         ColumnNullable* dest_nullable_col) {
+        const ColumnString* src_data_concrete = check_and_get_column<ColumnString>(&src_column);
+        if (!src_data_concrete) {
+            return false;
+        }
+
+        ColumnString& dest_column_string = reinterpret_cast<ColumnString&>(dest_column);
+        ColumnString::Offsets& column_string_offsets = dest_column_string.get_offsets();
+
+        const PaddedPODArray<UInt8>* src_null_map = nullptr;
+        if (src_nullable_col) {
+            src_null_map = &src_nullable_col->get_null_map_column().get_data();
+        }
+
+        PaddedPODArray<UInt8>* dest_null_map = nullptr;
+        if (dest_nullable_col) {
+            dest_null_map = &dest_nullable_col->get_null_map_column().get_data();
+        }
+
+        using Set =
+                HashSetWithSavedHashWithStackMemory<StringRef, StringRefHash, INITIAL_SIZE_DEGREE>;
+        Set set;
+
+        ColumnArray::Offset prev_src_offset = 0;
+        ColumnArray::Offset res_offset = 0;
+
+        for (auto curr_src_offset : src_offsets) {
+            set.clear();
+
+            size_t null_size = 0;
+            for (ColumnArray::Offset j = prev_src_offset; j < curr_src_offset; ++j) {
+                if (src_nullable_col && (*src_null_map)[j]) {
+                    if (dest_nullable_col && dest_null_map) {
+                        column_string_offsets.push_back(column_string_offsets.back());
+                        (*dest_null_map).push_back(true);
+                        null_size++;
+                    }
+                    continue;
+                }
+                StringRef src_str_ref = src_data_concrete->get_data_at(j);
+
+                if (!set.find(src_str_ref)) {
+                    set.insert(src_str_ref);
+                    dest_column_string.insert_data(src_str_ref.data, src_str_ref.size);
+                    if (dest_nullable_col && dest_null_map) {
+                        (*dest_null_map).push_back(false);
+                    }
+                }
+            }
+
+            res_offset += set.size() + null_size;
+            dest_offsets.push_back(res_offset);
+            prev_src_offset = curr_src_offset;
+        }
+        return true;
+    }
+
+    bool _execute_by_type(const IColumn& src_column, const ColumnArray::Offsets& src_offsets,
+                          IColumn& dest_column, ColumnArray::Offsets& dest_offsets,
+                          const ColumnNullable* src_nullable_col, ColumnNullable* dest_nullable_col,
+                          DataTypePtr& nested_type) {
+        bool res = false;
+        WhichDataType which(remove_nullable(nested_type)->get_type_id());
+        if (which.idx == TypeIndex::UInt8) {
+            res = _execute_number<UInt8>(src_column, src_offsets, dest_column, dest_offsets,
+                                         src_nullable_col, dest_nullable_col);
+        } else if (which.idx == TypeIndex::UInt16) {
+            res = _execute_number<UInt16>(src_column, src_offsets, dest_column, dest_offsets,
+                                          src_nullable_col, dest_nullable_col);
+        } else if (which.idx == TypeIndex::UInt32) {
+            res = _execute_number<UInt32>(src_column, src_offsets, dest_column, dest_offsets,
+                                          src_nullable_col, dest_nullable_col);
+        } else if (which.idx == TypeIndex::UInt64) {
+            res = _execute_number<UInt64>(src_column, src_offsets, dest_column, dest_offsets,
+                                          src_nullable_col, dest_nullable_col);
+        } else if (which.idx == TypeIndex::UInt128) {
+            res = _execute_number<UInt128>(src_column, src_offsets, dest_column, dest_offsets,
+                                           src_nullable_col, dest_nullable_col);
+        } else if (which.idx == TypeIndex::Int8) {
+            res = _execute_number<Int8>(src_column, src_offsets, dest_column, dest_offsets,
+                                        src_nullable_col, dest_nullable_col);
+        } else if (which.idx == TypeIndex::Int16) {
+            res = _execute_number<Int16>(src_column, src_offsets, dest_column, dest_offsets,
+                                         src_nullable_col, dest_nullable_col);
+        } else if (which.idx == TypeIndex::Int32) {
+            res = _execute_number<Int32>(src_column, src_offsets, dest_column, dest_offsets,
+                                         src_nullable_col, dest_nullable_col);
+        } else if (which.idx == TypeIndex::Int64) {
+            res = _execute_number<Int64>(src_column, src_offsets, dest_column, dest_offsets,
+                                         src_nullable_col, dest_nullable_col);
+        } else if (which.idx == TypeIndex::Int128) {
+            res = _execute_number<Int128>(src_column, src_offsets, dest_column, dest_offsets,
+                                          src_nullable_col, dest_nullable_col);
+        } else if (which.idx == TypeIndex::Float32) {
+            res = _execute_number<Float32>(src_column, src_offsets, dest_column, dest_offsets,
+                                           src_nullable_col, dest_nullable_col);
+        } else if (which.idx == TypeIndex::Float64) {
+            res = _execute_number<Float64>(src_column, src_offsets, dest_column, dest_offsets,
+                                           src_nullable_col, dest_nullable_col);
+        } else if (which.idx == TypeIndex::Date) {
+            res = _execute_number<Date>(src_column, src_offsets, dest_column, dest_offsets,
+                                        src_nullable_col, dest_nullable_col);
+        } else if (which.idx == TypeIndex::DateTime) {

Review Comment:
   Decimal support needs to be added.
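
Note: whichever element types the dispatch ends up covering, the per-array loop in the quoted hunk has the same shape. The following is a minimal sketch of that loop outside Doris, not the PR's implementation. ToyArrayColumn and toy_array_distinct are hypothetical names, std::optional stands in for the null map, and a std::unordered_set of values replaces the 128-bit SipHash set used in the hunk.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <unordered_set>
#include <vector>

// Flattened array column: the elements of all rows back to back, plus one
// end offset per row. Hypothetical stand-in for ColumnArray.
struct ToyArrayColumn {
    std::vector<std::optional<int>> data;
    std::vector<std::size_t> offsets;
};

// Removes duplicate non-NULL elements within each row, keeping first-seen order.
ToyArrayColumn toy_array_distinct(const ToyArrayColumn& src) {
    ToyArrayColumn dest;
    std::unordered_set<int> seen;
    std::size_t prev = 0;
    for (std::size_t cur : src.offsets) {
        assert(prev <= cur && cur <= src.data.size());
        seen.clear();  // duplicates are tracked per row, not across rows
        for (std::size_t j = prev; j < cur; ++j) {
            const auto& v = src.data[j];
            if (!v.has_value()) {
                // NULL elements are copied through without deduplication,
                // matching the null branch of the quoted loop.
                dest.data.push_back(std::nullopt);
            } else if (seen.insert(*v).second) {
                dest.data.push_back(v);
            }
        }
        dest.offsets.push_back(dest.data.size());
        prev = cur;
    }
    return dest;
}
```

With rows [1, 2, 3, 4, 5, 4, 3, 2, 1], [] and [1, 2, 3, NULL, NULL], this sketch yields [1, 2, 3, 4, 5], [] and [1, 2, 3, NULL, NULL], which agrees with the sample output in the PR description.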





[GitHub] [doris] cambyzju commented on a diff in pull request #10388: [feature-wip] (array-type) add the array_distinct function

Posted by GitBox <gi...@apache.org>.
cambyzju commented on code in PR #10388:
URL: https://github.com/apache/doris/pull/10388#discussion_r905769638


##########
be/src/vec/functions/array/function_array_distinct.h:
##########
@@ -0,0 +1,294 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+// This file is copied from
+// https://github.com/ClickHouse/ClickHouse/blob/master/src/Functions/array/arrayDistinct.cpp
+// and modified by Doris
+#pragma once
+
+#include "vec/columns/column_array.h"
+#include "vec/columns/column_const.h"
+#include "vec/common/hash_table/hash_set.h"
+#include "vec/common/hash_table/hash_table.h"
+#include "vec/common/sip_hash.h"
+#include "vec/data_types/data_type_array.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/functions/function.h"
+#include "vec/functions/function_helpers.h"
+#include "vec/io/io_helper.h"
+
+namespace doris::vectorized {
+
+class FunctionArrayDistinct : public IFunction {
+public:
+    static constexpr auto name = "array_distinct";
+    static FunctionPtr create() { return std::make_shared<FunctionArrayDistinct>(); }
+
+    /// Get function name.
+    String get_name() const override { return name; }
+
+    bool is_variadic() const override { return false; }
+
+    size_t get_number_of_arguments() const override { return 1; }
+
+    DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {
+        DCHECK(is_array(arguments[0]))
+                << "first argument for function: " << name << " should be DataTypeArray";
+        return make_nullable(
+                check_and_get_data_type<DataTypeArray>(arguments[0].get())->get_nested_type());
+    }
+
+    Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
+                        size_t result, size_t input_rows_count) override {
+        ColumnPtr src_column =
+                block.get_by_position(arguments[0]).column->convert_to_full_column_if_const();
+        const auto& src_column_array = check_and_get_column<ColumnArray>(*src_column);
+        if (!src_column_array) {
+            return Status::RuntimeError(
+                    fmt::format("unsupported types for function {}({})", get_name(),
+                                block.get_by_position(arguments[0]).type->get_name()));
+        }
+        const auto& src_offsets = src_column_array->get_offsets();
+        const auto& src_column_data = src_column_array->get_data();
+
+        DataTypePtr src_column_type = remove_nullable(block.get_by_position(arguments[0]).type);
+        auto nested_type = assert_cast<const DataTypeArray&>(*src_column_type).get_nested_type();
+        auto dest_column_ptr = ColumnArray::create(nested_type->create_column(),

Review Comment:
   Would cloning from the src column be enough?





[GitHub] [doris] HappenLee commented on a diff in pull request #10388: [feature-wip] (array-type) add the array_distinct function

Posted by GitBox <gi...@apache.org>.
HappenLee commented on code in PR #10388:
URL: https://github.com/apache/doris/pull/10388#discussion_r908081667


##########
gensrc/script/doris_builtins_functions.py:
##########
@@ -160,6 +160,18 @@
     [['array_position'], 'BIGINT', ['ARRAY_STRING', 'STRING'], '', '', '', 'vec', ''],
 
     [['cardinality', 'size'], 'BIGINT', ['ARRAY'], '', '', '', 'vec', ''],
+    [['array_distinct'], 'ARRAY_TINYINT',   ['ARRAY_TINYINT'], '', '', '', 'vec', ''],

Review Comment:
   @eldenmoon Hello, I have fixed the bug in this PR: https://github.com/apache/doris/pull/10467





[GitHub] [doris] carlvinhust2012 commented on a diff in pull request #10388: [feature-wip] (array-type) add the array_distinct function

Posted by GitBox <gi...@apache.org>.
carlvinhust2012 commented on code in PR #10388:
URL: https://github.com/apache/doris/pull/10388#discussion_r911755129


##########
be/src/vec/functions/array/function_array_distinct.h:
##########
@@ -0,0 +1,295 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+// This file is copied from
+// https://github.com/ClickHouse/ClickHouse/blob/master/src/Functions/array/arrayDistinct.cpp
+// and modified by Doris
+#pragma once
+
+#include "vec/columns/column_array.h"
+#include "vec/columns/column_const.h"
+#include "vec/common/hash_table/hash_set.h"
+#include "vec/common/hash_table/hash_table.h"
+#include "vec/common/sip_hash.h"
+#include "vec/data_types/data_type_array.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/functions/function.h"
+#include "vec/functions/function_helpers.h"
+#include "vec/io/io_helper.h"
+
+namespace doris::vectorized {
+
+class FunctionArrayDistinct : public IFunction {
+public:
+    static constexpr auto name = "array_distinct";
+    static FunctionPtr create() { return std::make_shared<FunctionArrayDistinct>(); }
+
+    /// Get function name.
+    String get_name() const override { return name; }
+
+    bool is_variadic() const override { return false; }
+
+    size_t get_number_of_arguments() const override { return 1; }
+
+    DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {
+        DCHECK(is_array(arguments[0]))
+                << "first argument for function: " << name << " should be DataTypeArray"
+                << " and arguments[0] is " << arguments[0].get()->get_name();
+        return check_and_get_data_type<DataTypeArray>(arguments[0].get())->get_nested_type();
+    }
+
+    Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
+                        size_t result, size_t input_rows_count) override {
+        ColumnPtr src_column =
+                block.get_by_position(arguments[0]).column->convert_to_full_column_if_const();
+        const auto& src_column_array = check_and_get_column<ColumnArray>(*src_column);
+        if (!src_column_array) {
+            return Status::RuntimeError(
+                    fmt::format("unsupported types for function {}({})", get_name(),
+                                block.get_by_position(arguments[0]).type->get_name()));
+        }
+        const auto& src_offsets = src_column_array->get_offsets();
+        const auto& src_nested_column = src_column_array->get_data();
+
+        DataTypePtr src_column_type = remove_nullable(block.get_by_position(arguments[0]).type);
+        auto nested_type = assert_cast<const DataTypeArray&>(*src_column_type).get_nested_type();
+        auto dest_column_ptr = ColumnArray::create(nested_type->create_column(),
+                                                   ColumnArray::ColumnOffsets::create());
+        IColumn& dest_nested_column = dest_column_ptr->get_data();
+        ColumnArray::Offsets& dest_offsets = dest_column_ptr->get_offsets();
+
+        const IColumn* src_inner_column = nullptr;

Review Comment:
   I think using IColumn directly can reduce type conversions.
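
Note on type conversions: an earlier revision quoted in this thread tests the result of reinterpret_cast<ColumnNullable*>(&dest_column) against null. The sketch below, using hypothetical toy classes rather than the Doris column types, illustrates why such a test cannot distinguish column kinds, whereas a checked cast can.

```cpp
#include <cassert>

// Hypothetical toy hierarchy; these are not the Doris column classes.
struct ToyColumn {
    virtual ~ToyColumn() = default;
};
struct ToyNullable : ToyColumn {};
struct ToyVector : ToyColumn {};

// Checked conversion: yields nullptr when col is not actually a ToyNullable.
inline bool is_nullable_checked(ToyColumn* col) {
    assert(col != nullptr);
    return dynamic_cast<ToyNullable*>(col) != nullptr;
}

// Unchecked conversion: only reinterprets the pointer value, so a non-null
// input always gives a non-null result regardless of the dynamic type.
inline bool is_nullable_unchecked(ToyColumn* col) {
    assert(col != nullptr);
    return reinterpret_cast<ToyNullable*>(col) != nullptr;
}
```

Under these assumptions the unchecked version reports true for every column, so any else branch guarded by it is never taken.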





[GitHub] [doris] cambyzju commented on a diff in pull request #10388: [feature-wip] (array-type) add the array_distinct function

Posted by GitBox <gi...@apache.org>.
cambyzju commented on code in PR #10388:
URL: https://github.com/apache/doris/pull/10388#discussion_r905772945


##########
be/src/vec/functions/array/function_array_distinct.h:
##########
@@ -0,0 +1,294 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+// This file is copied from
+// https://github.com/ClickHouse/ClickHouse/blob/master/src/Functions/array/arrayDistinct.cpp
+// and modified by Doris
+#pragma once
+
+#include "vec/columns/column_array.h"
+#include "vec/columns/column_const.h"
+#include "vec/common/hash_table/hash_set.h"
+#include "vec/common/hash_table/hash_table.h"
+#include "vec/common/sip_hash.h"
+#include "vec/data_types/data_type_array.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/functions/function.h"
+#include "vec/functions/function_helpers.h"
+#include "vec/io/io_helper.h"
+
+namespace doris::vectorized {
+
+class FunctionArrayDistinct : public IFunction {
+public:
+    static constexpr auto name = "array_distinct";
+    static FunctionPtr create() { return std::make_shared<FunctionArrayDistinct>(); }
+
+    /// Get function name.
+    String get_name() const override { return name; }
+
+    bool is_variadic() const override { return false; }
+
+    size_t get_number_of_arguments() const override { return 1; }
+
+    DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {
+        DCHECK(is_array(arguments[0]))
+                << "first argument for function: " << name << " should be DataTypeArray";
+        return make_nullable(
+                check_and_get_data_type<DataTypeArray>(arguments[0].get())->get_nested_type());
+    }
+
+    Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
+                        size_t result, size_t input_rows_count) override {
+        ColumnPtr src_column =
+                block.get_by_position(arguments[0]).column->convert_to_full_column_if_const();
+        const auto& src_column_array = check_and_get_column<ColumnArray>(*src_column);
+        if (!src_column_array) {
+            return Status::RuntimeError(
+                    fmt::format("unsupported types for function {}({})", get_name(),
+                                block.get_by_position(arguments[0]).type->get_name()));
+        }
+        const auto& src_offsets = src_column_array->get_offsets();
+        const auto& src_column_data = src_column_array->get_data();
+
+        DataTypePtr src_column_type = remove_nullable(block.get_by_position(arguments[0]).type);
+        auto nested_type = assert_cast<const DataTypeArray&>(*src_column_type).get_nested_type();
+        auto dest_column_ptr = ColumnArray::create(nested_type->create_column(),
+                                                   ColumnArray::ColumnOffsets::create());
+        IColumn& dest_column = dest_column_ptr->get_data();
+        ColumnArray::Offsets& dest_offsets = dest_column_ptr->get_offsets();
+
+        const IColumn* src_inner_column = nullptr;
+        const ColumnNullable* src_nullable_column =
+                check_and_get_column<ColumnNullable>(src_column_data);
+        if (src_nullable_column) {
+            src_inner_column = src_nullable_column->get_nested_column_ptr();
+        } else {
+            src_inner_column = src_column_array->get_data_ptr();
+        }
+
+        IColumn* dest_nested_column = nullptr;
+        ColumnNullable* dest_nullable_column = reinterpret_cast<ColumnNullable*>(&dest_column);
+        if (dest_nullable_column) {
+            dest_nested_column = dest_nullable_column->get_nested_column_ptr();
+        } else {
+            dest_nested_column = &dest_column;
+        }
+
+        auto res_val =
+                _execute_by_type(*src_inner_column, src_offsets, *dest_nested_column, dest_offsets,
+                                 src_nullable_column, dest_nullable_column, nested_type);
+        if (!res_val) {
+            return Status::RuntimeError(
+                    fmt::format("execute failed or unsupported types for function {}({})",
+                                get_name(), block.get_by_position(arguments[0]).type->get_name()));
+        }
+
+        block.replace_by_position(result, std::move(dest_column_ptr));
+        return Status::OK();
+    }
+
+private:
+    // Note: Here initially allocate a piece of memory for 2^5 = 32 elements.
+    static constexpr size_t INITIAL_SIZE_DEGREE = 5;
+
+    template <typename T>
+    bool _execute_number(const IColumn& src_column, const ColumnArray::Offsets& src_offsets,
+                         IColumn& dest_column, ColumnArray::Offsets& dest_offsets,
+                         const ColumnNullable* src_nullable_col,
+                         ColumnNullable* dest_nullable_col) {
+        const ColumnVector<T>* src_data_concrete =
+                check_and_get_column<ColumnVector<T>>(&src_column);
+        if (!src_data_concrete) {
+            return false;
+        }
+        const PaddedPODArray<T>& src_datas = src_data_concrete->get_data();
+
+        ColumnVector<T>& dest_data_concrete = reinterpret_cast<ColumnVector<T>&>(dest_column);
+        PaddedPODArray<T>& dest_datas = dest_data_concrete.get_data();
+
+        const PaddedPODArray<UInt8>* src_null_map = nullptr;
+        if (src_nullable_col) {
+            src_null_map = &src_nullable_col->get_null_map_column().get_data();
+        }
+
+        PaddedPODArray<UInt8>* dest_null_map = nullptr;
+        if (dest_nullable_col) {
+            dest_null_map = &dest_nullable_col->get_null_map_column().get_data();
+        }
+
+        // using Set = HashSetWithSavedHashWithStackMemory<T, DefaultHash<T>, INITIAL_SIZE_DEGREE>;
+        using Set = HashSetWithSavedHashWithStackMemory<UInt128, UInt128TrivialHash,
+                                                        INITIAL_SIZE_DEGREE>;
+        Set set;
+
+        ColumnArray::Offset prev_src_offset = 0;
+        ColumnArray::Offset res_offset = 0;
+
+        for (auto curr_src_offset : src_offsets) {
+            set.clear();
+            size_t null_size = 0;
+            for (ColumnArray::Offset j = prev_src_offset; j < curr_src_offset; ++j) {
+                if (src_nullable_col && (*src_null_map)[j]) {
+                    if (dest_nullable_col && dest_null_map) {
+                        (*dest_null_map).push_back(true);
+                        // Note: here we need to add an element which will not use for output
+                        // because we expand the value of each offset
+                        dest_datas.push_back(-1);
+                        null_size++;
+                    }
+                    continue;
+                }
+
+                UInt128 hash;
+                SipHash hash_function;
+                src_column.update_hash_with_value(j, hash_function);
+                hash_function.get128(reinterpret_cast<char*>(&hash));
+
+                if (!set.find(hash)) {
+                    set.insert(hash);
+                    dest_datas.push_back(src_datas[j]);
+                    if (dest_nullable_col && dest_null_map) {
+                        (*dest_null_map).push_back(false);
+                    }
+                }
+            }
+
+            res_offset += set.size() + null_size;
+            dest_offsets.push_back(res_offset);
+            prev_src_offset = curr_src_offset;
+        }
+
+        return true;
+    }
+
+    bool _execute_string(const IColumn& src_column, const ColumnArray::Offsets& src_offsets,
+                         IColumn& dest_column, ColumnArray::Offsets& dest_offsets,
+                         const ColumnNullable* src_nullable_col,
+                         ColumnNullable* dest_nullable_col) {
+        const ColumnString* src_data_concrete = check_and_get_column<ColumnString>(&src_column);
+        if (!src_data_concrete) {
+            return false;
+        }
+
+        ColumnString& dest_column_string = reinterpret_cast<ColumnString&>(dest_column);
+        ColumnString::Offsets& column_string_offsets = dest_column_string.get_offsets();
+
+        const PaddedPODArray<UInt8>* src_null_map = nullptr;
+        if (src_nullable_col) {
+            src_null_map = &src_nullable_col->get_null_map_column().get_data();
+        }
+
+        PaddedPODArray<UInt8>* dest_null_map = nullptr;
+        if (dest_nullable_col) {
+            dest_null_map = &dest_nullable_col->get_null_map_column().get_data();
+        }
+
+        using Set =
+                HashSetWithSavedHashWithStackMemory<StringRef, StringRefHash, INITIAL_SIZE_DEGREE>;
+        Set set;
+
+        ColumnArray::Offset prev_src_offset = 0;
+        ColumnArray::Offset res_offset = 0;
+
+        for (auto curr_src_offset : src_offsets) {
+            set.clear();
+
+            size_t null_size = 0;
+            for (ColumnArray::Offset j = prev_src_offset; j < curr_src_offset; ++j) {
+                if (src_nullable_col && (*src_null_map)[j]) {
+                    if (dest_nullable_col && dest_null_map) {
+                        column_string_offsets.push_back(column_string_offsets.back());
+                        (*dest_null_map).push_back(true);
+                        null_size++;
+                    }
+                    continue;
+                }
+                StringRef src_str_ref = src_data_concrete->get_data_at(j);
+
+                if (!set.find(src_str_ref)) {
+                    set.insert(src_str_ref);
+                    dest_column_string.insert_data(src_str_ref.data, src_str_ref.size);
+                    if (dest_nullable_col && dest_null_map) {
+                        (*dest_null_map).push_back(false);
+                    }
+                }
+            }
+
+            res_offset += set.size() + null_size;
+            dest_offsets.push_back(res_offset);
+            prev_src_offset = curr_src_offset;
+        }
+        return true;
+    }
+
+    bool _execute_by_type(const IColumn& src_column, const ColumnArray::Offsets& src_offsets,
+                          IColumn& dest_column, ColumnArray::Offsets& dest_offsets,
+                          const ColumnNullable* src_nullable_col, ColumnNullable* dest_nullable_col,
+                          DataTypePtr& nested_type) {
+        bool res = false;
+        WhichDataType which(remove_nullable(nested_type)->get_type_id());
+        if (which.idx == TypeIndex::UInt8) {
+            res = _execute_number<UInt8>(src_column, src_offsets, dest_column, dest_offsets,
+                                         src_nullable_col, dest_nullable_col);
+        } else if (which.idx == TypeIndex::UInt16) {

Review Comment:
   We do not support unsigned data types, except UInt8 for Boolean.
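
For context, the per-row deduplication that the quoted `_execute_number<T>` performs over a flattened array column can be sketched in standalone C++ as below. This is only an illustration under stated assumptions, not the Doris implementation: `FlatArray` and `array_distinct_flat` are hypothetical names, `std::unordered_set` stands in for the hash set used in the diff, and a default-constructed placeholder stands in for the `-1` pushed for NULL slots. Following the comment above, such a template would be instantiated only for the signed integer widths and for `uint8_t` (Boolean).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for an array column: the elements of all rows are
// stored back to back in `data`, and offsets[i] is the end index of row i.
template <typename T>
struct FlatArray {
    std::vector<T> data;
    std::vector<uint8_t> null_map; // 1 = element is NULL; same length as data
    std::vector<size_t> offsets;
};

// Removes duplicate non-NULL values within each row, keeping the first
// occurrence of each value in its original order. Every NULL element is
// copied through unchanged, which mirrors the behavior of the quoted diff.
template <typename T>
FlatArray<T> array_distinct_flat(const FlatArray<T>& src) {
    assert(src.null_map.size() == src.data.size());
    FlatArray<T> dest;
    dest.data.reserve(src.data.size());
    dest.null_map.reserve(src.data.size());
    dest.offsets.reserve(src.offsets.size());

    std::unordered_set<T> seen;
    size_t prev = 0;
    for (size_t end : src.offsets) {
        seen.clear(); // distinctness is tracked per row, not per column
        for (size_t j = prev; j < end; ++j) {
            if (src.null_map[j]) {
                dest.data.push_back(T{}); // placeholder, masked by null_map
                dest.null_map.push_back(1);
            } else if (seen.insert(src.data[j]).second) {
                dest.data.push_back(src.data[j]);
                dest.null_map.push_back(0);
            }
        }
        dest.offsets.push_back(dest.data.size());
        prev = end;
    }
    return dest;
}
```

Calling `array_distinct_flat<int32_t>` on a column whose rows are `[1, 2, 3, 4, 5, 4, 3, 2, 1]`, `[]` and `[1, 2, 3, NULL, NULL]` yields rows of 5, 0 and 5 elements, consistent with the sample output in the PR description.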





[GitHub] [doris] cambyzju commented on a diff in pull request #10388: [feature-wip] (array-type) add the array_distinct function

Posted by GitBox <gi...@apache.org>.
cambyzju commented on code in PR #10388:
URL: https://github.com/apache/doris/pull/10388#discussion_r905771545


##########
be/src/vec/functions/array/function_array_distinct.h:
##########
@@ -0,0 +1,294 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+// This file is copied from
+// https://github.com/ClickHouse/ClickHouse/blob/master/src/Functions/array/arrayDistinct.cpp
+// and modified by Doris
+#pragma once
+
+#include "vec/columns/column_array.h"
+#include "vec/columns/column_const.h"
+#include "vec/common/hash_table/hash_set.h"
+#include "vec/common/hash_table/hash_table.h"
+#include "vec/common/sip_hash.h"
+#include "vec/data_types/data_type_array.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/functions/function.h"
+#include "vec/functions/function_helpers.h"
+#include "vec/io/io_helper.h"
+
+namespace doris::vectorized {
+
+class FunctionArrayDistinct : public IFunction {
+public:
+    static constexpr auto name = "array_distinct";
+    static FunctionPtr create() { return std::make_shared<FunctionArrayDistinct>(); }
+
+    /// Get function name.
+    String get_name() const override { return name; }
+
+    bool is_variadic() const override { return false; }
+
+    size_t get_number_of_arguments() const override { return 1; }
+
+    DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {
+        DCHECK(is_array(arguments[0]))
+                << "first argument for function: " << name << " should be DataTypeArray";
+        return make_nullable(
+                check_and_get_data_type<DataTypeArray>(arguments[0].get())->get_nested_type());
+    }
+
+    Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
+                        size_t result, size_t input_rows_count) override {
+        ColumnPtr src_column =
+                block.get_by_position(arguments[0]).column->convert_to_full_column_if_const();
+        const auto& src_column_array = check_and_get_column<ColumnArray>(*src_column);
+        if (!src_column_array) {
+            return Status::RuntimeError(
+                    fmt::format("unsupported types for function {}({})", get_name(),
+                                block.get_by_position(arguments[0]).type->get_name()));
+        }
+        const auto& src_offsets = src_column_array->get_offsets();
+        const auto& src_column_data = src_column_array->get_data();
+
+        DataTypePtr src_column_type = remove_nullable(block.get_by_position(arguments[0]).type);
+        auto nested_type = assert_cast<const DataTypeArray&>(*src_column_type).get_nested_type();
+        auto dest_column_ptr = ColumnArray::create(nested_type->create_column(),
+                                                   ColumnArray::ColumnOffsets::create());
+        IColumn& dest_column = dest_column_ptr->get_data();
+        ColumnArray::Offsets& dest_offsets = dest_column_ptr->get_offsets();
+
+        const IColumn* src_inner_column = nullptr;
+        const ColumnNullable* src_nullable_column =
+                check_and_get_column<ColumnNullable>(src_column_data);

Review Comment:
   Normally we name variables like this:
   xxx_data ===> pod_array
   xxx_column or xxx_col  ===> column
   
   There is no need to define src_column_data.





[GitHub] [doris] eldenmoon commented on a diff in pull request #10388: [feature-wip] (array-type) add the array_distinct function

Posted by GitBox <gi...@apache.org>.
eldenmoon commented on code in PR #10388:
URL: https://github.com/apache/doris/pull/10388#discussion_r905977203


##########
gensrc/script/doris_builtins_functions.py:
##########
@@ -160,6 +160,18 @@
     [['array_position'], 'BIGINT', ['ARRAY_STRING', 'STRING'], '', '', '', 'vec', ''],
 
     [['cardinality', 'size'], 'BIGINT', ['ARRAY'], '', '', '', 'vec', ''],
+    [['array_distinct'], 'ARRAY_TINYINT',   ['ARRAY_TINYINT'], '', '', '', 'vec', ''],

Review Comment:
   But the SQL may fail if the symbol is not added. You could try this; here is the issue I encountered:
   ```
   mysql> set enable_vectorized_engine = true;
   Query OK, 0 rows affected (0.05 sec)
   mysql> select id from github where element_at(id, 1) = 1 ;
   ERROR 1105 (HY000): errCode = 2, detailMessage = Function element_at is not implemented.
   ```
   There are no errors without the `where` condition.





[GitHub] [doris] cambyzju commented on a diff in pull request #10388: [feature-wip] (array-type) add the array_distinct function

Posted by GitBox <gi...@apache.org>.
cambyzju commented on code in PR #10388:
URL: https://github.com/apache/doris/pull/10388#discussion_r906183972


##########
gensrc/script/doris_builtins_functions.py:
##########
@@ -160,6 +160,18 @@
     [['array_position'], 'BIGINT', ['ARRAY_STRING', 'STRING'], '', '', '', 'vec', ''],
 
     [['cardinality', 'size'], 'BIGINT', ['ARRAY'], '', '', '', 'vec', ''],
+    [['array_distinct'], 'ARRAY_TINYINT',   ['ARRAY_TINYINT'], '', '', '', 'vec', ''],

Review Comment:
   set enable_array_type = true;





[GitHub] [doris] carlvinhust2012 commented on a diff in pull request #10388: [feature-wip] (array-type) add the array_distinct function

Posted by GitBox <gi...@apache.org>.
carlvinhust2012 commented on code in PR #10388:
URL: https://github.com/apache/doris/pull/10388#discussion_r910645790


##########
be/src/vec/functions/array/function_array_distinct.h:
##########
@@ -0,0 +1,295 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+// This file is copied from
+// https://github.com/ClickHouse/ClickHouse/blob/master/src/Functions/array/arrayDistinct.cpp
+// and modified by Doris
+#pragma once
+
+#include "vec/columns/column_array.h"
+#include "vec/columns/column_const.h"
+#include "vec/common/hash_table/hash_set.h"
+#include "vec/common/hash_table/hash_table.h"
+#include "vec/common/sip_hash.h"
+#include "vec/data_types/data_type_array.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/functions/function.h"
+#include "vec/functions/function_helpers.h"
+#include "vec/io/io_helper.h"
+
+namespace doris::vectorized {
+
+class FunctionArrayDistinct : public IFunction {
+public:
+    static constexpr auto name = "array_distinct";
+    static FunctionPtr create() { return std::make_shared<FunctionArrayDistinct>(); }
+
+    /// Get function name.
+    String get_name() const override { return name; }
+
+    bool is_variadic() const override { return false; }
+
+    size_t get_number_of_arguments() const override { return 1; }
+
+    DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {
+        DCHECK(is_array(arguments[0]))
+                << "first argument for function: " << name << " should be DataTypeArray"
+                << " and arguments[0] is " << arguments[0].get()->get_name();
+        return check_and_get_data_type<DataTypeArray>(arguments[0].get())->get_nested_type();
+    }
+
+    Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
+                        size_t result, size_t input_rows_count) override {
+        ColumnPtr src_column =
+                block.get_by_position(arguments[0]).column->convert_to_full_column_if_const();
+        const auto& src_column_array = check_and_get_column<ColumnArray>(*src_column);
+        if (!src_column_array) {
+            return Status::RuntimeError(
+                    fmt::format("unsupported types for function {}({})", get_name(),
+                                block.get_by_position(arguments[0]).type->get_name()));
+        }
+        const auto& src_offsets = src_column_array->get_offsets();
+        const auto& src_nested_column = src_column_array->get_data();
+
+        DataTypePtr src_column_type = remove_nullable(block.get_by_position(arguments[0]).type);
+        auto nested_type = assert_cast<const DataTypeArray&>(*src_column_type).get_nested_type();
+        auto dest_column_ptr = ColumnArray::create(nested_type->create_column(),
+                                                   ColumnArray::ColumnOffsets::create());
+        IColumn& dest_nested_column = dest_column_ptr->get_data();
+        ColumnArray::Offsets& dest_offsets = dest_column_ptr->get_offsets();

Review Comment:
   Yes, that is a good suggestion. I will add this.





[GitHub] [doris] cambyzju commented on a diff in pull request #10388: [feature-wip] (array-type) add the array_distinct function

Posted by GitBox <gi...@apache.org>.
cambyzju commented on code in PR #10388:
URL: https://github.com/apache/doris/pull/10388#discussion_r912330918


##########
be/src/vec/functions/array/function_array_distinct.h:
##########
@@ -0,0 +1,265 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+// This file is copied from
+// https://github.com/ClickHouse/ClickHouse/blob/master/src/Functions/array/arrayDistinct.cpp
+// and modified by Doris
+#pragma once
+
+#include "vec/columns/column_array.h"
+#include "vec/columns/column_const.h"
+#include "vec/common/hash_table/hash_set.h"
+#include "vec/common/hash_table/hash_table.h"
+#include "vec/common/sip_hash.h"
+#include "vec/data_types/data_type_array.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/functions/function.h"
+#include "vec/functions/function_helpers.h"
+#include "vec/io/io_helper.h"
+
+namespace doris::vectorized {
+
+class FunctionArrayDistinct : public IFunction {
+public:
+    static constexpr auto name = "array_distinct";
+    static FunctionPtr create() { return std::make_shared<FunctionArrayDistinct>(); }
+    using NullMapType = PaddedPODArray<UInt8>;
+
+    /// Get function name.
+    String get_name() const override { return name; }
+
+    bool is_variadic() const override { return false; }
+
+    size_t get_number_of_arguments() const override { return 1; }
+
+    DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {
+        DCHECK(is_array(arguments[0]))
+                << "first argument for function: " << name << " should be DataTypeArray"
+                << " and arguments[0] is " << arguments[0]->get_name();
+        return arguments[0];
+    }
+
+    Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
+                        size_t result, size_t input_rows_count) override {
+        ColumnPtr src_column =
+                block.get_by_position(arguments[0]).column->convert_to_full_column_if_const();
+        const auto& src_column_array = check_and_get_column<ColumnArray>(*src_column);
+        if (!src_column_array) {
+            return Status::RuntimeError(
+                    fmt::format("unsupported types for function {}({})", get_name(),
+                                block.get_by_position(arguments[0]).type->get_name()));
+        }
+        const auto& src_offsets = src_column_array->get_offsets();
+        const auto& src_nested_column = src_column_array->get_data();
+
+        DataTypePtr src_column_type = remove_nullable(block.get_by_position(arguments[0]).type);

Review Comment:
   We use the default nullable processing logic, so the input type will not be nullable and there is no need to call remove_nullable.
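
As a rough illustration of why the call is unnecessary, the toy model below assumes that default null handling works by stripping the outer null map before the function body runs and re-attaching it afterwards. It is a sketch of that idea only, not Doris code: `NullableInts` and `apply_with_default_null_handling` are hypothetical names introduced for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for a nullable column: values plus a parallel null map.
struct NullableInts {
    std::vector<int64_t> values;
    std::vector<uint8_t> null_map; // 1 = row is NULL
};

// Toy model of framework-level null handling: the wrapper passes only the
// plain values to `fn`, then re-attaches the original null map to the result.
// `fn` therefore never sees a nullable input and has nothing to unwrap.
NullableInts apply_with_default_null_handling(
        const NullableInts& input,
        const std::function<std::vector<int64_t>(const std::vector<int64_t>&)>& fn) {
    assert(input.values.size() == input.null_map.size());
    NullableInts out;
    out.values = fn(input.values);
    assert(out.values.size() == input.values.size());
    out.null_map = input.null_map; // NULL rows in, NULL rows out
    return out;
}
```

In this model the function body corresponds to `execute_impl`: by the time it runs, the outer nullable wrapper has already been removed, so unwrapping the argument type again would be a no-op.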





[GitHub] [doris] cambyzju commented on a diff in pull request #10388: [feature-wip] (array-type) add the array_distinct function

Posted by GitBox <gi...@apache.org>.
cambyzju commented on code in PR #10388:
URL: https://github.com/apache/doris/pull/10388#discussion_r913573408


##########
be/src/vec/functions/array/function_array_distinct.h:
##########
@@ -0,0 +1,268 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+// This file is copied from
+// https://github.com/ClickHouse/ClickHouse/blob/master/src/Functions/array/arrayDistinct.cpp
+// and modified by Doris
+#pragma once
+
+#include "vec/columns/column_array.h"
+#include "vec/columns/column_const.h"
+#include "vec/common/hash_table/hash_set.h"
+#include "vec/common/hash_table/hash_table.h"
+#include "vec/common/sip_hash.h"
+#include "vec/data_types/data_type_array.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/functions/function.h"
+#include "vec/functions/function_helpers.h"
+#include "vec/io/io_helper.h"
+
+namespace doris::vectorized {
+
+class FunctionArrayDistinct : public IFunction {
+public:
+    static constexpr auto name = "array_distinct";
+    static FunctionPtr create() { return std::make_shared<FunctionArrayDistinct>(); }
+    using NullMapType = PaddedPODArray<UInt8>;
+
+    /// Get function name.
+    String get_name() const override { return name; }
+
+    bool is_variadic() const override { return false; }
+
+    size_t get_number_of_arguments() const override { return 1; }
+
+    DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {
+        DCHECK(is_array(arguments[0]))
+                << "first argument for function: " << name << " should be DataTypeArray"
+                << " and arguments[0] is " << arguments[0]->get_name();
+        return arguments[0];
+    }
+
+    Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
+                        size_t result, size_t input_rows_count) override {
+        ColumnPtr src_column =
+                block.get_by_position(arguments[0]).column->convert_to_full_column_if_const();
+        const auto& src_column_array = check_and_get_column<ColumnArray>(*src_column);
+        if (!src_column_array) {
+            return Status::RuntimeError(
+                    fmt::format("unsupported types for function {}({})", get_name(),
+                                block.get_by_position(arguments[0]).type->get_name()));
+        }
+        const auto& src_offsets = src_column_array->get_offsets();
+        const auto& src_nested_column = src_column_array->get_data();
+
+        DataTypePtr src_column_type = block.get_by_position(arguments[0]).type;
+        auto nested_type = assert_cast<const DataTypeArray&>(*src_column_type).get_nested_type();
+        auto dest_column_ptr = ColumnArray::create(nested_type->create_column(),
+                                                   ColumnArray::ColumnOffsets::create());
+        IColumn& dest_nested_column = dest_column_ptr->get_data();
+        ColumnArray::Offsets& dest_offsets = dest_column_ptr->get_offsets();
+        dest_nested_column.reserve(src_nested_column.size());
+        dest_offsets.reserve(input_rows_count);
+
+        const IColumn* src_inner_column = nullptr;
+        const NullMapType* src_null_map = nullptr;
+        const ColumnNullable* src_nested_nullable_col = nullptr;
+        if (src_nested_column.is_nullable()) {
+            src_nested_nullable_col = check_and_get_column<ColumnNullable>(src_nested_column);
+            src_inner_column = src_nested_nullable_col->get_nested_column_ptr();
+            src_null_map = &src_nested_nullable_col->get_null_map_column().get_data();
+        } else {
+            src_inner_column = src_column_array->get_data_ptr();
+        }
+
+        IColumn* dest_inner_column = nullptr;
+        NullMapType* dest_null_map = nullptr;
+        ColumnNullable* dest_nested_nullable_col = nullptr;
+        if (dest_nested_column.is_nullable()) {
+            dest_nested_nullable_col = reinterpret_cast<ColumnNullable*>(&dest_nested_column);
+            dest_inner_column = &dest_nested_nullable_col->get_nested_column();
+            dest_null_map = &dest_nested_nullable_col->get_null_map_column().get_data();
+        } else {
+            dest_inner_column = &dest_nested_column;
+        }
+
+        auto res_val = _execute_by_type(*src_inner_column, src_offsets, *dest_inner_column,
+                                        dest_offsets, src_null_map, dest_null_map, nested_type);
+        if (!res_val) {
+            return Status::RuntimeError(
+                    fmt::format("execute failed or unsupported types for function {}({})",
+                                get_name(), block.get_by_position(arguments[0]).type->get_name()));
+        }
+
+        block.replace_by_position(result, std::move(dest_column_ptr));
+        return Status::OK();
+    }
+
+private:
+    // Note: Here initially allocate a piece of memory for 2^5 = 32 elements.
+    static constexpr size_t INITIAL_SIZE_DEGREE = 5;
+
+    template <typename ColumnType>
+    bool _execute_number(const IColumn& src_column, const ColumnArray::Offsets& src_offsets,
+                         IColumn& dest_column, ColumnArray::Offsets& dest_offsets,
+                         const NullMapType* src_null_map, NullMapType* dest_null_map) {
+        using NestType = typename ColumnType::value_type;
+        using ElementNativeType = typename NativeType<NestType>::Type;
+
+        const ColumnType* src_data_concrete = check_and_get_column<ColumnType>(&src_column);
+        if (!src_data_concrete) {
+            return false;
+        }
+        const PaddedPODArray<NestType>& src_datas = src_data_concrete->get_data();
+
+        ColumnType& dest_data_concrete = reinterpret_cast<ColumnType&>(dest_column);
+        PaddedPODArray<NestType>& dest_datas = dest_data_concrete.get_data();
+
+        using Set = HashSetWithStackMemory<ElementNativeType, DefaultHash<ElementNativeType>,
+                                           INITIAL_SIZE_DEGREE>;
+        Set set;
+
+        ColumnArray::Offset prev_src_offset = 0;
+        ColumnArray::Offset res_offset = 0;
+
+        for (auto curr_src_offset : src_offsets) {
+            set.clear();
+            size_t null_size = 0;
+            for (ColumnArray::Offset j = prev_src_offset; j < curr_src_offset; ++j) {
+                if (src_null_map && (*src_null_map)[j]) {
+                    if (dest_null_map) {
+                        (*dest_null_map).push_back(true);
+                        // Note: here we need to add an element which will not use for output
+                        // because we expand the value of each offset
+                        dest_datas.push_back(NestType());
+                        null_size++;
+                    }
+                    continue;
+                }
+
+                if (!set.find(src_datas[j])) {
+                    set.insert(src_datas[j]);
+                    dest_datas.push_back(src_datas[j]);
+                    if (dest_null_map) {
+                        (*dest_null_map).push_back(false);
+                    }
+                }
+            }
+
+            res_offset += set.size() + null_size;
+            dest_offsets.push_back(res_offset);
+            prev_src_offset = curr_src_offset;
+        }
+
+        return true;
+    }
+
+    bool _execute_string(const IColumn& src_column, const ColumnArray::Offsets& src_offsets,
+                         IColumn& dest_column, ColumnArray::Offsets& dest_offsets,
+                         const NullMapType* src_null_map, NullMapType* dest_null_map) {
+        const ColumnString* src_data_concrete = check_and_get_column<ColumnString>(&src_column);
+        if (!src_data_concrete) {
+            return false;
+        }
+
+        ColumnString& dest_column_string = reinterpret_cast<ColumnString&>(dest_column);
+        ColumnString::Chars& column_string_chars = dest_column_string.get_chars();
+        ColumnString::Offsets& column_string_offsets = dest_column_string.get_offsets();
+        column_string_chars.reserve(src_column.size());
+        column_string_offsets.reserve(src_offsets.size());
+
+        using Set = HashSetWithStackMemory<StringRef, DefaultHash<StringRef>, INITIAL_SIZE_DEGREE>;
+        Set set;
+
+        ColumnArray::Offset prev_src_offset = 0;
+        ColumnArray::Offset res_offset = 0;
+
+        for (auto curr_src_offset : src_offsets) {
+            set.clear();
+            size_t null_size = 0;
+            for (ColumnArray::Offset j = prev_src_offset; j < curr_src_offset; ++j) {
+                if (src_null_map && (*src_null_map)[j]) {
+                    if (dest_null_map) {
+                        // Note: here we need to update the offset of ColumnString
+                        column_string_offsets.push_back(column_string_offsets.back());
+                        (*dest_null_map).push_back(true);
+                        null_size++;
+                    }
+                    continue;
+                }
+
+                StringRef src_str_ref = src_data_concrete->get_data_at(j);
+                if (!set.find(src_str_ref)) {
+                    set.insert(src_str_ref);
+                    dest_column_string.insert_data(src_str_ref.data, src_str_ref.size);
+                    if (dest_null_map) {
+                        (*dest_null_map).push_back(false);
+                    }
+                }
+            }
+
+            res_offset += set.size() + null_size;
+            dest_offsets.push_back(res_offset);
+            prev_src_offset = curr_src_offset;
+        }
+        return true;
+    }
+
+    bool _execute_by_type(const IColumn& src_column, const ColumnArray::Offsets& src_offsets,
+                          IColumn& dest_column, ColumnArray::Offsets& dest_offsets,
+                          const NullMapType* src_null_map, NullMapType* dest_null_map,
+                          DataTypePtr& nested_type) {
+        bool res = false;
+        WhichDataType which(remove_nullable(nested_type)->get_type_id());
+        if (which.idx == TypeIndex::UInt8) {

Review Comment:
   ```suggestion
           if (which.is_uint8()) {
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] cambyzju commented on pull request #10388: [feature-wip] (array-type) add the array_distinct function

Posted by GitBox <gi...@apache.org>.
cambyzju commented on PR #10388:
URL: https://github.com/apache/doris/pull/10388#issuecomment-1175860730

   LGTM




[GitHub] [doris] github-actions[bot] commented on pull request #10388: [feature-wip] (array-type) add the array_distinct function

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #10388:
URL: https://github.com/apache/doris/pull/10388#issuecomment-1178425897

   PR approved by at least one committer and no changes requested.




[GitHub] [doris] eldenmoon commented on a diff in pull request #10388: [feature-wip] (array-type) add the array_distinct function

Posted by GitBox <gi...@apache.org>.
eldenmoon commented on code in PR #10388:
URL: https://github.com/apache/doris/pull/10388#discussion_r909160649


##########
gensrc/script/doris_builtins_functions.py:
##########
@@ -160,6 +160,18 @@
     [['array_position'], 'BIGINT', ['ARRAY_STRING', 'STRING'], '', '', '', 'vec', ''],
 
     [['cardinality', 'size'], 'BIGINT', ['ARRAY'], '', '', '', 'vec', ''],
+    [['array_distinct'], 'ARRAY_TINYINT',   ['ARRAY_TINYINT'], '', '', '', 'vec', ''],

Review Comment:
   @HappenLee Thanks a lot, I will test it immediately after it's merged.





[GitHub] [doris] xy720 commented on a diff in pull request #10388: [feature-wip] (array-type) add the array_distinct function

Posted by GitBox <gi...@apache.org>.
xy720 commented on code in PR #10388:
URL: https://github.com/apache/doris/pull/10388#discussion_r910607441


##########
be/src/vec/functions/array/function_array_distinct.h:
##########
@@ -0,0 +1,295 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+// This file is copied from
+// https://github.com/ClickHouse/ClickHouse/blob/master/src/Functions/array/arrayDistinct.cpp
+// and modified by Doris
+#pragma once
+
+#include "vec/columns/column_array.h"
+#include "vec/columns/column_const.h"
+#include "vec/common/hash_table/hash_set.h"
+#include "vec/common/hash_table/hash_table.h"
+#include "vec/common/sip_hash.h"
+#include "vec/data_types/data_type_array.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/functions/function.h"
+#include "vec/functions/function_helpers.h"
+#include "vec/io/io_helper.h"
+
+namespace doris::vectorized {
+
+class FunctionArrayDistinct : public IFunction {
+public:
+    static constexpr auto name = "array_distinct";
+    static FunctionPtr create() { return std::make_shared<FunctionArrayDistinct>(); }
+
+    /// Get function name.
+    String get_name() const override { return name; }
+
+    bool is_variadic() const override { return false; }
+
+    size_t get_number_of_arguments() const override { return 1; }
+
+    DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {
+        DCHECK(is_array(arguments[0]))
+                << "first argument for function: " << name << " should be DataTypeArray"
+                << " and arguments[0] is " << arguments[0].get()->get_name();
+        return check_and_get_data_type<DataTypeArray>(arguments[0].get())->get_nested_type();
+    }
+
+    Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
+                        size_t result, size_t input_rows_count) override {
+        ColumnPtr src_column =
+                block.get_by_position(arguments[0]).column->convert_to_full_column_if_const();
+        const auto& src_column_array = check_and_get_column<ColumnArray>(*src_column);
+        if (!src_column_array) {
+            return Status::RuntimeError(
+                    fmt::format("unsupported types for function {}({})", get_name(),
+                                block.get_by_position(arguments[0]).type->get_name()));
+        }
+        const auto& src_offsets = src_column_array->get_offsets();
+        const auto& src_nested_column = src_column_array->get_data();
+
+        DataTypePtr src_column_type = remove_nullable(block.get_by_position(arguments[0]).type);
+        auto nested_type = assert_cast<const DataTypeArray&>(*src_column_type).get_nested_type();
+        auto dest_column_ptr = ColumnArray::create(nested_type->create_column(),
+                                                   ColumnArray::ColumnOffsets::create());
+        IColumn& dest_nested_column = dest_column_ptr->get_data();
+        ColumnArray::Offsets& dest_offsets = dest_column_ptr->get_offsets();
+
+        const IColumn* src_inner_column = nullptr;
+        const ColumnNullable* src_nested_nullable_col = nullptr;
+        if (src_nested_column.is_nullable()) {
+            src_nested_nullable_col = check_and_get_column<ColumnNullable>(src_nested_column);
+            src_inner_column = src_nested_nullable_col->get_nested_column_ptr();
+        } else {
+            src_inner_column = src_column_array->get_data_ptr();
+        }
+
+        IColumn* dest_inner_column = nullptr;

Review Comment:
   Same as above.



##########
be/src/vec/functions/array/function_array_distinct.h:
##########
@@ -0,0 +1,295 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+// This file is copied from
+// https://github.com/ClickHouse/ClickHouse/blob/master/src/Functions/array/arrayDistinct.cpp
+// and modified by Doris
+#pragma once
+
+#include "vec/columns/column_array.h"
+#include "vec/columns/column_const.h"
+#include "vec/common/hash_table/hash_set.h"
+#include "vec/common/hash_table/hash_table.h"
+#include "vec/common/sip_hash.h"
+#include "vec/data_types/data_type_array.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/functions/function.h"
+#include "vec/functions/function_helpers.h"
+#include "vec/io/io_helper.h"
+
+namespace doris::vectorized {
+
+class FunctionArrayDistinct : public IFunction {
+public:
+    static constexpr auto name = "array_distinct";
+    static FunctionPtr create() { return std::make_shared<FunctionArrayDistinct>(); }
+
+    /// Get function name.
+    String get_name() const override { return name; }
+
+    bool is_variadic() const override { return false; }
+
+    size_t get_number_of_arguments() const override { return 1; }
+
+    DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {
+        DCHECK(is_array(arguments[0]))
+                << "first argument for function: " << name << " should be DataTypeArray"
+                << " and arguments[0] is " << arguments[0].get()->get_name();
+        return check_and_get_data_type<DataTypeArray>(arguments[0].get())->get_nested_type();
+    }
+
+    Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
+                        size_t result, size_t input_rows_count) override {
+        ColumnPtr src_column =
+                block.get_by_position(arguments[0]).column->convert_to_full_column_if_const();
+        const auto& src_column_array = check_and_get_column<ColumnArray>(*src_column);
+        if (!src_column_array) {
+            return Status::RuntimeError(
+                    fmt::format("unsupported types for function {}({})", get_name(),
+                                block.get_by_position(arguments[0]).type->get_name()));
+        }
+        const auto& src_offsets = src_column_array->get_offsets();
+        const auto& src_nested_column = src_column_array->get_data();
+
+        DataTypePtr src_column_type = remove_nullable(block.get_by_position(arguments[0]).type);
+        auto nested_type = assert_cast<const DataTypeArray&>(*src_column_type).get_nested_type();
+        auto dest_column_ptr = ColumnArray::create(nested_type->create_column(),
+                                                   ColumnArray::ColumnOffsets::create());
+        IColumn& dest_nested_column = dest_column_ptr->get_data();
+        ColumnArray::Offsets& dest_offsets = dest_column_ptr->get_offsets();
+
+        const IColumn* src_inner_column = nullptr;

Review Comment:
   Using ColumnPtr instead would be better.



##########
be/src/vec/functions/array/function_array_distinct.h:
##########
@@ -0,0 +1,295 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+// This file is copied from
+// https://github.com/ClickHouse/ClickHouse/blob/master/src/Functions/array/arrayDistinct.cpp
+// and modified by Doris
+#pragma once
+
+#include "vec/columns/column_array.h"
+#include "vec/columns/column_const.h"
+#include "vec/common/hash_table/hash_set.h"
+#include "vec/common/hash_table/hash_table.h"
+#include "vec/common/sip_hash.h"
+#include "vec/data_types/data_type_array.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/functions/function.h"
+#include "vec/functions/function_helpers.h"
+#include "vec/io/io_helper.h"
+
+namespace doris::vectorized {
+
+class FunctionArrayDistinct : public IFunction {
+public:
+    static constexpr auto name = "array_distinct";
+    static FunctionPtr create() { return std::make_shared<FunctionArrayDistinct>(); }
+
+    /// Get function name.
+    String get_name() const override { return name; }
+
+    bool is_variadic() const override { return false; }
+
+    size_t get_number_of_arguments() const override { return 1; }
+
+    DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {
+        DCHECK(is_array(arguments[0]))
+                << "first argument for function: " << name << " should be DataTypeArray"
+                << " and arguments[0] is " << arguments[0].get()->get_name();
+        return check_and_get_data_type<DataTypeArray>(arguments[0].get())->get_nested_type();
+    }
+
+    Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
+                        size_t result, size_t input_rows_count) override {
+        ColumnPtr src_column =
+                block.get_by_position(arguments[0]).column->convert_to_full_column_if_const();
+        const auto& src_column_array = check_and_get_column<ColumnArray>(*src_column);
+        if (!src_column_array) {
+            return Status::RuntimeError(
+                    fmt::format("unsupported types for function {}({})", get_name(),
+                                block.get_by_position(arguments[0]).type->get_name()));
+        }
+        const auto& src_offsets = src_column_array->get_offsets();
+        const auto& src_nested_column = src_column_array->get_data();
+
+        DataTypePtr src_column_type = remove_nullable(block.get_by_position(arguments[0]).type);
+        auto nested_type = assert_cast<const DataTypeArray&>(*src_column_type).get_nested_type();
+        auto dest_column_ptr = ColumnArray::create(nested_type->create_column(),
+                                                   ColumnArray::ColumnOffsets::create());
+        IColumn& dest_nested_column = dest_column_ptr->get_data();
+        ColumnArray::Offsets& dest_offsets = dest_column_ptr->get_offsets();

Review Comment:
   The size of dest_offsets should be the same as input_rows_count,
   so I think you can reserve it up front with `dest_offsets.reserve(input_rows_count)`.
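
A minimal sketch of why the reservation is safe, using standard-library containers as stand-ins for Doris's ColumnArray. `FlatArrays` and `distinct_per_row` are hypothetical names for illustration only, not Doris APIs. It assumes the same layout as the diff above: all elements stored back to back, with one end offset per row, so exactly one offset is written per input row.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for ColumnArray: `values` holds every row's elements
// back to back, and offsets[i] is the end position of row i inside `values`.
struct FlatArrays {
    std::vector<int32_t> values;
    std::vector<size_t> offsets;
};

// Removes duplicates within each row, keeping first occurrences in order.
FlatArrays distinct_per_row(const FlatArrays& src) {
    FlatArrays dest;
    // One offset is emitted per input row, so the final size is known up front.
    dest.offsets.reserve(src.offsets.size());
    // The output can never hold more elements than the input.
    dest.values.reserve(src.values.size());

    std::unordered_set<int32_t> seen;
    size_t prev = 0;
    for (size_t end : src.offsets) {
        // Offsets must be non-decreasing and stay inside `values`.
        assert(prev <= end && end <= src.values.size());
        seen.clear();
        for (size_t j = prev; j < end; ++j) {
            // insert().second is true only the first time a value is seen.
            if (seen.insert(src.values[j]).second) {
                dest.values.push_back(src.values[j]);
            }
        }
        dest.offsets.push_back(dest.values.size());
        prev = end;
    }
    return dest;
}
```

Reserving only avoids reallocations while the offsets are appended; it does not change the result.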





[GitHub] [doris] cambyzju commented on a diff in pull request #10388: [feature-wip] (array-type) add the array_distinct function

Posted by GitBox <gi...@apache.org>.
cambyzju commented on code in PR #10388:
URL: https://github.com/apache/doris/pull/10388#discussion_r905775087


##########
be/src/vec/functions/array/function_array_distinct.h:
##########
@@ -0,0 +1,294 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+// This file is copied from
+// https://github.com/ClickHouse/ClickHouse/blob/master/src/Functions/array/arrayDistinct.cpp
+// and modified by Doris
+#pragma once
+
+#include "vec/columns/column_array.h"
+#include "vec/columns/column_const.h"
+#include "vec/common/hash_table/hash_set.h"
+#include "vec/common/hash_table/hash_table.h"
+#include "vec/common/sip_hash.h"
+#include "vec/data_types/data_type_array.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/functions/function.h"
+#include "vec/functions/function_helpers.h"
+#include "vec/io/io_helper.h"
+
+namespace doris::vectorized {
+
+class FunctionArrayDistinct : public IFunction {
+public:
+    static constexpr auto name = "array_distinct";
+    static FunctionPtr create() { return std::make_shared<FunctionArrayDistinct>(); }
+
+    /// Get function name.
+    String get_name() const override { return name; }
+
+    bool is_variadic() const override { return false; }
+
+    size_t get_number_of_arguments() const override { return 1; }
+
+    DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {
+        DCHECK(is_array(arguments[0]))
+                << "first argument for function: " << name << " should be DataTypeArray";
+        return make_nullable(
+                check_and_get_data_type<DataTypeArray>(arguments[0].get())->get_nested_type());
+    }
+
+    Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
+                        size_t result, size_t input_rows_count) override {
+        ColumnPtr src_column =
+                block.get_by_position(arguments[0]).column->convert_to_full_column_if_const();
+        const auto& src_column_array = check_and_get_column<ColumnArray>(*src_column);
+        if (!src_column_array) {
+            return Status::RuntimeError(
+                    fmt::format("unsupported types for function {}({})", get_name(),
+                                block.get_by_position(arguments[0]).type->get_name()));
+        }
+        const auto& src_offsets = src_column_array->get_offsets();
+        const auto& src_column_data = src_column_array->get_data();
+
+        DataTypePtr src_column_type = remove_nullable(block.get_by_position(arguments[0]).type);
+        auto nested_type = assert_cast<const DataTypeArray&>(*src_column_type).get_nested_type();
+        auto dest_column_ptr = ColumnArray::create(nested_type->create_column(),
+                                                   ColumnArray::ColumnOffsets::create());
+        IColumn& dest_column = dest_column_ptr->get_data();
+        ColumnArray::Offsets& dest_offsets = dest_column_ptr->get_offsets();
+
+        const IColumn* src_inner_column = nullptr;
+        const ColumnNullable* src_nullable_column =
+                check_and_get_column<ColumnNullable>(src_column_data);
+        if (src_nullable_column) {
+            src_inner_column = src_nullable_column->get_nested_column_ptr();
+        } else {
+            src_inner_column = src_column_array->get_data_ptr();
+        }
+
+        IColumn* dest_nested_column = nullptr;
+        ColumnNullable* dest_nullable_column = reinterpret_cast<ColumnNullable*>(&dest_column);
+        if (dest_nullable_column) {
+            dest_nested_column = dest_nullable_column->get_nested_column_ptr();
+        } else {
+            dest_nested_column = &dest_column;
+        }
+
+        auto res_val =
+                _execute_by_type(*src_inner_column, src_offsets, *dest_nested_column, dest_offsets,
+                                 src_nullable_column, dest_nullable_column, nested_type);
+        if (!res_val) {
+            return Status::RuntimeError(
+                    fmt::format("execute failed or unsupported types for function {}({})",
+                                get_name(), block.get_by_position(arguments[0]).type->get_name()));
+        }
+
+        block.replace_by_position(result, std::move(dest_column_ptr));
+        return Status::OK();
+    }
+
+private:
+    // Note: Here initially allocate a piece of memory for 2^5 = 32 elements.
+    static constexpr size_t INITIAL_SIZE_DEGREE = 5;
+
+    template <typename T>
+    bool _execute_number(const IColumn& src_column, const ColumnArray::Offsets& src_offsets,
+                         IColumn& dest_column, ColumnArray::Offsets& dest_offsets,
+                         const ColumnNullable* src_nullable_col,
+                         ColumnNullable* dest_nullable_col) {
+        const ColumnVector<T>* src_data_concrete =
+                check_and_get_column<ColumnVector<T>>(&src_column);
+        if (!src_data_concrete) {
+            return false;
+        }
+        const PaddedPODArray<T>& src_datas = src_data_concrete->get_data();
+
+        ColumnVector<T>& dest_data_concrete = reinterpret_cast<ColumnVector<T>&>(dest_column);
+        PaddedPODArray<T>& dest_datas = dest_data_concrete.get_data();
+
+        const PaddedPODArray<UInt8>* src_null_map = nullptr;
+        if (src_nullable_col) {
+            src_null_map = &src_nullable_col->get_null_map_column().get_data();

Review Comment:
   `src_nullable_col->get_null_map_column().get_data().data()` returns `UInt8*`.
   
   Using `const UInt8* src_null_map` would be simpler.
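
A rough sketch of the raw-pointer style, assuming that a null pointer means "the nested column is not nullable". `count_non_null` is a hypothetical helper written for illustration, not part of Doris. The point is that the pointer has to be checked before it is indexed:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: counts the non-NULL elements in [begin, end).
// `null_map` may be nullptr, which is taken to mean that no element is NULL.
size_t count_non_null(const uint8_t* null_map, size_t begin, size_t end) {
    assert(begin <= end);
    size_t count = 0;
    for (size_t j = begin; j < end; ++j) {
        // Check the pointer first so a non-nullable column is never dereferenced.
        if (null_map != nullptr && null_map[j]) {
            continue;
        }
        ++count;
    }
    return count;
}
```

With a plain pointer the caller passes `null_map.data()` for a nullable column and `nullptr` otherwise, so the inner loop needs no knowledge of ColumnNullable.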





[GitHub] [doris] cambyzju commented on a diff in pull request #10388: [feature-wip] (array-type) add the array_distinct function

Posted by GitBox <gi...@apache.org>.
cambyzju commented on code in PR #10388:
URL: https://github.com/apache/doris/pull/10388#discussion_r905769978


##########
be/src/vec/functions/array/function_array_distinct.h:
##########
@@ -0,0 +1,294 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+// This file is copied from
+// https://github.com/ClickHouse/ClickHouse/blob/master/src/Functions/array/arrayDistinct.cpp
+// and modified by Doris
+#pragma once
+
+#include "vec/columns/column_array.h"
+#include "vec/columns/column_const.h"
+#include "vec/common/hash_table/hash_set.h"
+#include "vec/common/hash_table/hash_table.h"
+#include "vec/common/sip_hash.h"
+#include "vec/data_types/data_type_array.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/functions/function.h"
+#include "vec/functions/function_helpers.h"
+#include "vec/io/io_helper.h"
+
+namespace doris::vectorized {
+
+class FunctionArrayDistinct : public IFunction {
+public:
+    static constexpr auto name = "array_distinct";
+    static FunctionPtr create() { return std::make_shared<FunctionArrayDistinct>(); }
+
+    /// Get function name.
+    String get_name() const override { return name; }
+
+    bool is_variadic() const override { return false; }
+
+    size_t get_number_of_arguments() const override { return 1; }
+
+    DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {
+        DCHECK(is_array(arguments[0]))
+                << "first argument for function: " << name << " should be DataTypeArray";
+        return make_nullable(
+                check_and_get_data_type<DataTypeArray>(arguments[0].get())->get_nested_type());
+    }
+
+    Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
+                        size_t result, size_t input_rows_count) override {
+        ColumnPtr src_column =
+                block.get_by_position(arguments[0]).column->convert_to_full_column_if_const();
+        const auto& src_column_array = check_and_get_column<ColumnArray>(*src_column);
+        if (!src_column_array) {
+            return Status::RuntimeError(
+                    fmt::format("unsupported types for function {}({})", get_name(),
+                                block.get_by_position(arguments[0]).type->get_name()));
+        }
+        const auto& src_offsets = src_column_array->get_offsets();
+        const auto& src_column_data = src_column_array->get_data();
+
+        DataTypePtr src_column_type = remove_nullable(block.get_by_position(arguments[0]).type);
+        auto nested_type = assert_cast<const DataTypeArray&>(*src_column_type).get_nested_type();
+        auto dest_column_ptr = ColumnArray::create(nested_type->create_column(),
+                                                   ColumnArray::ColumnOffsets::create());
+        IColumn& dest_column = dest_column_ptr->get_data();

Review Comment:
   dest_nested_column?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] carlvinhust2012 commented on a diff in pull request #10388: [feature-wip] (array-type) add the array_distinct function

Posted by GitBox <gi...@apache.org>.
carlvinhust2012 commented on code in PR #10388:
URL: https://github.com/apache/doris/pull/10388#discussion_r905781717


##########
be/src/vec/functions/array/function_array_distinct.h:
##########
@@ -0,0 +1,294 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+// This file is copied from
+// https://github.com/ClickHouse/ClickHouse/blob/master/src/Functions/array/arrayDistinct.cpp
+// and modified by Doris
+#pragma once
+
+#include "vec/columns/column_array.h"
+#include "vec/columns/column_const.h"
+#include "vec/common/hash_table/hash_set.h"
+#include "vec/common/hash_table/hash_table.h"
+#include "vec/common/sip_hash.h"
+#include "vec/data_types/data_type_array.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/functions/function.h"
+#include "vec/functions/function_helpers.h"
+#include "vec/io/io_helper.h"
+
+namespace doris::vectorized {
+
+class FunctionArrayDistinct : public IFunction {
+public:
+    static constexpr auto name = "array_distinct";
+    static FunctionPtr create() { return std::make_shared<FunctionArrayDistinct>(); }
+
+    /// Get function name.
+    String get_name() const override { return name; }
+
+    bool is_variadic() const override { return false; }
+
+    size_t get_number_of_arguments() const override { return 1; }
+
+    DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {
+        DCHECK(is_array(arguments[0]))
+                << "first argument for function: " << name << " should be DataTypeArray";
+        return make_nullable(
+                check_and_get_data_type<DataTypeArray>(arguments[0].get())->get_nested_type());
+    }
+
+    Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
+                        size_t result, size_t input_rows_count) override {
+        ColumnPtr src_column =
+                block.get_by_position(arguments[0]).column->convert_to_full_column_if_const();
+        const auto& src_column_array = check_and_get_column<ColumnArray>(*src_column);
+        if (!src_column_array) {
+            return Status::RuntimeError(
+                    fmt::format("unsupported types for function {}({})", get_name(),
+                                block.get_by_position(arguments[0]).type->get_name()));
+        }
+        const auto& src_offsets = src_column_array->get_offsets();
+        const auto& src_column_data = src_column_array->get_data();
+
+        DataTypePtr src_column_type = remove_nullable(block.get_by_position(arguments[0]).type);
+        auto nested_type = assert_cast<const DataTypeArray&>(*src_column_type).get_nested_type();
+        auto dest_column_ptr = ColumnArray::create(nested_type->create_column(),
+                                                   ColumnArray::ColumnOffsets::create());
+        IColumn& dest_column = dest_column_ptr->get_data();
+        ColumnArray::Offsets& dest_offsets = dest_column_ptr->get_offsets();
+
+        const IColumn* src_inner_column = nullptr;
+        const ColumnNullable* src_nullable_column =
+                check_and_get_column<ColumnNullable>(src_column_data);
+        if (src_nullable_column) {
+            src_inner_column = src_nullable_column->get_nested_column_ptr();
+        } else {
+            src_inner_column = src_column_array->get_data_ptr();
+        }
+
+        IColumn* dest_nested_column = nullptr;
+        ColumnNullable* dest_nullable_column = reinterpret_cast<ColumnNullable*>(&dest_column);
+        if (dest_nullable_column) {
+            dest_nested_column = dest_nullable_column->get_nested_column_ptr();
+        } else {
+            dest_nested_column = &dest_column;
+        }
+
+        auto res_val =
+                _execute_by_type(*src_inner_column, src_offsets, *dest_nested_column, dest_offsets,
+                                 src_nullable_column, dest_nullable_column, nested_type);
+        if (!res_val) {
+            return Status::RuntimeError(
+                    fmt::format("execute failed or unsupported types for function {}({})",
+                                get_name(), block.get_by_position(arguments[0]).type->get_name()));
+        }
+
+        block.replace_by_position(result, std::move(dest_column_ptr));
+        return Status::OK();
+    }
+
+private:
+    // Note: Here initially allocate a piece of memory for 2^5 = 32 elements.
+    static constexpr size_t INITIAL_SIZE_DEGREE = 5;
+
+    template <typename T>
+    bool _execute_number(const IColumn& src_column, const ColumnArray::Offsets& src_offsets,
+                         IColumn& dest_column, ColumnArray::Offsets& dest_offsets,
+                         const ColumnNullable* src_nullable_col,
+                         ColumnNullable* dest_nullable_col) {
+        const ColumnVector<T>* src_data_concrete =
+                check_and_get_column<ColumnVector<T>>(&src_column);
+        if (!src_data_concrete) {
+            return false;
+        }
+        const PaddedPODArray<T>& src_datas = src_data_concrete->get_data();
+
+        ColumnVector<T>& dest_data_concrete = reinterpret_cast<ColumnVector<T>&>(dest_column);
+        PaddedPODArray<T>& dest_datas = dest_data_concrete.get_data();
+
+        const PaddedPODArray<UInt8>* src_null_map = nullptr;
+        if (src_nullable_col) {
+            src_null_map = &src_nullable_col->get_null_map_column().get_data();

Review Comment:
   > src_nullable_col->get_null_map_column().get_data().data() returns UInt8*.
   > 
   > use `const UInt8* src_null_map` is simpler.
   
   This is a good suggestion; I will simplify it here.
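
A minimal sketch of the raw-pointer null map idea discussed above, written against plain standard-library containers rather than the Doris column classes. `FlatArrayColumn` and `array_distinct_sketch` are hypothetical names introduced only for illustration, and `std::unordered_set` stands in for the hash set used in the patch. It assumes the same layout as the diff: one flattened data buffer, one offsets buffer holding the end position of each row, and an optional null map.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a flattened array column: `data` holds all
// elements back to back, and offsets[i] is the end position of row i.
struct FlatArrayColumn {
    std::vector<int32_t> data;
    std::vector<uint8_t> null_map;  // empty when the nested column is not nullable
    std::vector<size_t> offsets;
};

// Removes duplicate non-null values within each row. NULL elements are kept
// as-is, matching the sample output in the PR description.
FlatArrayColumn array_distinct_sketch(const FlatArrayColumn& src) {
    // A raw pointer doubles as the "is the nested column nullable?" flag.
    const uint8_t* src_null_map = src.null_map.empty() ? nullptr : src.null_map.data();

    FlatArrayColumn dest;
    std::unordered_set<int32_t> seen;
    size_t prev_offset = 0;
    for (size_t curr_offset : src.offsets) {
        seen.clear();
        for (size_t j = prev_offset; j < curr_offset; ++j) {
            // Check the pointer before indexing so non-nullable input is safe.
            if (src_null_map && src_null_map[j]) {
                dest.data.push_back(0);  // placeholder value behind the NULL flag
                dest.null_map.push_back(1);
                continue;
            }
            if (seen.insert(src.data[j]).second) {
                dest.data.push_back(src.data[j]);
                if (src_null_map) {
                    dest.null_map.push_back(0);
                }
            }
        }
        dest.offsets.push_back(dest.data.size());
        prev_offset = curr_offset;
    }
    return dest;
}
```

With a plain `const uint8_t*`, the null check and the element lookup collapse into one expression (`src_null_map && src_null_map[j]`), so the non-nullable path needs no separate branch.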





[GitHub] [doris] cambyzju commented on a diff in pull request #10388: [feature-wip] (array-type) add the array_distinct function

Posted by GitBox <gi...@apache.org>.
cambyzju commented on code in PR #10388:
URL: https://github.com/apache/doris/pull/10388#discussion_r906179215


##########
gensrc/script/doris_builtins_functions.py:
##########
@@ -160,6 +160,18 @@
     [['array_position'], 'BIGINT', ['ARRAY_STRING', 'STRING'], '', '', '', 'vec', ''],
 
     [['cardinality', 'size'], 'BIGINT', ['ARRAY'], '', '', '', 'vec', ''],
+    [['array_distinct'], 'ARRAY_TINYINT',   ['ARRAY_TINYINT'], '', '', '', 'vec', ''],

Review Comment:
   Got it, I will find the problem and try to fix it.
   A few weeks ago, WHERE conditions with array functions worked well.





[GitHub] [doris] carlvinhust2012 commented on a diff in pull request #10388: [feature-wip] (array-type) add the array_distinct function

Posted by GitBox <gi...@apache.org>.
carlvinhust2012 commented on code in PR #10388:
URL: https://github.com/apache/doris/pull/10388#discussion_r912441541


##########
be/src/vec/functions/array/function_array_distinct.h:
##########
@@ -0,0 +1,265 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+// This file is copied from
+// https://github.com/ClickHouse/ClickHouse/blob/master/src/Functions/array/arrayDistinct.cpp
+// and modified by Doris
+#pragma once
+
+#include "vec/columns/column_array.h"
+#include "vec/columns/column_const.h"
+#include "vec/common/hash_table/hash_set.h"
+#include "vec/common/hash_table/hash_table.h"
+#include "vec/common/sip_hash.h"
+#include "vec/data_types/data_type_array.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/functions/function.h"
+#include "vec/functions/function_helpers.h"
+#include "vec/io/io_helper.h"
+
+namespace doris::vectorized {
+
+class FunctionArrayDistinct : public IFunction {
+public:
+    static constexpr auto name = "array_distinct";
+    static FunctionPtr create() { return std::make_shared<FunctionArrayDistinct>(); }
+    using NullMapType = PaddedPODArray<UInt8>;
+
+    /// Get function name.
+    String get_name() const override { return name; }
+
+    bool is_variadic() const override { return false; }
+
+    size_t get_number_of_arguments() const override { return 1; }
+
+    DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {
+        DCHECK(is_array(arguments[0]))
+                << "first argument for function: " << name << " should be DataTypeArray"
+                << " and arguments[0] is " << arguments[0]->get_name();
+        return arguments[0];
+    }
+
+    Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
+                        size_t result, size_t input_rows_count) override {
+        ColumnPtr src_column =
+                block.get_by_position(arguments[0]).column->convert_to_full_column_if_const();
+        const auto& src_column_array = check_and_get_column<ColumnArray>(*src_column);
+        if (!src_column_array) {
+            return Status::RuntimeError(
+                    fmt::format("unsupported types for function {}({})", get_name(),
+                                block.get_by_position(arguments[0]).type->get_name()));
+        }
+        const auto& src_offsets = src_column_array->get_offsets();
+        const auto& src_nested_column = src_column_array->get_data();
+
+        DataTypePtr src_column_type = remove_nullable(block.get_by_position(arguments[0]).type);

Review Comment:
   OK, I will re-check it. Maybe adding 'remove_nullable' could serve as a safeguard?





[GitHub] [doris] cambyzju commented on a diff in pull request #10388: [feature-wip] (array-type) add the array_distinct function

Posted by GitBox <gi...@apache.org>.
cambyzju commented on code in PR #10388:
URL: https://github.com/apache/doris/pull/10388#discussion_r913573116


##########
be/src/vec/functions/array/function_array_distinct.h:
##########
@@ -0,0 +1,268 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+// This file is copied from
+// https://github.com/ClickHouse/ClickHouse/blob/master/src/Functions/array/arrayDistinct.cpp
+// and modified by Doris
+#pragma once
+
+#include "vec/columns/column_array.h"
+#include "vec/columns/column_const.h"
+#include "vec/common/hash_table/hash_set.h"
+#include "vec/common/hash_table/hash_table.h"
+#include "vec/common/sip_hash.h"
+#include "vec/data_types/data_type_array.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/functions/function.h"
+#include "vec/functions/function_helpers.h"
+#include "vec/io/io_helper.h"
+
+namespace doris::vectorized {
+
+class FunctionArrayDistinct : public IFunction {
+public:
+    static constexpr auto name = "array_distinct";
+    static FunctionPtr create() { return std::make_shared<FunctionArrayDistinct>(); }
+    using NullMapType = PaddedPODArray<UInt8>;
+
+    /// Get function name.
+    String get_name() const override { return name; }
+
+    bool is_variadic() const override { return false; }
+
+    size_t get_number_of_arguments() const override { return 1; }
+
+    DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {
+        DCHECK(is_array(arguments[0]))
+                << "first argument for function: " << name << " should be DataTypeArray"
+                << " and arguments[0] is " << arguments[0]->get_name();
+        return arguments[0];
+    }
+
+    Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
+                        size_t result, size_t input_rows_count) override {
+        ColumnPtr src_column =
+                block.get_by_position(arguments[0]).column->convert_to_full_column_if_const();
+        const auto& src_column_array = check_and_get_column<ColumnArray>(*src_column);
+        if (!src_column_array) {
+            return Status::RuntimeError(
+                    fmt::format("unsupported types for function {}({})", get_name(),
+                                block.get_by_position(arguments[0]).type->get_name()));
+        }
+        const auto& src_offsets = src_column_array->get_offsets();
+        const auto& src_nested_column = src_column_array->get_data();
+
+        DataTypePtr src_column_type = block.get_by_position(arguments[0]).type;
+        auto nested_type = assert_cast<const DataTypeArray&>(*src_column_type).get_nested_type();
+        auto dest_column_ptr = ColumnArray::create(nested_type->create_column(),
+                                                   ColumnArray::ColumnOffsets::create());
+        IColumn& dest_nested_column = dest_column_ptr->get_data();
+        ColumnArray::Offsets& dest_offsets = dest_column_ptr->get_offsets();
+        dest_nested_column.reserve(src_nested_column.size());
+        dest_offsets.reserve(input_rows_count);
+
+        const IColumn* src_inner_column = nullptr;
+        const NullMapType* src_null_map = nullptr;
+        const ColumnNullable* src_nested_nullable_col = nullptr;
+        if (src_nested_column.is_nullable()) {
+            src_nested_nullable_col = check_and_get_column<ColumnNullable>(src_nested_column);
+            src_inner_column = src_nested_nullable_col->get_nested_column_ptr();
+            src_null_map = &src_nested_nullable_col->get_null_map_column().get_data();
+        } else {
+            src_inner_column = src_column_array->get_data_ptr();
+        }
+
+        IColumn* dest_inner_column = nullptr;
+        NullMapType* dest_null_map = nullptr;
+        ColumnNullable* dest_nested_nullable_col = nullptr;
+        if (dest_nested_column.is_nullable()) {
+            dest_nested_nullable_col = reinterpret_cast<ColumnNullable*>(&dest_nested_column);
+            dest_inner_column = &dest_nested_nullable_col->get_nested_column();
+            dest_null_map = &dest_nested_nullable_col->get_null_map_column().get_data();
+        } else {
+            dest_inner_column = &dest_nested_column;
+        }
+
+        auto res_val = _execute_by_type(*src_inner_column, src_offsets, *dest_inner_column,
+                                        dest_offsets, src_null_map, dest_null_map, nested_type);
+        if (!res_val) {
+            return Status::RuntimeError(
+                    fmt::format("execute failed or unsupported types for function {}({})",
+                                get_name(), block.get_by_position(arguments[0]).type->get_name()));
+        }
+
+        block.replace_by_position(result, std::move(dest_column_ptr));
+        return Status::OK();
+    }
+
+private:
+    // Note: Here initially allocate a piece of memory for 2^5 = 32 elements.
+    static constexpr size_t INITIAL_SIZE_DEGREE = 5;
+
+    template <typename ColumnType>
+    bool _execute_number(const IColumn& src_column, const ColumnArray::Offsets& src_offsets,
+                         IColumn& dest_column, ColumnArray::Offsets& dest_offsets,
+                         const NullMapType* src_null_map, NullMapType* dest_null_map) {
+        using NestType = typename ColumnType::value_type;
+        using ElementNativeType = typename NativeType<NestType>::Type;
+
+        const ColumnType* src_data_concrete = check_and_get_column<ColumnType>(&src_column);
+        if (!src_data_concrete) {
+            return false;
+        }
+        const PaddedPODArray<NestType>& src_datas = src_data_concrete->get_data();
+
+        ColumnType& dest_data_concrete = reinterpret_cast<ColumnType&>(dest_column);
+        PaddedPODArray<NestType>& dest_datas = dest_data_concrete.get_data();
+
+        using Set = HashSetWithStackMemory<ElementNativeType, DefaultHash<ElementNativeType>,
+                                           INITIAL_SIZE_DEGREE>;
+        Set set;
+
+        ColumnArray::Offset prev_src_offset = 0;
+        ColumnArray::Offset res_offset = 0;
+
+        for (auto curr_src_offset : src_offsets) {
+            set.clear();
+            size_t null_size = 0;
+            for (ColumnArray::Offset j = prev_src_offset; j < curr_src_offset; ++j) {
+                if ((*src_null_map)[j]) {
+                    if (dest_null_map) {
+                        (*dest_null_map).push_back(true);
+                        // Note: here we need to add an element which will not use for output
+                        // because we expand the value of each offset
+                        dest_datas.push_back(NestType());
+                        null_size++;
+                    }
+                    continue;
+                }
+
+                if (!set.find(src_datas[j])) {
+                    set.insert(src_datas[j]);
+                    dest_datas.push_back(src_datas[j]);
+                    if (dest_null_map) {
+                        (*dest_null_map).push_back(false);
+                    }
+                }
+            }
+
+            res_offset += set.size() + null_size;
+            dest_offsets.push_back(res_offset);
+            prev_src_offset = curr_src_offset;
+        }
+
+        return true;
+    }
+
+    bool _execute_string(const IColumn& src_column, const ColumnArray::Offsets& src_offsets,
+                         IColumn& dest_column, ColumnArray::Offsets& dest_offsets,
+                         const NullMapType* src_null_map, NullMapType* dest_null_map) {
+        const ColumnString* src_data_concrete = check_and_get_column<ColumnString>(&src_column);
+        if (!src_data_concrete) {
+            return false;
+        }
+
+        ColumnString& dest_column_string = reinterpret_cast<ColumnString&>(dest_column);
+        ColumnString::Chars& column_string_chars = dest_column_string.get_chars();
+        ColumnString::Offsets& column_string_offsets = dest_column_string.get_offsets();
+        column_string_chars.reserve(src_column.size());
+        column_string_offsets.reserve(src_offsets.size());
+
+        using Set = HashSetWithStackMemory<StringRef, DefaultHash<StringRef>, INITIAL_SIZE_DEGREE>;
+        Set set;
+
+        ColumnArray::Offset prev_src_offset = 0;
+        ColumnArray::Offset res_offset = 0;
+
+        for (auto curr_src_offset : src_offsets) {
+            set.clear();
+            size_t null_size = 0;
+            for (ColumnArray::Offset j = prev_src_offset; j < curr_src_offset; ++j) {
+                if ((*src_null_map)[j]) {
+                    if (dest_null_map) {
+                        // Note: here we need to update the offset of ColumnString
+                        column_string_offsets.push_back(column_string_offsets.back());
+                        (*dest_null_map).push_back(true);
+                        null_size++;
+                    }
+                    continue;
+                }
+
+                StringRef src_str_ref = src_data_concrete->get_data_at(j);
+                if (!set.find(src_str_ref)) {
+                    set.insert(src_str_ref);
+                    dest_column_string.insert_data(src_str_ref.data, src_str_ref.size);
+                    if (dest_null_map) {
+                        (*dest_null_map).push_back(false);
+                    }
+                }
+            }
+
+            res_offset += set.size() + null_size;
+            dest_offsets.push_back(res_offset);
+            prev_src_offset = curr_src_offset;
+        }
+        return true;
+    }
+
+    bool _execute_by_type(const IColumn& src_column, const ColumnArray::Offsets& src_offsets,
+                          IColumn& dest_column, ColumnArray::Offsets& dest_offsets,
+                          const NullMapType* src_null_map, NullMapType* dest_null_map,
+                          DataTypePtr& nested_type) {
+        bool res = false;
+        WhichDataType which(remove_nullable(nested_type)->get_type_id());

Review Comment:
   ```suggestion
           WhichDataType which(remove_nullable(nested_type));
   ```





[GitHub] [doris] carlvinhust2012 commented on a diff in pull request #10388: [feature-wip] (array-type) add the array_distinct function

Posted by GitBox <gi...@apache.org>.
carlvinhust2012 commented on code in PR #10388:
URL: https://github.com/apache/doris/pull/10388#discussion_r911753157


##########
be/src/vec/functions/array/function_array_distinct.h:
##########
@@ -0,0 +1,296 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+// This file is copied from
+// https://github.com/ClickHouse/ClickHouse/blob/master/src/Functions/array/arrayDistinct.cpp
+// and modified by Doris
+#pragma once
+
+#include "vec/columns/column_array.h"
+#include "vec/columns/column_const.h"
+#include "vec/common/hash_table/hash_set.h"
+#include "vec/common/hash_table/hash_table.h"
+#include "vec/common/sip_hash.h"
+#include "vec/data_types/data_type_array.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/functions/function.h"
+#include "vec/functions/function_helpers.h"
+#include "vec/io/io_helper.h"
+
+namespace doris::vectorized {
+
+class FunctionArrayDistinct : public IFunction {
+public:
+    static constexpr auto name = "array_distinct";
+    static FunctionPtr create() { return std::make_shared<FunctionArrayDistinct>(); }
+
+    /// Get function name.
+    String get_name() const override { return name; }
+
+    bool is_variadic() const override { return false; }
+
+    size_t get_number_of_arguments() const override { return 1; }
+
+    DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {
+        DCHECK(is_array(arguments[0]))
+                << "first argument for function: " << name << " should be DataTypeArray"
+                << " and arguments[0] is " << arguments[0].get()->get_name();
+        return check_and_get_data_type<DataTypeArray>(arguments[0].get())->get_nested_type();

Review Comment:
   This is a problem; returning arguments[0] is better.





[GitHub] [doris] carlvinhust2012 commented on a diff in pull request #10388: [feature-wip] (array-type) add the array_distinct function

Posted by GitBox <gi...@apache.org>.
carlvinhust2012 commented on code in PR #10388:
URL: https://github.com/apache/doris/pull/10388#discussion_r911755129


##########
be/src/vec/functions/array/function_array_distinct.h:
##########
@@ -0,0 +1,295 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+// This file is copied from
+// https://github.com/ClickHouse/ClickHouse/blob/master/src/Functions/array/arrayDistinct.cpp
+// and modified by Doris
+#pragma once
+
+#include "vec/columns/column_array.h"
+#include "vec/columns/column_const.h"
+#include "vec/common/hash_table/hash_set.h"
+#include "vec/common/hash_table/hash_table.h"
+#include "vec/common/sip_hash.h"
+#include "vec/data_types/data_type_array.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/functions/function.h"
+#include "vec/functions/function_helpers.h"
+#include "vec/io/io_helper.h"
+
+namespace doris::vectorized {
+
+class FunctionArrayDistinct : public IFunction {
+public:
+    static constexpr auto name = "array_distinct";
+    static FunctionPtr create() { return std::make_shared<FunctionArrayDistinct>(); }
+
+    /// Get function name.
+    String get_name() const override { return name; }
+
+    bool is_variadic() const override { return false; }
+
+    size_t get_number_of_arguments() const override { return 1; }
+
+    DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {
+        DCHECK(is_array(arguments[0]))
+                << "first argument for function: " << name << " should be DataTypeArray"
+                << " and arguments[0] is " << arguments[0].get()->get_name();
+        return check_and_get_data_type<DataTypeArray>(arguments[0].get())->get_nested_type();
+    }
+
+    Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
+                        size_t result, size_t input_rows_count) override {
+        ColumnPtr src_column =
+                block.get_by_position(arguments[0]).column->convert_to_full_column_if_const();
+        const auto& src_column_array = check_and_get_column<ColumnArray>(*src_column);
+        if (!src_column_array) {
+            return Status::RuntimeError(
+                    fmt::format("unsupported types for function {}({})", get_name(),
+                                block.get_by_position(arguments[0]).type->get_name()));
+        }
+        const auto& src_offsets = src_column_array->get_offsets();
+        const auto& src_nested_column = src_column_array->get_data();
+
+        DataTypePtr src_column_type = remove_nullable(block.get_by_position(arguments[0]).type);
+        auto nested_type = assert_cast<const DataTypeArray&>(*src_column_type).get_nested_type();
+        auto dest_column_ptr = ColumnArray::create(nested_type->create_column(),
+                                                   ColumnArray::ColumnOffsets::create());
+        IColumn& dest_nested_column = dest_column_ptr->get_data();
+        ColumnArray::Offsets& dest_offsets = dest_column_ptr->get_offsets();
+
+        const IColumn* src_inner_column = nullptr;

Review Comment:
   I think using IColumn* directly would reduce type conversions.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] cambyzju commented on a diff in pull request #10388: [feature-wip] (array-type) add the array_distinct function

Posted by GitBox <gi...@apache.org>.
cambyzju commented on code in PR #10388:
URL: https://github.com/apache/doris/pull/10388#discussion_r905771885


##########
be/src/vec/functions/array/function_array_distinct.h:
##########
@@ -0,0 +1,294 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+// This file is copied from
+// https://github.com/ClickHouse/ClickHouse/blob/master/src/Functions/array/arrayDistinct.cpp
+// and modified by Doris
+#pragma once
+
+#include "vec/columns/column_array.h"
+#include "vec/columns/column_const.h"
+#include "vec/common/hash_table/hash_set.h"
+#include "vec/common/hash_table/hash_table.h"
+#include "vec/common/sip_hash.h"
+#include "vec/data_types/data_type_array.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/functions/function.h"
+#include "vec/functions/function_helpers.h"
+#include "vec/io/io_helper.h"
+
+namespace doris::vectorized {
+
+class FunctionArrayDistinct : public IFunction {
+public:
+    static constexpr auto name = "array_distinct";
+    static FunctionPtr create() { return std::make_shared<FunctionArrayDistinct>(); }
+
+    /// Get function name.
+    String get_name() const override { return name; }
+
+    bool is_variadic() const override { return false; }
+
+    size_t get_number_of_arguments() const override { return 1; }
+
+    DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {
+        DCHECK(is_array(arguments[0]))
+                << "first argument for function: " << name << " should be DataTypeArray";
+        return make_nullable(
+                check_and_get_data_type<DataTypeArray>(arguments[0].get())->get_nested_type());
+    }
+
+    Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
+                        size_t result, size_t input_rows_count) override {
+        ColumnPtr src_column =
+                block.get_by_position(arguments[0]).column->convert_to_full_column_if_const();
+        const auto& src_column_array = check_and_get_column<ColumnArray>(*src_column);
+        if (!src_column_array) {
+            return Status::RuntimeError(
+                    fmt::format("unsupported types for function {}({})", get_name(),
+                                block.get_by_position(arguments[0]).type->get_name()));
+        }
+        const auto& src_offsets = src_column_array->get_offsets();
+        const auto& src_column_data = src_column_array->get_data();
+
+        DataTypePtr src_column_type = remove_nullable(block.get_by_position(arguments[0]).type);
+        auto nested_type = assert_cast<const DataTypeArray&>(*src_column_type).get_nested_type();
+        auto dest_column_ptr = ColumnArray::create(nested_type->create_column(),
+                                                   ColumnArray::ColumnOffsets::create());
+        IColumn& dest_column = dest_column_ptr->get_data();
+        ColumnArray::Offsets& dest_offsets = dest_column_ptr->get_offsets();
+
+        const IColumn* src_inner_column = nullptr;
+        const ColumnNullable* src_nullable_column =

Review Comment:
   src_nested_nullable_col





[GitHub] [doris] eldenmoon commented on a diff in pull request #10388: [feature-wip] (array-type) add the array_distinct function

Posted by GitBox <gi...@apache.org>.
eldenmoon commented on code in PR #10388:
URL: https://github.com/apache/doris/pull/10388#discussion_r905977203


##########
gensrc/script/doris_builtins_functions.py:
##########
@@ -160,6 +160,18 @@
     [['array_position'], 'BIGINT', ['ARRAY_STRING', 'STRING'], '', '', '', 'vec', ''],
 
     [['cardinality', 'size'], 'BIGINT', ['ARRAY'], '', '', '', 'vec', ''],
+    [['array_distinct'], 'ARRAY_TINYINT',   ['ARRAY_TINYINT'], '', '', '', 'vec', ''],

Review Comment:
   But the SQL may fail if the symbol is not added. You could try this; here is the issue I encountered,
   where id is `ARRAY<INT>`:
   ```
   mysql> set enable_vectorized_engine = true;
   Query OK, 0 rows affected (0.05 sec)
   mysql> select size(id) from github where element_at(id, 1) = 1 ;
   ERROR 1105 (HY000): errCode = 2, detailMessage = Function element_at is not implemented.
   ```
   There are no errors when the `where` condition is removed.






[GitHub] [doris] cambyzju commented on a diff in pull request #10388: [feature-wip] (array-type) add the array_distinct function

Posted by GitBox <gi...@apache.org>.
cambyzju commented on code in PR #10388:
URL: https://github.com/apache/doris/pull/10388#discussion_r906179215


##########
gensrc/script/doris_builtins_functions.py:
##########
@@ -160,6 +160,18 @@
     [['array_position'], 'BIGINT', ['ARRAY_STRING', 'STRING'], '', '', '', 'vec', ''],
 
     [['cardinality', 'size'], 'BIGINT', ['ARRAY'], '', '', '', 'vec', ''],
+    [['array_distinct'], 'ARRAY_TINYINT',   ['ARRAY_TINYINT'], '', '', '', 'vec', ''],

Review Comment:
   Got it, I will look for the problem and try to fix it.
   A few weeks ago, `where` conditions with array functions worked well.





[GitHub] [doris] cambyzju commented on a diff in pull request #10388: [feature-wip] (array-type) add the array_distinct function

Posted by GitBox <gi...@apache.org>.
cambyzju commented on code in PR #10388:
URL: https://github.com/apache/doris/pull/10388#discussion_r911726587


##########
be/src/vec/functions/array/function_array_distinct.h:
##########
@@ -0,0 +1,296 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+// This file is copied from
+// https://github.com/ClickHouse/ClickHouse/blob/master/src/Functions/array/arrayDistinct.cpp
+// and modified by Doris
+#pragma once
+
+#include "vec/columns/column_array.h"
+#include "vec/columns/column_const.h"
+#include "vec/common/hash_table/hash_set.h"
+#include "vec/common/hash_table/hash_table.h"
+#include "vec/common/sip_hash.h"
+#include "vec/data_types/data_type_array.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/functions/function.h"
+#include "vec/functions/function_helpers.h"
+#include "vec/io/io_helper.h"
+
+namespace doris::vectorized {
+
+class FunctionArrayDistinct : public IFunction {
+public:
+    static constexpr auto name = "array_distinct";
+    static FunctionPtr create() { return std::make_shared<FunctionArrayDistinct>(); }
+
+    /// Get function name.
+    String get_name() const override { return name; }
+
+    bool is_variadic() const override { return false; }
+
+    size_t get_number_of_arguments() const override { return 1; }
+
+    DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {
+        DCHECK(is_array(arguments[0]))
+                << "first argument for function: " << name << " should be DataTypeArray"
+                << " and arguments[0] is " << arguments[0].get()->get_name();
+        return check_and_get_data_type<DataTypeArray>(arguments[0].get())->get_nested_type();

Review Comment:
   Why is the return type not arguments[0]?
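
To illustrate the question, here is a minimal standalone sketch. It does not use the Doris type classes; `Type`, `make_scalar`, `make_array`, `nested_type_of`, and `distinct_return_type` are hypothetical stand-ins. It assumes that array_distinct maps an array to an array of the same type, as in the sample output in the PR description, so returning the nested type would describe a single element rather than the result.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for a data type: arrays carry a nested element type,
// scalars leave `nested` empty.
struct Type {
    std::string name;
    std::shared_ptr<const Type> nested;
};
using TypePtr = std::shared_ptr<const Type>;

TypePtr make_scalar(const std::string& name) {
    return std::make_shared<const Type>(Type{name, nullptr});
}

TypePtr make_array(const TypePtr& nested) {
    return std::make_shared<const Type>(Type{"Array(" + nested->name + ")", nested});
}

// What the diff under review returns: the element type of the array.
TypePtr nested_type_of(const TypePtr& array_type) {
    assert(array_type->nested != nullptr);
    return array_type->nested;
}

// What the comment suggests: the argument type itself, because
// de-duplicating an array still yields an array of the same element type.
TypePtr distinct_return_type(const TypePtr& array_type) {
    assert(array_type->nested != nullptr);
    return array_type;
}
```

For an `Array(Int32)` argument, `nested_type_of` yields `Int32`, while `distinct_return_type` yields `Array(Int32)`.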





[GitHub] [doris] cambyzju commented on a diff in pull request #10388: [feature-wip] (array-type) add the array_distinct function

Posted by GitBox <gi...@apache.org>.
cambyzju commented on code in PR #10388:
URL: https://github.com/apache/doris/pull/10388#discussion_r913579682


##########
be/src/vec/functions/array/function_array_distinct.h:
##########
@@ -0,0 +1,268 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+// This file is copied from
+// https://github.com/ClickHouse/ClickHouse/blob/master/src/Functions/array/arrayDistinct.cpp
+// and modified by Doris
+#pragma once
+
+#include "vec/columns/column_array.h"
+#include "vec/columns/column_const.h"
+#include "vec/common/hash_table/hash_set.h"
+#include "vec/common/hash_table/hash_table.h"
+#include "vec/common/sip_hash.h"
+#include "vec/data_types/data_type_array.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/functions/function.h"
+#include "vec/functions/function_helpers.h"
+#include "vec/io/io_helper.h"
+
+namespace doris::vectorized {
+
+class FunctionArrayDistinct : public IFunction {
+public:
+    static constexpr auto name = "array_distinct";
+    static FunctionPtr create() { return std::make_shared<FunctionArrayDistinct>(); }
+    using NullMapType = PaddedPODArray<UInt8>;
+
+    /// Get function name.
+    String get_name() const override { return name; }
+
+    bool is_variadic() const override { return false; }
+
+    size_t get_number_of_arguments() const override { return 1; }
+
+    DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {
+        DCHECK(is_array(arguments[0]))
+                << "first argument for function: " << name << " should be DataTypeArray"
+                << " and arguments[0] is " << arguments[0]->get_name();
+        return arguments[0];
+    }
+
+    Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
+                        size_t result, size_t input_rows_count) override {
+        ColumnPtr src_column =
+                block.get_by_position(arguments[0]).column->convert_to_full_column_if_const();
+        const auto& src_column_array = check_and_get_column<ColumnArray>(*src_column);
+        if (!src_column_array) {
+            return Status::RuntimeError(
+                    fmt::format("unsupported types for function {}({})", get_name(),
+                                block.get_by_position(arguments[0]).type->get_name()));
+        }
+        const auto& src_offsets = src_column_array->get_offsets();
+        const auto& src_nested_column = src_column_array->get_data();
+
+        DataTypePtr src_column_type = block.get_by_position(arguments[0]).type;
+        auto nested_type = assert_cast<const DataTypeArray&>(*src_column_type).get_nested_type();
+        auto dest_column_ptr = ColumnArray::create(nested_type->create_column(),
+                                                   ColumnArray::ColumnOffsets::create());
+        IColumn& dest_nested_column = dest_column_ptr->get_data();
+        ColumnArray::Offsets& dest_offsets = dest_column_ptr->get_offsets();
+        dest_nested_column.reserve(src_nested_column.size());
+        dest_offsets.reserve(input_rows_count);
+
+        const IColumn* src_inner_column = nullptr;
+        const NullMapType* src_null_map = nullptr;
+        const ColumnNullable* src_nested_nullable_col = nullptr;
+        if (src_nested_column.is_nullable()) {
+            src_nested_nullable_col = check_and_get_column<ColumnNullable>(src_nested_column);
+            src_inner_column = src_nested_nullable_col->get_nested_column_ptr();
+            src_null_map = &src_nested_nullable_col->get_null_map_column().get_data();
+        } else {
+            src_inner_column = src_column_array->get_data_ptr();
+        }
+
+        IColumn* dest_inner_column = nullptr;
+        NullMapType* dest_null_map = nullptr;
+        ColumnNullable* dest_nested_nullable_col = nullptr;
+        if (dest_nested_column.is_nullable()) {
+            dest_nested_nullable_col = reinterpret_cast<ColumnNullable*>(&dest_nested_column);
+            dest_inner_column = &dest_nested_nullable_col->get_nested_column();
+            dest_null_map = &dest_nested_nullable_col->get_null_map_column().get_data();
+        } else {
+            dest_inner_column = &dest_nested_column;
+        }
+
+        auto res_val = _execute_by_type(*src_inner_column, src_offsets, *dest_inner_column,
+                                        dest_offsets, src_null_map, dest_null_map, nested_type);
+        if (!res_val) {
+            return Status::RuntimeError(
+                    fmt::format("execute failed or unsupported types for function {}({})",
+                                get_name(), block.get_by_position(arguments[0]).type->get_name()));
+        }
+
+        block.replace_by_position(result, std::move(dest_column_ptr));
+        return Status::OK();
+    }
+
+private:
+    // Note: Here initially allocate a piece of memory for 2^5 = 32 elements.
+    static constexpr size_t INITIAL_SIZE_DEGREE = 5;
+
+    template <typename ColumnType>
+    bool _execute_number(const IColumn& src_column, const ColumnArray::Offsets& src_offsets,
+                         IColumn& dest_column, ColumnArray::Offsets& dest_offsets,
+                         const NullMapType* src_null_map, NullMapType* dest_null_map) {
+        using NestType = typename ColumnType::value_type;
+        using ElementNativeType = typename NativeType<NestType>::Type;
+
+        const ColumnType* src_data_concrete = check_and_get_column<ColumnType>(&src_column);
+        if (!src_data_concrete) {
+            return false;
+        }
+        const PaddedPODArray<NestType>& src_datas = src_data_concrete->get_data();
+
+        ColumnType& dest_data_concrete = reinterpret_cast<ColumnType&>(dest_column);
+        PaddedPODArray<NestType>& dest_datas = dest_data_concrete.get_data();
+
+        using Set = HashSetWithStackMemory<ElementNativeType, DefaultHash<ElementNativeType>,
+                                           INITIAL_SIZE_DEGREE>;
+        Set set;
+
+        ColumnArray::Offset prev_src_offset = 0;
+        ColumnArray::Offset res_offset = 0;
+
+        for (auto curr_src_offset : src_offsets) {
+            set.clear();
+            size_t null_size = 0;
+            for (ColumnArray::Offset j = prev_src_offset; j < curr_src_offset; ++j) {
+                if ((*src_null_map)[j]) {
+                    if (dest_null_map) {

Review Comment:
   Add an assert or DCHECK on dest_null_map.
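
A rough standalone sketch of the idea, using plain `assert` as a stand-in for DCHECK and `std::vector` instead of the Doris column classes. `NullMap` and `copy_null_markers` are hypothetical names. The assumption is that a NULL in the source implies a nullable destination, so a missing destination null map should fail loudly rather than be skipped silently by an `if`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using NullMap = std::vector<uint8_t>;  // 1 marks a NULL element

// Appends one NULL marker to the destination for every NULL in the source
// and returns how many were written.
size_t copy_null_markers(const NullMap& src_null_map, NullMap* dest_null_map) {
    size_t null_count = 0;
    for (uint8_t flag : src_null_map) {
        if (flag == 0) {
            continue;
        }
        // Invariant instead of a silent `if (dest_null_map)`: a source NULL
        // can only be stored when the destination has a null map.
        assert(dest_null_map != nullptr &&
               "nullable source requires a nullable destination");
        dest_null_map->push_back(1);
        ++null_count;
    }
    return null_count;
}
```

With a source that contains no NULLs, a null `dest_null_map` is never touched, so the assertion only fires when the invariant is actually violated.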





[GitHub] [doris] cambyzju commented on a diff in pull request #10388: [feature-wip] (array-type) add the array_distinct function

Posted by GitBox <gi...@apache.org>.
cambyzju commented on code in PR #10388:
URL: https://github.com/apache/doris/pull/10388#discussion_r913577954


##########
be/src/vec/functions/array/function_array_distinct.h:
##########
@@ -0,0 +1,268 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+// This file is copied from
+// https://github.com/ClickHouse/ClickHouse/blob/master/src/Functions/array/arrayDistinct.cpp
+// and modified by Doris
+#pragma once
+
+#include "vec/columns/column_array.h"
+#include "vec/columns/column_const.h"
+#include "vec/common/hash_table/hash_set.h"
+#include "vec/common/hash_table/hash_table.h"
+#include "vec/common/sip_hash.h"
+#include "vec/data_types/data_type_array.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/functions/function.h"
+#include "vec/functions/function_helpers.h"
+#include "vec/io/io_helper.h"
+
+namespace doris::vectorized {
+
+class FunctionArrayDistinct : public IFunction {
+public:
+    static constexpr auto name = "array_distinct";
+    static FunctionPtr create() { return std::make_shared<FunctionArrayDistinct>(); }
+    using NullMapType = PaddedPODArray<UInt8>;
+
+    /// Get function name.
+    String get_name() const override { return name; }
+
+    bool is_variadic() const override { return false; }
+
+    size_t get_number_of_arguments() const override { return 1; }
+
+    DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {
+        DCHECK(is_array(arguments[0]))
+                << "first argument for function: " << name << " should be DataTypeArray"
+                << " and arguments[0] is " << arguments[0]->get_name();
+        return arguments[0];
+    }
+
+    Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
+                        size_t result, size_t input_rows_count) override {
+        ColumnPtr src_column =
+                block.get_by_position(arguments[0]).column->convert_to_full_column_if_const();
+        const auto& src_column_array = check_and_get_column<ColumnArray>(*src_column);
+        if (!src_column_array) {
+            return Status::RuntimeError(
+                    fmt::format("unsupported types for function {}({})", get_name(),
+                                block.get_by_position(arguments[0]).type->get_name()));
+        }
+        const auto& src_offsets = src_column_array->get_offsets();
+        const auto& src_nested_column = src_column_array->get_data();
+
+        DataTypePtr src_column_type = block.get_by_position(arguments[0]).type;
+        auto nested_type = assert_cast<const DataTypeArray&>(*src_column_type).get_nested_type();
+        auto dest_column_ptr = ColumnArray::create(nested_type->create_column(),
+                                                   ColumnArray::ColumnOffsets::create());
+        IColumn& dest_nested_column = dest_column_ptr->get_data();
+        ColumnArray::Offsets& dest_offsets = dest_column_ptr->get_offsets();
+        dest_nested_column.reserve(src_nested_column.size());
+        dest_offsets.reserve(input_rows_count);
+
+        const IColumn* src_inner_column = nullptr;
+        const NullMapType* src_null_map = nullptr;
+        const ColumnNullable* src_nested_nullable_col = nullptr;
+        if (src_nested_column.is_nullable()) {
+            src_nested_nullable_col = check_and_get_column<ColumnNullable>(src_nested_column);
+            src_inner_column = src_nested_nullable_col->get_nested_column_ptr();
+            src_null_map = &src_nested_nullable_col->get_null_map_column().get_data();
+        } else {
+            src_inner_column = src_column_array->get_data_ptr();
+        }
+
+        IColumn* dest_inner_column = nullptr;
+        NullMapType* dest_null_map = nullptr;
+        ColumnNullable* dest_nested_nullable_col = nullptr;
+        if (dest_nested_column.is_nullable()) {
+            dest_nested_nullable_col = reinterpret_cast<ColumnNullable*>(&dest_nested_column);
+            dest_inner_column = &dest_nested_nullable_col->get_nested_column();
+            dest_null_map = &dest_nested_nullable_col->get_null_map_column().get_data();
+        } else {
+            dest_inner_column = &dest_nested_column;
+        }
+
+        auto res_val = _execute_by_type(*src_inner_column, src_offsets, *dest_inner_column,
+                                        dest_offsets, src_null_map, dest_null_map, nested_type);
+        if (!res_val) {
+            return Status::RuntimeError(
+                    fmt::format("execute failed or unsupported types for function {}({})",
+                                get_name(), block.get_by_position(arguments[0]).type->get_name()));
+        }
+
+        block.replace_by_position(result, std::move(dest_column_ptr));
+        return Status::OK();
+    }
+
+private:
+    // Note: Here initially allocate a piece of memory for 2^5 = 32 elements.
+    static constexpr size_t INITIAL_SIZE_DEGREE = 5;
+
+    template <typename ColumnType>
+    bool _execute_number(const IColumn& src_column, const ColumnArray::Offsets& src_offsets,
+                         IColumn& dest_column, ColumnArray::Offsets& dest_offsets,
+                         const NullMapType* src_null_map, NullMapType* dest_null_map) {
+        using NestType = typename ColumnType::value_type;
+        using ElementNativeType = typename NativeType<NestType>::Type;
+
+        const ColumnType* src_data_concrete = check_and_get_column<ColumnType>(&src_column);
+        if (!src_data_concrete) {
+            return false;
+        }
+        const PaddedPODArray<NestType>& src_datas = src_data_concrete->get_data();
+
+        ColumnType& dest_data_concrete = reinterpret_cast<ColumnType&>(dest_column);
+        PaddedPODArray<NestType>& dest_datas = dest_data_concrete.get_data();
+
+        using Set = HashSetWithStackMemory<ElementNativeType, DefaultHash<ElementNativeType>,
+                                           INITIAL_SIZE_DEGREE>;
+        Set set;
+
+        ColumnArray::Offset prev_src_offset = 0;
+        ColumnArray::Offset res_offset = 0;
+
+        for (auto curr_src_offset : src_offsets) {
+            set.clear();
+            size_t null_size = 0;
+            for (ColumnArray::Offset j = prev_src_offset; j < curr_src_offset; ++j) {
+                if ((*src_null_map)[j]) {

Review Comment:
   Check that src_null_map is not null before dereferencing it.
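
A minimal standalone sketch of the guarded loop, assuming the same offsets-plus-flat-data layout as the diff but using `std::vector` and `std::unordered_set` in place of the Doris columns and hash set. `FlatArray` and `distinct_per_row` are hypothetical names. NULL elements are copied through unchanged, which matches the sample output in the PR description.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a flattened array column: offsets[i] is the
// exclusive end index of row i inside data.
struct FlatArray {
    std::vector<int32_t> data;
    std::vector<uint8_t> null_map;  // 1 marks a NULL element
    std::vector<size_t> offsets;
};

// De-duplicates each row. src_null_map is nullptr when the nested column is
// not nullable, so it is tested before every dereference.
FlatArray distinct_per_row(const std::vector<int32_t>& src_data,
                           const std::vector<size_t>& src_offsets,
                           const std::vector<uint8_t>* src_null_map) {
    assert(src_null_map == nullptr || src_null_map->size() == src_data.size());
    FlatArray dest;
    std::unordered_set<int32_t> seen;
    size_t prev = 0;
    for (size_t end : src_offsets) {
        seen.clear();
        for (size_t j = prev; j < end; ++j) {
            const bool is_null = src_null_map != nullptr && (*src_null_map)[j] != 0;
            if (is_null) {
                // Keep every NULL as-is; 0 is only a placeholder value.
                dest.data.push_back(0);
                dest.null_map.push_back(1);
                continue;
            }
            // insert().second is true only for the first occurrence.
            if (seen.insert(src_data[j]).second) {
                dest.data.push_back(src_data[j]);
                dest.null_map.push_back(0);
            }
        }
        dest.offsets.push_back(dest.data.size());
        prev = end;
    }
    return dest;
}
```

Passing `nullptr` for `src_null_map` takes the non-nullable path without touching the pointer, which is the case the unguarded `(*src_null_map)[j]` would crash on.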





[GitHub] [doris] github-actions[bot] commented on pull request #10388: [feature-wip] (array-type) add the array_distinct function

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #10388:
URL: https://github.com/apache/doris/pull/10388#issuecomment-1178425947

   PR approved by anyone and no changes requested.

